Open puellavulnerata opened 7 years ago
I believe, tentatively, that the underlying problem here is that collect_nts()
relies on the desugared
field of the parser nodes, so each HCFChoice
there is freed along with the corresponding parser, but in the case where desugared
is a terminal we wrap it, and the resulting wrapper HCFChoice
does not appear anywhere in the parser, only in the grammar, and so never gets freed.
Andrea Shepard on Tue, Dec 06 2016:
While valgrinding unit tests and plugging leaks, I found that
/core/grammar/end
leaks even if I addh_cfgrammar_free(g)
totest_end()
. The cause is the calls toh_new()
(rather than in the grammar's arena) inif (h_hashset_empty(g->nts))
branch ofh_cfgrammar_()
. Unfortunately, these cannot simply be changed over toh_arena_malloc()
calls because the LL(k) backend frees the grammar once it finishes compiling the parser, but apparently keeps some of these pointers around and uses them during parsing. Further investigation proceeding.
I've had a quick look and can shed some light:
The real issue is h_desugar
, which uses h_new
. The case in
h_cfgrammar_
that you mention is just a special case where h_desugar
doesn't return the form we want. These objects (HCFChoice
) are kept
because they represent the nonterminals[*]. To free this memory
properly, we should consider it a transfer of ownership from the grammar
to the table (backend data).
Longer version:
To generate the grammar, the parser is first converted to a
sum-of-products representation by h_desugar
. The type HCFChoice
with tag HCF_CHOICE is the sum and HCFSequence
is the product. The
base terms are represented by HCFChoice
with the other tags
(HCF_CHAR
, HCF_CHARSET
, etc.).
The case you're looking at in h_cfgrammar_
is the special case where
h_desugar
returns a base term and we need to wrap that in a singleton
sequence and a singleton choice to give it the form expected of the
start symbol (nonterminal).
The function h_desugar
is using h_new
in the same style the
combinators use it: allocate once, free never. The implicit assumption
being that parsers are constructed and compiled once at the beginning of
the program and deallocated by termination of the program. Don't look at
me, I just got here.
I intended the h_cfgrammar
routine to be potentially useful
stand-alone, and I don't think we want to burden a user of that function
with the detail of (the awkwardly-named) "desugaring", so I think it's
right to create these objects there. But when the HCFGrammar
is
created only transiently as part of compiling a table, it can't free the
HCFChoice
s with it, as discussed.
One simple solution would be to explicitly transfer ownership of the
symbols to the table and only free them in h_cfgrammar_free
if we
still own them. Or instead of explicitly tracking the ownership,
distinguish between a full and a "half" free of the grammar; let
h_cfgrammar_free
free everything and give the backend another function
to free only the grammar during compilation and the symbols later, in
its own destructor.
-pesco
[*]: They contain, for instance, the function pointers for semantic actions, predicates, and the "reshape" functions.
While valgrinding unit tests and plugging leaks, I found that
/core/grammar/end
leaks even if I addh_cfgrammar_free(g)
totest_end()
. The cause is the calls toh_new()
(rather than in the grammar's arena) inif (h_hashset_empty(g->nts))
branch ofh_cfgrammar_()
. Unfortunately, these cannot simply be changed over toh_arena_malloc()
calls because the LL(k) backend frees the grammar once it finishes compiling the parser, but apparently keeps some of these pointers around and uses them during parsing. Further investigation proceeding.