Lark parser's comments cache is not cleared when a syntax error occurs

kinow commented 2 years ago

This appears to cause an assertion error when loading multiple WDL workflows. The state (comments cache) is kept at the module level, so only when the module is unloaded or application stops the state is cleared, or when the parser executes successfully.

When you parse an invalid workflow that raises a SyntaxError, for instance, the cache will preserve the comments. Then, when you parse a valid workflow next, there will be an assertion error when the Document object is created.

This caused an error in one of the pull requests for the wdl-cwl-translator, as the test for an invalid file was executed before the other tests: https://github.com/common-workflow-lab/wdl-cwl-translator/pull/168

Thanks! Bruno

kinow commented 2 years ago

Seeing if I can submit a quick PR with a fix, but no worries if a maintainer prefers to take over (I guess changing the parser can be a little dangerous).

mr-c commented 2 years ago

The simple fix could be to clear the comments buffer also before line 19 https://github.com/chanzuckerberg/miniwdl/blob/d50d97a9e263373d8d46610b2fdcc99b16059e49/WDL/_parser.py#L19 , yes?

kinow commented 2 years ago

Ah, that's a good idea! I was planning on try/catching. But clearing the comments buffer just before calling the function that raises that syntax error sounds much simpler! :tophat:

kinow commented 2 years ago

(writing a quick unit test first to reproduce the issue... then this fix, PR, and :sleeping_bed: )

chanzuckerberg / miniwdl

Lark parser's comments cache is not cleared when a syntax error occurs #548