Instagram / LibCST

A concrete syntax tree parser and serializer library for Python that preserves many aspects of Python's abstract syntax tree
https://libcst.readthedocs.io/
Other
1.53k stars 187 forks source link

test_parse_error for native parser #1179

Open lisroach opened 2 months ago

lisroach commented 2 months ago

I've been taking a look at https://github.com/Instagram/LibCST/issues/1112#issue-2156871432 and struggled a bit getting the tests to reproduce the error, although it was easy to reproduce from a file.

I found this line: https://github.com/Instagram/LibCST/blame/e20e757159f435417fb4d3a0913264c7d252b847/libcst/_parser/tests/test_parse_errors.py#L177 filters out anything other than a check if the error thrown is a SyntaxError when using the native parser, so we lose granularity of testing the error string itself.

Since the errors are different between the native and pure parser, the error messages in this test class will need to be rewritten to match the native output.

A couple of questions:

  1. Is my above understanding correct? :) I might have missed if the native parser is tested elsewhere in the codebase.
  2. Do we prefer rewriting the file to test just for the output of the native parser, or should we duplicate the tests to test for both native and pure output?
zsol commented 2 months ago

👋

  1. Yes! IIRC that is the only place where parse errors for the entire parser are tested.

Before I answer 2, let me give some context: when writing the new rust-based parser I paid almost no attention to the error cases. The CPython grammar includes a bunch of helper rules described here, for example here's the one for various common assignment syntax errors. LibCST doesn't include any of these for two reasons:

  1. The parser library doesn't support immediately terminating parsing, which would mean the CPython invalid_ rules would be tricky to translate faithfully.
  2. Figuring out alternative implementations will be a game of whack-a-mole that I didn't have the energy for, especially if the alternative implementation would mean we'd have to diverge significantly from the official grammar.

So when updating the tests, keep in mind that the current parse errors are known to be low quality.

To answer your second question: yes, I'd prefer rewriting the file to test just for the output of the native parser. The pure parser's days are numbered.