Diagnostics as pattern matching

c42f commented 1 year ago

I've been thinking about what we'd need for a diagnostics system which can really solve a couple of core problems I'm worried about:

Accessibility: end users should be able to easily contribute new helpful and friendly diagnostics without understanding the code of the compiler frontend. Friendly, comprehensible errors are most helpful to beginners, and beginners should be able to help write them. But beginners will rarely be able to dive into JuliaSyntax.jl and make changes.

Cleanliness and separation of concerns: If possible I don't want to clutter the parser itself with large amounts of heuristic code and error/warning message formatting.

With these in mind, I want to claim that:

For a parser system where a syntax tree is always produced, compiler diagnostics (warnings, errors) are not really different from linter messages based on symbolic pattern matching

Therefore, we should take inspiration from linters like semgrep and use pattern matching techniques to match warnings and errors against the (partially broken) AST that the compiler produces. Ideally, errors and warnings could be expressed declaratively as a piece of malformed Julia code with placeholders which capture parts of that code, plus an error message template.
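To make the idea concrete, here is a rough sketch of a placeholder pattern plus a message template, matched against a toy tree. It is in Python purely for brevity, and everything in it is an assumption for illustration: the tuple-based tree shape, the `$name` placeholder convention, the `("error",)` node, and the rule itself are invented and are not what JuliaSyntax.jl produces.

```python
# Toy trees: (kind, child, ...) tuples with string leaves.
# Hypothetical shapes, not the real JuliaSyntax.jl tree.

def match(pattern, tree, bindings=None):
    """Return a dict of placeholder captures if `tree` matches `pattern`, else None."""
    bindings = {} if bindings is None else bindings
    if isinstance(pattern, str) and pattern.startswith("$"):
        name = pattern[1:]
        if name in bindings and bindings[name] != tree:
            return None  # a repeated placeholder must capture the same subtree
        bindings[name] = tree
        return bindings
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        if len(pattern) != len(tree):
            return None
        for p, t in zip(pattern, tree):
            if match(p, t, bindings) is None:
                return None
        return bindings
    # Literal leaf (or mismatched node types): must be exactly equal.
    return bindings if pattern == tree else None


# One declarative rule: a pattern for "ternary with a missing else branch"
# and a message template. Both are invented for illustration.
RULE = {
    "pattern": ("?", "$cond", "$then", ("error",)),
    "message": "`{cond} ? {then}` is missing its `: else` branch",
}


def diagnose(tree, rule=RULE):
    """Render the rule's message if its pattern matches, else return None."""
    bindings = match(rule["pattern"], tree)
    return None if bindings is None else rule["message"].format(**bindings)
```

The point of the sketch is that the rule is pure data: someone adding a diagnostic would only write the pattern and the message, never the matcher.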

Discuss :-)

pfitzseb commented 1 year ago

One concern I had (not sure how real it is though) is that there's no canonical representation of invalid syntax, so pattern matching will end up being tied to the parser internals.

c42f commented 1 year ago

there's no canonical representation of invalid syntax, so pattern matching will end up being tied to the parser internals

My thought is that the canonical representation of invalid syntax is the text itself, as a string of broken source code. Then we "just" need the parser recovery to be predictable enough that we can map these broken syntax prototypes into a pattern automatically. (To be clear, I feel this is a very big "just". But it seems like the right approach.)

c42f commented 1 year ago

IMO, the key is to set things up so that we're building a database of broken syntax examples and the errors they should map to in "structured enough" form.

Then maintaining and adding to this database becomes the primary work of "having good syntax errors", and adding new examples should be easy.
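For illustration only, such a database could be as simple as a list of records pairing a broken source prototype with a message template, plus a small consistency check that contributors can run. The sketch below is Python for brevity; the field names, the `$name` placeholder syntax, and both entries are made up and are not real JuliaSyntax.jl diagnostics.

```python
import re
from string import Formatter

# Hypothetical database entries: broken source text with $placeholders,
# and a message template that refers to the captures by name.
DIAGNOSTICS = [
    {
        "prototype": "$cond ? $then",
        "message": "`{cond} ? {then}` is missing its `: else` branch",
    },
    {
        "prototype": "for $var in $iter do",
        "message": "unexpected `do` after `for {var} in {iter}`",
    },
]


def placeholders(prototype):
    """Names captured by the broken-source prototype, e.g. {'cond', 'then'}."""
    return set(re.findall(r"\$(\w+)", prototype))


def message_fields(message):
    """Names the message template refers to, e.g. {'cond', 'then'}."""
    return {name for _, name, _, _ in Formatter().parse(message) if name}


def check_entry(entry):
    """Return the message fields that the prototype never captures (sorted)."""
    missing = message_fields(entry["message"]) - placeholders(entry["prototype"])
    return sorted(missing)
```

A check like `check_entry` is the kind of tooling that keeps contributions easy: a new example can be validated without running, or understanding, the parser.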

This data-driven approach is similar to the way linters such as semgrep seem to be building databases of errors with great success.

Bonus points for structuring the database so that it can be input to a more machine-learning style of pattern matching if necessary in the future.

gafter commented 10 months ago

There are an infinite number of ways to write a syntactically invalid program. The complement of a context-free language isn't necessarily context-free. Based on that, if we do have a system for reporting specific errors for specific patterns of input, the parser should have a fallback mechanism for reporting a syntax error when no pattern matches.
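A minimal sketch of that fallback, in Python for brevity: try each specific rule in order and, if none matches, report a generic syntax error. The rule representation (a matcher function paired with a message template), the toy tuple trees, and the example rule are assumptions for illustration, not an existing API.

```python
def missing_else(tree):
    """Hypothetical matcher: a ternary whose else branch is an error node."""
    if (isinstance(tree, tuple) and len(tree) == 4
            and tree[0] == "?" and tree[3] == ("error",)):
        return {"cond": tree[1], "then": tree[2]}
    return None


# Each rule pairs a matcher with a message template (invented for illustration).
RULES = [
    (missing_else, "`{cond} ? {then}` is missing its `: else` branch"),
]


def report(tree, rules=RULES, fallback="syntax error"):
    """Return the first specific diagnostic that matches, else the generic fallback."""
    for matcher, template in rules:
        bindings = matcher(tree)
        if bindings is not None:
            return template.format(**bindings)
    # No pattern matched: the set of invalid programs is unbounded, so
    # there must always be a generic message to fall back on.
    return fallback
```

With this shape, the pattern database only ever improves messages; every invalid program still gets at least the generic error.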

JuliaLang / JuliaSyntax.jl

Diagnostics as pattern matching #93