[postgresql] Why is sub-parser created for function bodies?

antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.

[postgresql] Why is sub-parser created for function bodies? #4304

Closed: kaby76 closed this issue 1 week ago

kaby76 commented 2 weeks ago

The PostgreSQL grammar defines error listeners LexerDispatchingErrorListener.cs, ParserDispatchingErrorListener.cs, LexerDispatchingErrorListener.java, and ParserDispatchingErrorListener.java. But these error listeners have absolutely nothing to do with the grammar, don't reference any symbols from the grammar, and should therefore not be in the grammar.

kaby76 commented 2 weeks ago

The code is used in a special sub-parser. It is not entirely clear why the sub-parser uses the same PostgreSQL grammar. I will need to investigate this further. For now, I won't remove the code until I understand why it's being done this way rather than directly.

kaby76 commented 2 weeks ago

The sub-parser was added with this commit, https://github.com/antlr/grammars-v4/commit/0c85145aa0b5518688a9d226354d40183b954ff4#diff-191019bcbd2ccd918ba1d7e6610ca830bc526d04af2ae6e6d3c5356e96274b76, and merged as part of https://github.com/antlr/grammars-v4/pull/2395, but with no explanation of why a sub-parser is created. Presumably it exists to mutate the parse tree.

kaby76 commented 2 weeks ago

Consider the input in aggregates.sql: https://github.com/antlr/grammars-v4/blob/199a5121ece05d2f2e7eca330d0738220499e80c/sql/postgresql/examples/aggregates.sql#L778-L798

This input contains text that is a function body. However, the body syntax depends on a literal after the input text: https://github.com/antlr/grammars-v4/blob/199a5121ece05d2f2e7eca330d0738220499e80c/sql/postgresql/examples/aggregates.sql#L798
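To make that dependency concrete, here is a minimal sketch in Python (not code from the grammar). It assumes the literal in question is the `LANGUAGE` clause that follows a dollar-quoted body, as in `AS $$ ... $$ LANGUAGE plpgsql`. The names `BODY_THEN_LANGUAGE` and `split_function_body` are hypothetical, and the regex ignores the case where `LANGUAGE` appears before `AS`.

```python
import re

# Hypothetical pattern: a dollar-quoted body followed by a LANGUAGE clause.
# Assumption: the language is named AFTER the body, so the body cannot be
# interpreted until the text beyond its closing delimiter has been read.
BODY_THEN_LANGUAGE = re.compile(
    r"\bAS\s+(?P<delim>\$(?:[A-Za-z_][A-Za-z_0-9]*)?\$)"      # opening $$ or $tag$
    r"(?P<body>.*?)"                                          # body text, non-greedy
    r"(?P=delim)"                                             # same delimiter closes it
    r"\s+LANGUAGE\s+'?(?P<lang>[A-Za-z_][A-Za-z_0-9]*)'?",    # language literal
    re.IGNORECASE | re.DOTALL,
)


def split_function_body(sql):
    """Return (language, body) for the first matching function, or None."""
    match = BODY_THEN_LANGUAGE.search(sql)
    if match is None:
        return None
    return match.group("lang").lower(), match.group("body")


example = (
    "CREATE FUNCTION f() RETURNS int AS $$\n"
    "BEGIN\n"
    "  RETURN 1;\n"
    "END;\n"
    "$$ LANGUAGE plpgsql;"
)
language, body = split_function_body(example)
```

The point of the sketch is only that the body has to be captured as opaque text first; which grammar applies to it is known only after the closing delimiter.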

Is a sub-parser created because the parser has difficulty resynchronizing once an error is detected?

kaby76 commented 1 week ago

The sub-parse is correct, but it is implemented incorrectly. The sub-parser is for PL/pgSQL, which is a distinctly different language. Unfortunately, the grammar for PL/pgSQL was merged into the one for PostgreSQL. I've removed the rules for PL/pgSQL. The sub-parser will be implemented in a separate ANTLR grammar.
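As a rough illustration of the separation described here, the following Python sketch routes a function body to a per-language parser. It is not the planned implementation: `parse_sql_body`, `parse_plpgsql_body`, `BODY_PARSERS`, and `parse_function_body` are hypothetical stand-ins for parsers that would be generated from separate ANTLR grammars.

```python
# Hypothetical stand-ins for parsers generated from two separate grammars.
# Each returns a tagged tuple so the dispatch result is easy to inspect.
def parse_sql_body(body):
    return ("sql", body.strip())


def parse_plpgsql_body(body):
    return ("plpgsql", body.strip())


# Assumption: one entry per language that has its own grammar.
BODY_PARSERS = {
    "sql": parse_sql_body,
    "plpgsql": parse_plpgsql_body,
}


def parse_function_body(language, body):
    """Dispatch the body text to the parser registered for its language."""
    parser = BODY_PARSERS.get(language.lower())
    if parser is None:
        # Unknown language: keep the body as opaque text rather than guess.
        return ("opaque", body)
    return parser(body)


result = parse_function_body("PLpgSQL", "\nBEGIN\n  RETURN 1;\nEND;\n")
```

With this shape, the main SQL grammar never needs rules for the embedded language; it only has to hand over the body text and the language name.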