antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License

[postgresql] Grammar contains target-specific declarations via locals. #4311

Closed kaby76 closed 1 week ago

kaby76 commented 2 weeks ago

Consider this input.

create function explain_query_json(query_sql text)
returns table (explain_line json)
language plpgsql as
$$
begin
  set enable_seqscan = off;
  set enable_bitmapscan = on;
  return query execute 'EXPLAIN (ANALYZE, FORMAT json) ' || query_sql;
end;
$$;

I was trying to print the parse tree for this input using toStringTree(). The output isn't correct because the function body still appears as just a string.

Instead, the grammar creates a field for the node type func_as using a "locals declaration". This is not the right way to do it, because it makes the .g4 target-specific rather than target-agnostic. The proper way is to use the contextSuperClass option; the alternative is to rebuild the tree after parsing the normal PostgreSQL input. This is probably why the Go port doesn't work: the order of the "type field-name" pair is reversed for Go. That can be fixed via the transformGrammar.py hack, but it doesn't solve the toStringTree/serialization problem.
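To make the "type field-name" ordering issue concrete, here is a minimal sketch of the kind of rewrite such a hack would have to perform. It is not the repository's actual transformGrammar.py; the function name `swap_locals_for_go`, the regular expression, and the sample rule text are all assumptions made for illustration, and the sketch handles only a single declaration per locals clause without mapping type names to their Go equivalents.

```python
import re

# Hypothetical pattern: a locals clause holding exactly one "Type name" pair,
# e.g. `locals[ParserRuleContext Definition]`.
LOCALS = re.compile(r"locals\s*\[\s*(?P<type>[\w.*]+)\s+(?P<name>\w+)\s*\]")


def swap_locals_for_go(grammar_text: str) -> str:
    """Rewrite `locals[Type name]` (Java/C# order) as `locals[name Type]` (Go order)."""
    return LOCALS.sub(
        lambda m: f"locals[{m.group('name')} {m.group('type')}]",
        grammar_text,
    )


# Assumed shape of the rule; the real grammar text may differ.
sample = "func_as locals[ParserRuleContext Definition] : sconst ;"
converted = swap_locals_for_go(sample)
print(converted)
```

Even with a rewrite like this in place, the extra field stays invisible to toStringTree() and to any serializer, which is why the textual fix alone is not enough.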

Language-specific "toStringTree()" code is not attempted. Also, when the parse tree is serialized, Trash doesn't know anything about this. ToStringTree() is not the only thing to fix.

I also see that this code always parses the function body as PlSQL, rather than first switching on the language (Java, CSharp, Go).