Use CrateDB's ANTLR grammar in SQL parser

amotl commented 4 months ago

I think we should look into replacing sqlparse with a parser generated from the CrateDB antlr grammar, to ensure it supports our SQL dialect.

See https://github.com/antlr/antlr4/blob/4.6/doc/python-target.md

Originally posted by @mfussenegger in https://github.com/crate/crash/pull/432#pullrequestreview-1942455036

surister commented 4 months ago

I think we should move this away from crash and create a separate repo to have the development of antlr4 -> targets.

I see two uses cases for the antlr4 targets:

Python: For crash and cloud team
Javascript: Local admin UI without any backend.

I have already something half-working:


input = InputStream('SELECT * FROM sys.shards; SELECT 2;')
lexer = SqlBaseLexer(input)
stream = CommonTokenStream(lexer)
parser = SqlBaseParser(stream)
parse_tree = parser.statements()
print_stmt(parse_tree.statement(0), stream)
print_stmt(parse_tree.statement(1), stream)

SELECT * FROM sys.shards
SELECT 2

mfussenegger commented 4 months ago

I see two uses cases for the antlr4 targets: Python: For crash and cloud team Javascript: Local admin UI without any backend.

I don't see a benefit in mixing several targets into a single repo. You just end up with a ton of different build systems. A local admin UI without any backend also seems a bit like a weird use-case. The whole point of the admin UI is to inspect a cluster and run queries.

As long as there is only a single consumer it can be inlined, otherwise you're just adding resource maintenance overhead. What would cloud use it for? Could start integrating it here, make sure it's in a dedicated module - then extracting it later on should be easy enough.

amotl commented 4 months ago

What would "cloud" use it for?

Not being involved in all the relevant details, nor their exact implementations, nor too much on the grammar plumbing side as well, I guess @surister might be referring to the SQL Query Console where such a component / outcome may also be used or relevant in one way or another.

https://github.com/crate/cloud-docs/pull/66

Could start integrating it here, make sure it's in a dedicated module - then extracting it later on should be easy enough.

Starting is important, right, I absolutely agree.

Maybe I can't still grok this fragment of the discussion, so please support me: Is it just about parallelizing this endeavor to serve the needs of both Python and JavaScript, and finding a right place for this work?

surister commented 4 months ago

I see two uses cases for the antlr4 targets: Python: For crash and cloud team Javascript: Local admin UI without any backend.

I don't see a benefit in mixing several targets into a single repo. You just end up with a ton of different build systems.

I tried it and the way targets are created are extremely similar, different targets might potentially share scripts, hence the proposition.

A local admin UI without any backend also seems a bit like a weird use-case. The whole point of the admin UI is to inspect a cluster and run queries.

Sorry, when I meant without any backend I meant without Grand Central, our current admin ui embedded with Crate, which of course, CrateDB is the backend in this case :P, the use case I see would be to actually provide correct syntax highlighting and error highlighting which is currently is very flaky and hand-made. Furthermore there is an antlr4 target for ACE which is the editor that we use in cloud.

What would cloud use it for?

Besides the aforementioned use case, cloud already uses sqlparse when proxing queries, enabling running several queries.

surister commented 4 months ago

cc @proddata

crate / crash

Use CrateDB's ANTLR grammar in SQL parser #433