dolthub / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0

Extension mechanism for the parser #123

Open Allam76 opened 2 years ago

Allam76 commented 2 years ago

First of all thank you for an excellent solution!

Feature Description

Have you thought of an extension mechanism for the parser? As far as I know, yacc does not allow extending the grammar. I have a custom storage engine for go-mysql-server that requires some small changes to the tokenizer and parser, so is my only option to fork your version of vitess?

Use Case(s)

A custom storage engine whose SQL syntax follows neither ANSI nor MySQL.

zachmu commented 2 years ago

You are correct that extensions with yacc grammars are more or less impossible. Your best bet would be to fork the project to make the customizations you want.

What we could do on the go-mysql-server side is make the parser pluggable the way other parts of the engine are. It's kind of a lot of work to transform the vitess AST into the go-mysql-server query plan tree though, and it would be a pretty fragile extension point.
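Below is a minimal sketch of what a pluggable parser could look like on the engine side. It assumes a hypothetical `Parser` interface and `Engine` type, neither of which is go-mysql-server's actual API. The toy `Statement` type stands in for the much larger vitess AST. The custom parser handles one extra statement form and delegates everything else to the default parser:

```go
package main

import (
	"fmt"
	"strings"
)

// Statement is a toy stand-in for a parsed AST node (hypothetical).
type Statement interface {
	String() string
}

// Parser is a hypothetical extension point: the engine depends only on
// this interface, so an alternative parser can be swapped in.
type Parser interface {
	Parse(query string) (Statement, error)
}

type showTables struct{}

func (showTables) String() string { return "SHOW TABLES" }

// defaultParser stands in for the built-in vitess-based parser.
type defaultParser struct{}

func (defaultParser) Parse(query string) (Statement, error) {
	if strings.EqualFold(strings.Join(strings.Fields(query), " "), "SHOW TABLES") {
		return showTables{}, nil
	}
	return nil, fmt.Errorf("syntax error near %q", query)
}

type setSearchPath struct{ schema string }

func (s setSearchPath) String() string { return "SET search_path = " + s.schema }

// customParser is a hypothetical engine-specific parser: it recognizes one
// extra statement form and delegates everything else to the fallback.
type customParser struct{ fallback Parser }

func (p customParser) Parse(query string) (Statement, error) {
	f := strings.Fields(query)
	if len(f) == 4 && strings.EqualFold(f[0], "SET") &&
		strings.EqualFold(f[1], "search_path") && f[2] == "=" {
		return setSearchPath{schema: f[3]}, nil
	}
	return p.fallback.Parse(query)
}

// Engine receives its parser as a field, like other pluggable components.
type Engine struct {
	Parser Parser
}

// Query parses a statement and returns a placeholder "plan" string.
func (e *Engine) Query(q string) (string, error) {
	stmt, err := e.Parser.Parse(q)
	if err != nil {
		return "", err
	}
	return "plan: " + stmt.String(), nil
}

func main() {
	e := &Engine{Parser: customParser{fallback: defaultParser{}}}
	fmt.Println(e.Query("set search_path = sales"))
	fmt.Println(e.Query("show tables"))
}
```

The sketch also shows why this would be a fragile extension point. A custom parser must still produce AST nodes that the planner knows how to turn into a query plan.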

Do you have some examples of what you're talking about that we could examine?

Allam76 commented 2 years ago

First of all, sorry for my late reply.

One example is dealing with Postgres schemas. When connecting to Postgres from an external query engine, you need a path syntax such as <db>.<schema>.<table> as the table identifier. Parsing this as a path of arbitrary length, rather than as a fixed <id>.<id> pair, future-proofs the identifiers. This is the approach taken by Calcite and Presto.
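To illustrate the difference, here is a minimal Go sketch that parses a dotted table identifier into a variable-length path instead of a fixed qualifier/name pair. With this representation, both db.schema.table and a bare table name fit the same type. All names here are hypothetical and not taken from vitess. The sketch assumes MySQL-style backtick quoting, where a doubled backtick inside a quoted identifier is a literal backtick:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// TablePath is a hypothetical identifier type: rather than fixed fields
// (qualifier + name), it keeps every dotted segment in order.
type TablePath []string

// Name returns the last segment, i.e. the table itself.
// parseTablePath never returns an empty path on success.
func (p TablePath) Name() string { return p[len(p)-1] }

// Qualifiers returns the leading segments, e.g. [db schema].
func (p TablePath) Qualifiers() []string { return p[:len(p)-1] }

// parseTablePath splits an identifier such as db.schema.table into its
// segments. Backtick-quoted segments may contain dots, e.g. `my.db`.t.
func parseTablePath(s string) (TablePath, error) {
	var path TablePath
	var cur strings.Builder
	quoted := false
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case c == '`':
			if quoted && i+1 < len(s) && s[i+1] == '`' {
				cur.WriteByte('`') // doubled backtick: literal backtick
				i++
			} else {
				quoted = !quoted // opening or closing quote
			}
		case c == '.' && !quoted:
			if cur.Len() == 0 {
				return nil, errors.New("empty path segment")
			}
			path = append(path, cur.String())
			cur.Reset()
		default:
			cur.WriteByte(c)
		}
	}
	if quoted {
		return nil, errors.New("unterminated quoted identifier")
	}
	if cur.Len() == 0 {
		return nil, errors.New("empty path segment")
	}
	return append(path, cur.String()), nil
}

func main() {
	p, err := parseTablePath("mydb.public.orders")
	fmt.Println(p.Qualifiers(), p.Name(), err)
	p, err = parseTablePath("`my.db`.t")
	fmt.Println(p, err)
}
```

A two-field struct can only represent one level of qualification. The slice form leaves it to later analysis to decide what each qualifier means (database, schema, or catalog).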

This comes up a lot when adapting existing database SQL dialects, and there seem to be three ways to handle it:

1) Fork the parser for each use case, ending up with many parallel implementations.
2) Switch from yacc to another grammar tool that supports extension. Calcite has this, for example.
3) Accept PRs with all sorts of parser changes; constructs that a given engine does not need are filtered out and discarded during analysis of the AST.

Option 2 is usually considered the best solution, but it would require you to switch parser generators. Not a very fun prospect.

ANTLR can be extended and has a Go target. I'm not sure how PEG works for Go.
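For contrast with yacc's closed grammar, the core idea behind option 2 can be sketched without a parser generator. A hand-written parser dispatches on a statement's leading keyword through a registry, so an engine can add statement forms without forking. This is a hypothetical Go sketch with a toy whitespace tokenizer. It is not how ANTLR, Calcite, or vitess implement extension:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// StmtParser parses one statement from its whitespace-separated tokens
// (a toy tokenizer; a real one would handle quoting and punctuation).
type StmtParser func(tokens []string) (string, error)

// Registry maps a statement's leading keyword to the function that parses it.
type Registry struct {
	byKeyword map[string]StmtParser
}

func NewRegistry() *Registry {
	return &Registry{byKeyword: make(map[string]StmtParser)}
}

// Register adds (or overrides) a statement form. An engine extends the
// grammar by calling this instead of editing a shared grammar file.
func (r *Registry) Register(keyword string, p StmtParser) {
	r.byKeyword[strings.ToUpper(keyword)] = p
}

// Parse dispatches on the first token, case-insensitively.
func (r *Registry) Parse(sql string) (string, error) {
	tokens := strings.Fields(sql)
	if len(tokens) == 0 {
		return "", errors.New("empty statement")
	}
	p, ok := r.byKeyword[strings.ToUpper(tokens[0])]
	if !ok {
		return "", fmt.Errorf("unsupported statement %q", tokens[0])
	}
	return p(tokens)
}

func main() {
	r := NewRegistry()
	// A core statement that ships with the parser.
	r.Register("SHOW", func(t []string) (string, error) {
		if len(t) == 2 && strings.EqualFold(t[1], "TABLES") {
			return "ShowTables", nil
		}
		return "", errors.New("unsupported SHOW form")
	})
	// A hypothetical dialect-specific statement added by a storage engine.
	r.Register("VACUUM", func(t []string) (string, error) {
		if len(t) != 2 {
			return "", errors.New("usage: VACUUM <table>")
		}
		return "Vacuum(" + t[1] + ")", nil
	})
	fmt.Println(r.Parse("show tables"))
	fmt.Println(r.Parse("VACUUM orders"))
}
```

Keyword dispatch only extends top-level statements. A change deep inside the grammar, such as parsing table identifiers as paths, needs extension hooks at that level as well. That finer-grained flexibility is what a grammar tool with real extension support would have to provide.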

Allam76 commented 2 years ago

I did a quick analysis. It would not be too hard for me to translate the parser to ANTLR while keeping compatibility with the query plan tree. That would allow extension, separation of concerns, and a more "modern" parser generator. However, it would also break compatibility with vitess.

In Calcite, someone used this extensibility to run queries directly from a GraphQL parser. :smiley: