Open adamziel opened 5 days ago
@adamziel Nice! It would be great to have that, and I think it's better to use the EBNF syntax than ANTLR because the ANTLR format can misleadingly imply the use of a specific toolset. With that in place, we could add a custom EBNF-based grammar toolset for analyzing, converting, and compressing the grammars.
I actually think we could switch to EBNF in https://github.com/WordPress/sqlite-database-integration/pull/157 because I see us as more likely to maintain our own MySQL grammar than to keep synchronizing it from MySQL Workbench. That said, we'd need support for parametrization (conditionally enabling or disabling some rules) to make it MySQL version aware. I don't think EBNF supports that out of the box, but it seems this could help us: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form#Extensibility
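To make the parametrization idea concrete, here is a rough Python sketch of one way it could work: each alternative carries an optional version range, and the grammar is specialized for a target server version before a parser is built from it. Everything here is an assumption for illustration only: the `Alternative` class, the `specialize` helper, the rule names, and the integer version encoding (e.g. `80000` for 8.0.0) are hypothetical and not taken from any existing grammar or tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Alternative:
    """One alternative of a rule, optionally gated by server version (hypothetical)."""
    symbols: Tuple[str, ...]
    min_version: Optional[int] = None  # inclusive lower bound, None = no bound
    max_version: Optional[int] = None  # exclusive upper bound, None = no bound


def enabled(alt: Alternative, server_version: int) -> bool:
    """Return True if the alternative applies to the given server version."""
    if alt.min_version is not None and server_version < alt.min_version:
        return False
    if alt.max_version is not None and server_version >= alt.max_version:
        return False
    return True


def specialize(
    grammar: Dict[str, List[Alternative]], server_version: int
) -> Dict[str, List[Tuple[str, ...]]]:
    """Drop alternatives (and rules left empty) that do not apply to the version."""
    result: Dict[str, List[Tuple[str, ...]]] = {}
    for name, alternatives in grammar.items():
        kept = [a.symbols for a in alternatives if enabled(a, server_version)]
        if kept:
            result[name] = kept
    return result


# Placeholder rule names; they do not correspond to real MySQL syntax.
GRAMMAR = {
    "statement": [
        Alternative(("selectStatement",)),
        Alternative(("newStatement",), min_version=80000),
        Alternative(("legacyStatement",), max_version=80000),
    ],
    "newStatement": [Alternative(("NEW_KEYWORD",), min_version=80000)],
}

old_grammar = specialize(GRAMMAR, 50700)
new_grammar = specialize(GRAMMAR, 80000)
```

Specializing ahead of time keeps the version check out of the parsing loop. The trade-off is that this sketch does not detect references to rules that were removed, so a real tool would need a consistency pass after filtering.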
It would be nice to have both ANTLR and EBNF support! I only started exploring this to see how easily we could parse Markdown. I ended up using a parser library for now, but I'd like to revisit this eventually, and either .g4 or .ebnf seems fine.
@adamziel We could do that too! After all, the basic syntax is very similar. In any case, having a custom lean grammar toolset would be great, as none of the existing tools we've found so far were a good fit.
@adamziel One more open question here is lexing. Each new parser will either need a manually written lexer (some could be fairly simple), or we'd need to build a more generic one (maybe similar to Phlexy by @nikic).
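For reference, a generic lexer along those lines could look roughly like the Python sketch below: a list of (token name, regex) pairs is compiled into a single alternation of named groups, and the lexer repeatedly matches it at the current offset. This is only an illustration of the general technique, not Phlexy's actual implementation; the `build_lexer` helper and the token rules are made up for the example. Note that Python's regex alternation picks the first matching rule rather than the longest match, so rule order matters, and rule patterns should avoid named or capturing groups of their own.

```python
import re
from typing import Callable, Iterator, List, Tuple

Token = Tuple[str, str]  # (token name, matched text)


def build_lexer(rules: List[Tuple[str, str]]) -> Callable[[str], Iterator[Token]]:
    """Compile (name, regex) rules into one combined pattern and return a lexer."""
    combined = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in rules))

    def lex(text: str) -> Iterator[Token]:
        pos = 0
        while pos < len(text):
            m = combined.match(text, pos)
            # Reject both "no rule matched" and zero-length matches,
            # which would otherwise loop forever.
            if m is None or m.end() == pos:
                raise SyntaxError(f"Unexpected character {text[pos]!r} at offset {pos}")
            yield m.lastgroup, m.group()
            pos = m.end()

    return lex


# Hypothetical token rules, just enough for a tiny SQL-like input.
RULES = [
    ("WS", r"\s+"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SYMBOL", r"[=*,;()]"),
]

lex = build_lexer(RULES)
tokens = [t for t in lex("SELECT id FROM t1;") if t[0] != "WS"]
# tokens == [("IDENT", "SELECT"), ("IDENT", "id"), ("IDENT", "FROM"),
#            ("IDENT", "t1"), ("SYMBOL", ";")]
```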
We could support character ranges in the grammar and run the parser directly on the input string, without a separate lexer.
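A minimal Python sketch of what running directly on the input string could mean: grammar nodes include a character-range primitive, and a small recursive matcher walks the string without any token stream. The node classes and the `match` function are hypothetical names for this example. The matcher uses ordered choice and greedy repetition with no backtracking into repetitions (PEG-like), which is a simplification and not necessarily how a real implementation would behave.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CharRange:
    lo: str
    hi: str


@dataclass(frozen=True)
class Literal:
    text: str


@dataclass(frozen=True)
class Seq:
    items: tuple


@dataclass(frozen=True)
class Choice:
    options: tuple


@dataclass(frozen=True)
class Repeat:
    item: object
    min_count: int = 0


def match(node, text: str, pos: int = 0) -> Optional[int]:
    """Return the offset just past the match of `node` at `pos`, or None."""
    if isinstance(node, CharRange):
        if pos < len(text) and node.lo <= text[pos] <= node.hi:
            return pos + 1
        return None
    if isinstance(node, Literal):
        return pos + len(node.text) if text.startswith(node.text, pos) else None
    if isinstance(node, Seq):
        for item in node.items:
            pos = match(item, text, pos)
            if pos is None:
                return None
        return pos
    if isinstance(node, Choice):
        # Ordered choice: the first option that matches wins.
        for option in node.options:
            end = match(option, text, pos)
            if end is not None:
                return end
        return None
    if isinstance(node, Repeat):
        count = 0
        while True:
            end = match(node.item, text, pos)
            if end is None or end == pos:  # stop on failure or empty match
                break
            pos, count = end, count + 1
        return pos if count >= node.min_count else None
    raise TypeError(f"Unknown grammar node: {node!r}")


# identifier = letter , { letter | digit | "_" } ;
letter = Choice((CharRange("a", "z"), CharRange("A", "Z")))
digit = CharRange("0", "9")
identifier = Seq((letter, Repeat(Choice((letter, digit, Literal("_"))))))

end = match(identifier, "wp_posts2 x")  # 9: matches "wp_posts2"
no_match = match(identifier, "2abc")    # None: cannot start with a digit
```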
I started drafting an EBNF notation processor to potentially enable building parsers from a grammar file, similarly to the MySQL parser (although that one is based on an ANTLR grammar). I don't have any specific action for this issue; I only wanted to dump the code somewhere it would be searchable in the future. CC @janjakes