antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License

Probably a bug in MysqlParser.g4 where condition #1525

Open yafngzh opened 4 years ago

yafngzh commented 4 years ago

image

The MysqlParser.g4 is here, and the rule is expression. As shown in the image, when a unary negation and a binary minus appear next to each other, the two characters are mistaken for a single minusminus (--) operator.

ghost commented 4 years ago

https://github.com/antlr/grammars-v4/blob/306eb4776ee3347a81caa4a6933033f5ad5e660a/mysql/Positive-Technologies/MySqlParser.g4#L2335-L2337

Is '--' really a mathOperator?

mike-lischke commented 4 years ago

This grammar has a number of flaws (besides being incomplete). Use the Oracle grammar instead for correct parsing. And the bug you see is not in the where clause, but in expressions. -- is not a valid operator in MySQL. Instead it is used for single line comments (when followed by a whitespace).
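To make the point above concrete, here is a minimal sketch of the tokenizing behavior. It is a hand-written toy lexer, not the ANTLR-generated one, and the names MinusLexerSketch, tokenize and withMinusMinusToken are made up for illustration. It assumes only what is said above: a lexer that defines a "--" token will greedily merge two adjacent minus signs, while "--" followed by whitespace starts a single-line comment.

```java
import java.util.ArrayList;
import java.util.List;

public class MinusLexerSketch {

    /**
     * Toy tokenizer for digits, '-' and whitespace only (hypothetical helper).
     * withMinusMinusToken = true mimics a lexer that defines a "--" token.
     */
    static List<String> tokenize(String input, boolean withMinusMinusToken) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add("NUM(" + input.substring(start, i) + ")");
            } else if (input.startsWith("--", i) && i + 2 < input.length()
                    && Character.isWhitespace(input.charAt(i + 2))) {
                // "--" followed by whitespace: single-line comment, skip to end of line.
                while (i < input.length() && input.charAt(i) != '\n') {
                    i++;
                }
            } else if (withMinusMinusToken && input.startsWith("--", i)) {
                // Longest match wins, so the two minus signs become one token.
                tokens.add("MINUSMINUS");
                i += 2;
            } else if (c == '-') {
                tokens.add("MINUS");
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1--1", true));    // [NUM(1), MINUSMINUS, NUM(1)]
        System.out.println(tokenize("1--1", false));   // [NUM(1), MINUS, MINUS, NUM(1)]
        System.out.println(tokenize("1 -- 1", false)); // [NUM(1)]
    }
}
```

With the "--" token enabled, 1--1 can no longer be parsed as a subtraction of a negated value; without it, the parser sees two separate MINUS tokens and can apply the binary and unary rules.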

A few more details about this grammar can be found in the MySQL folder in this grammar repository.

yafngzh commented 4 years ago

Thanks, @mike-lischke! I have switched to the Oracle grammar and it works now. The MysqlParser.g4 is just a little misleading; would it be worth adding the above explanation to README.md?

KvanTTT commented 4 years ago

I renamed the mysql tag to mysql-PositiveTechnologies and introduced a mysql-Oracle tag to avoid confusion.

Also, I have an idea to port the MySQL grammar by @mike-lischke to other targets (at least Java and C#) using the universal grammar approach that is already used in JavaScript, Python, and PHP.

mike-lischke commented 4 years ago

@KvanTTT Your "universal grammar approach" isn't actually that universal, as you are relying on syntax that can be understood by any runtime (e.g. simple variable access, simple function calls). I mentioned this idea a while ago on the mailing list, and the MySQL grammar only uses predicates that follow this idea. There is no need to introduce any special accessor methods, like your p("get"). It's not necessary that all grammars use the same methods in their actions (and usually also not possible due to different requirements). If all grammar authors take care to use a simple syntax, then we can always easily support all languages that support it. If needed, we can also extend the simple macro processing ANTLR does (e.g. for $text), which is better than introducing a new nomenclature.

KvanTTT commented 4 years ago

Special accessors like this.p("get") that are declared in base classes are more universal (they work at least on Java and C#) compared to the following target-specific actions: _input.LT(-1).getText().equals(str); for Java and _input.Lt(-1).Text.Equals(str); for C#, which are not portable at all. But yes, there are still problems with some syntax in some runtimes (for C++ you have to use this-> instead of this.). I don't insist on using them everywhere, though. I just want to get rid of grammar code duplication when semantic actions and predicates are used.
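The base-class accessor idea can be sketched roughly as follows. This is a self-contained toy and does not use the ANTLR runtime: Tokens, ParserBase and textAt are hypothetical stand-ins, and only the helper name p comes from the discussion above. The point is that the grammar predicate would call p("get") while the target-specific lookup lives in one hand-written base class per target.

```java
import java.util.List;

public class PortablePredicateSketch {

    /** Minimal stand-in for a token stream (hypothetical; not the ANTLR runtime API). */
    static final class Tokens {
        private final List<String> texts;
        private int position; // index of the next token to consume

        Tokens(List<String> texts) {
            this.texts = texts;
        }

        void consume() {
            position++;
        }

        /** Text at a relative offset: -1 is the previous token, 1 the next one; null if out of range. */
        String textAt(int offset) {
            int index = offset < 0 ? position + offset : position + offset - 1;
            return (index >= 0 && index < texts.size()) ? texts.get(index) : null;
        }
    }

    /** Base class that hides the target-specific lookup behind a short helper. */
    static class ParserBase {
        protected final Tokens input;

        ParserBase(Tokens input) {
            this.input = input;
        }

        /** True if the previously consumed token has exactly this text. */
        protected boolean p(String str) {
            return str.equals(input.textAt(-1));
        }
    }

    public static void main(String[] args) {
        Tokens tokens = new Tokens(List.of("get", "foo"));
        ParserBase parser = new ParserBase(tokens);
        System.out.println(parser.p("get")); // false: nothing consumed yet
        tokens.consume();
        System.out.println(parser.p("get")); // true: previous token is "get"
    }
}
```

A C# or C++ base class would implement the same p helper with its own runtime calls, which is where the this. versus this-> difference mentioned above still shows up in the grammar itself.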

> If needed, we can also extend the simple macro processing ANTLR does (e.g. for $text), which is better than introducing a new nomenclature.

Unfortunately, the default simple macros do not help a lot. What do you mean by extending? Introducing external grammar preprocessing?

mike-lischke commented 4 years ago

Right, I have also been thinking for a while about how to avoid target-language-specific code in actions, but I have not found a good approach so far.

> > If needed, we can also extend the simple macro processing ANTLR does (e.g. for $text), which is better than introducing a new nomenclature.
>
> Unfortunately, the default simple macros do not help a lot. What do you mean by extending? Introducing external grammar preprocessing?

What I mean is to introduce more elements like $text that provide a language agnostic way to access values in the current context. ANTLR converts them to target language code via the ST4 templates that exist for each runtime language.
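As a rough illustration of that idea, the sketch below expands a language-agnostic placeholder into target-specific code. It is only an assumption of how such an element could behave: the placeholder name $prevTextIs is invented here, the expansion uses plain regex replacement rather than ST4, and the two target snippets are the Java and C# forms quoted earlier in this thread.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ActionMacroSketch {

    // Hypothetical macro: $prevTextIs(<expr>) checks the text of the previous token.
    private static final Pattern PREV_TEXT_IS = Pattern.compile("\\$prevTextIs\\(([^)]*)\\)");

    // Per-target templates; %s receives the macro argument unchanged.
    private static final Map<String, String> TEMPLATES = Map.of(
            "Java", "_input.LT(-1).getText().equals(%s)",
            "CSharp", "_input.Lt(-1).Text.Equals(%s)");

    /** Replaces every $prevTextIs(...) in an action or predicate with code for the given target. */
    static String expand(String action, String target) {
        String template = TEMPLATES.get(target);
        if (template == null) {
            throw new IllegalArgumentException("Unsupported target: " + target);
        }
        Matcher matcher = PREV_TEXT_IS.matcher(action);
        // quoteReplacement keeps '$' or '\' in the generated code from being read as group references.
        return matcher.replaceAll(
                match -> Matcher.quoteReplacement(String.format(template, match.group(1))));
    }

    public static void main(String[] args) {
        String predicate = "{$prevTextIs(\"get\")}?";
        System.out.println(expand(predicate, "Java"));
        System.out.println(expand(predicate, "CSharp"));
    }
}
```

The grammar would then contain only the placeholder, and each target would supply its own template, so no target-specific code appears in the grammar file.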

KvanTTT commented 4 years ago

> What I mean is to introduce more elements like $text that provide a language agnostic way to access values in the current context. ANTLR converts them to target language code via the ST4 templates that exist for each runtime language.

I think it's a good idea and I have similar thoughts, see https://github.com/antlr/antlr4/issues/1045 and https://github.com/KvanTTT/Articles/blob/master/Ideal-parser-generator/English/Abstract.md

gmai2006 commented 3 years ago

Added a fix for #1525 by removing MINUSMINUS from the lexer. Will open an issue about supporting mixed cases and add a proposed grammar file. https://github.com/gmai2006/grammar/