Closed tclxdean-lu closed 5 months ago
This grammar is unfortunately garbage. I am an advocate of removing it as it comes up often. After spending sometime with the Go runtime to allow such poor grammars to function, I decided it was not worth it. Someone told me that there was a good grammar somewhere, maybe in the MySql source code.
Don't waste your time pursuing this.
This grammar is unfortunately garbage. I am an advocate of removing it as it comes up often.
If even it's garbage, it's ANTLR pitfail that it doesn't report grammar problems. ANTLR 4 was created with mind to allow user write simple grammars.
ANTLR4 wasn't designed to process grammars written aas a straight copy from the normative spec though. Yeah, it will parse given anything, but that does not mean we should promote it that way in my opinion.
We shoudl at least comment, README or otherwise annotate that this grammar has many issues. On Mar 5 2024, at 6:19 am, Ivan Kochurkin @.***> wrote:
This grammar is unfortunately garbage. I am an advocate of removing it as it comes up often. If even it's garbage, it's ANTLR pitfail that it doesn't report grammar problems. ANTLR 4 was created with mind to allow user write simple grammars.
— Reply to this email directly, view it on GitHub @./0?redirect=https%3A%2F%2Fgithub.com%2Fantlr%2Fantlr4%2Fissues%2F4545%23issuecomment-1978642274&recipient=cmVwbHkrQUFKN1RNRjZKMzRDQ1lGVTczTFFMV1dENkxYTTdFVkJOSEhJQ0g0TktRQHJlcGx5LmdpdGh1Yi5jb20%3D), or unsubscribe @./1?redirect=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAJ7TMFHWKQ27YOILRFJPA3YWWZ47AVCNFSM6AAAAABEE2GG5SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNZYGY2DEMRXGQ&recipient=cmVwbHkrQUFKN1RNRjZKMzRDQ1lGVTczTFFMV1dENkxYTTdFVkJOSEhJQ0g0TktRQHJlcGx5LmdpdGh1Yi5jb20%3D). You are receiving this because you commented.
I don't support the idea with removing unless we have another better MySql grammar. Nobody forces anybody to fix this grammar issues.
Also, I think there are many poorly-written grammars in this repository. There are many issues with MySql grammar probably because it's more demandable compare to others.
As part of MySQL Workbench, @mike-lischke has/had a v3 grammar that he wrote to use with the C target.
https://dev.mysql.com/blog-archive/parsing-in-mysql-workbench/
Mike's stuff is very good. @mike-lischke (I think I have the right Mike? ;), Can I take that and convert it to v4?
There was also a grammar released by Salesforce at some point many moons ago.
I will spend a little time on this and see where we get to.
Actually, it looks like MySqlWorkbench now uses an official Oracle v4 grammar. I am not sure where that comes from originally, but we have this:
https://github.com/mysql/mysql-workbench/tree/8.0/library/parsers/grammars
Yes, it's the right Mike, Jim :-D
And yes, I wrote the original MySQL ANTLR grammar (for use in MySQL Workbench), at that time for ANTLR3 and the C target (created by Jim!). When you go back far enough in the git history, you might be able to find it there, still.
The current MySQL grammar in WB is outdated (4 years old) and supports only MySQL 5.6, 5.7 and 8.0. The latest and greatest version of it is now in the MySQL Shell GUI for VS Code project (part of the shell plugins). This one, however, only supports MySQL 8.0+ and requires TypeScript as target.
Since a while I only have had a readme for my MySQL Grammar in the grammar-v4 repository, describing where to get it (it needs a bit support code), but I plan to make this a full grammar entry as soon as I can find the time for that.
Yes, it's the right Mike, Jim :-D
And yes, I wrote the original MySQL ANTLR grammar (for use in MySQL Workbench), at that time for ANTLR3 and the C target (created by Jim!). When you go back far enough in the git history, you might be able to find it there, still.
The current MySQL grammar in WB is outdated (4 years old) and supports only MySQL 5.6, 5.7 and 8.0. The latest and greatest version of it is now in the MySQL Shell GUI for VS Code project (part of the shell plugins). This one, however, only supports MySQL 8.0+ and requires TypeScript as target.
Since a while I only have had a readme for my MySQL Grammar in the grammar-v4 repository, describing where to get it (it needs a bit support code), but I plan to make this a full grammar entry as soon as I can find the time for that.
OK - cool. I was just taking a look now. I will take a look through your Typescript targeted one as maybe it can be made more generic. No problem letting you do it when you have time. Maybe for now, we can change the README for the other one and refer to these other repos as perhaps being a better place to start.
If it seems that making your grammar generic is a reasonable task, then I may just do it. I am going to put some ANTLR time in these next two weeks, such as upgrading the Maven plugin (nearly finished with that). I could do with some entries in my GitHub account as there is a dearth of any work out there right now and I am twiddling my thumbs a bit.
I highly welcome any help with that! You might be interested in my antlr4wasm repo where I have a set of benchmarks with my MySQL grammar (the one there is even newer than the one in the MSG project, as I have this updated in February for the upcoming release). And in this benchmark folder there are two versions, one for TS and one for C++, including the stripped down support code. I wanted to add a Java version too, but failed so far. That should be a good starting point.
OK - cool. I started on a Go version (but literally just created the project and downloaded your grammars. I was just about to ask if there was any need to support both MRS and non-MRS as separate units. It seemed like I could just use the MRS versions.
As every time I ask, you tell me about a new version, are there any other repos ;) - ha.
You can ignore MRS (short for MySQL REST Service) completely. That's a non-released language extension to MySQL and does not include standard MySQL. In fact in the build script for MSG I merge both grammars, as I didn't want to have something in the main grammar that doesn't really exist.
I'm really looking forward to get even more targets running the same benchmarks, so we can compare them. I'd love to see Java there too.
I’ll do Java if I get a chance.
On Tue, Mar 5, 2024 at 14:02 Mike Lischke @.***> wrote:
You can ignore MRS (short for MySQL REST Service) completely. That's a non-released language extension to MySQL and does not include standard MySQL. In fact in the build script for MSG I merge both grammars, as I didn't want to have something in the main grammar that doesn't really exist.
I'm really looking forward to get even more targets running the same benchmarks, so we can compare them. I'd love to see Java there too.
— Reply to this email directly, view it on GitHub https://github.com/antlr/antlr4/issues/4545#issuecomment-1979534816, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ7TMEWKXU7BLAQU43EM5DYWYQD3AVCNFSM6AAAAABEE2GG5SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNZZGUZTIOBRGY . You are receiving this because you commented.Message ID: @.***>
Test with the same mysql grammar, the java sdk is mush more faster than golang version, maybe there is some implementation difference in golang sdk?
Test with the same mysql grammar, the java sdk is mush more faster than golang version, maybe there is some implementation difference in golang sdk?
You are using this grammar, and not the one in the contributed repo?
Test with the same mysql grammar, the java sdk is mush more faster than golang version, maybe there is some implementation difference in golang sdk?
You are using this grammar, and not the one in the contributed repo?
Yes. Using https://github.com/antlr/grammars-v4/blob/master/sql/mysql/Positive-Technologies/MySqlParser.g4
Java SDK is much faster than golang SDK,You can try it with Intelijj IDEA ANTLR4 Plugin.
Java SDK is much faster than golang SDK
I'm curious how much faster. Could you perform a benchmark?
Test with the same mysql grammar, the java sdk is mush more faster than golang version, maybe there is some implementation difference in golang sdk?
You are using this grammar, and not the one in the contributed repo?
Yes. Using https://github.com/antlr/grammars-v4/blob/master/sql/mysql/Positive-Technologies/MySqlParser.g4
Java SDK is much faster than golang SDK,You can try it with Intelijj IDEA ANTLR4 Plugin.
That is the poor grammar in the free downloads and not the one mentioned here.
I made some inroads on improving the performance of such broken grammars, but decided enough was enough. It "works" but will be too slow to be practical to use in any target. That demo grammar is not fit for purpose I am afraid and is not the one that this thread changed to discuss. I can see your confusion, as the thread started out with the postive-technologies one.
Basically, do not use that grammar.
Java SDK is much faster than golang SDK
I'm curious how much faster. Could you perform a benchmark?
It's not relevant really. I improved from essentially infinite execution time to impractical. It depends heavily on the input given to the parser. The grammar is just broken in the sense that it is so poorly formed that it pretty much traverses every possible grammar rule for everything. It is a testament to the algorithm that it finished at all to be honest.
It's not relevant really.
Maybe relevant because it shows that Go runtime works differently. It probably can cause similar problems even with "good" grammars.
NB, the debezium project with almost 10K stars on GitHub relies on this grammar. I'm curious if this grammar causes big performance problems, probably @Naros could answer.
I'm curious if this grammar causes big performance problems, probably @Naros could answer.
So we use 3 different grammars from this upstream project: MySQL, MariaDB, and Oracle.
Our use of the MySQL and MariaDB grammar is restricted solely to DDL changes and not DML. So we probably don't observe any measurable performance issues since we process far more DML events than DDL.
When we implemented the Oracle connector in 2020, we considered using the ANTLR grammar for parsing both the DDL and DML events; however, we did notice a performance concern, but that concern was not only about the parse speed, but the number of temporary objects constructed. Since we generally work with higher volumes of inserts, updates, and deletes, we needed a robust solution and since our ingestion for insert, update, and deletes via LogMiner have a very restricted set of grammar bits, we simply rolled our own for DML and simply use ANTLR for DDL only.
The grammar is just broken in the sense that it is so poorly formed that it pretty much traverses every possible grammar rule for everything.
It certainly needs work @jimidle, there is no denying that; it's just a time/resource issue. I've wanted to sit down with the grammar we heavily use and start with a fresh slate, but that's just not a reality any time soon, but at the same time, we aren't in a position to use anything else either :(
I think if anything, if someone wanted to collaborate on an improved version of the grammar, I'd be open to that idea even if it takes a bit of time to do it, and do it right.
@parrt we can close this now. The issue is with the grammar, not ANTLR not the runtimes.
Also, I have published:
https://github.com/jimidle/tsqlparser
Which @parrt analysed in a paper when it was in v3 form a long time ago. It has very little ambiguity and parses correctly. It is not MySQl though. Use Mike's grammar for that.
Using antlr4 go to parse SQL takes too much time, eg:
I am using https://github.com/antlr/grammars-v4/blob/master/sql/mysql/Positive-Technologies/MySqlParser.g4
output:
Execution time: 50ms