antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Support in-memory transpilation #3874

Open KOLANICH opened 2 years ago

KOLANICH commented 2 years ago

It'd be nice to have a capability to transpile grammars into source code in-memory, without touching storage, via API.

I even created a PR for this (#2774), but it was rejected because of "no big changes" (I had to do some refactoring to support both into-file and in-memory transpilation cleanly enough).

So currently I have to maintain a separate branch (https://github.com/UniGrammar/antlr4/tree/tool_refactoring), and I'm not happy about that. Merge conflicts happen from time to time, and pinpointing why and where they occur, working out how to fix them, and reapplying the changes manually, without any good semantic AST-based version control system, is hard and time-consuming. I'm also not happy about shipping my own version of ANTLR that has no chance of landing in distros.

Could someone on the project examine my changes and integrate them into the code base, or advise me on what I should do to get them merged?

jimidle commented 2 years ago

Perhaps there are other ways to achieve your end goals - what are your reasons for needing this?

KOLANICH commented 2 years ago

I use ANTLR4 as a library from my own tools, in multiple ways. (Another way I use ANTLR is through its internal representation of grammars, but that has nothing to do with this issue.) One of them is compiling a grammar in memory. The grammar for ANTLR is itself transpiled from my own DSL for grammars, which makes grammars portable across different parser generators, each with its own advantages and drawbacks; ANTLR's benefit, for example, is that grammars are convenient to debug in it. After that, several things can happen:

  1. the generated Java sources are passed to an in-memory Java compiler and compiled to Java bytecode, and the resulting classes are fed into a test rig (basically, I type UniGrammar viz antlr4 grammar.yug string-to-parse and get the test rig GUI with the AST)
  2. the generated Python sources are evaled and run through my wrapper
  3. the generated Java sources are compiled and run through my wrapper (I use the awesome JPype lib, through my abstraction layer JAbs, to interface with Java code from Python)
  4. the generated files are saved to the places where my tool expects to find them (yet another abstraction, which I call a ParserBundle)
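
For illustration, step 1 above (compiling generated Java sources without touching disk) can be sketched with the JDK's standard javax.tools API. This is only a minimal sketch, not the code used in UniGrammar: the names InMemoryJavaCompiler, StringSource, BytecodeSink, compile and invokeStatic are made up for the example, and it assumes the program runs on a JDK rather than a bare JRE.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

// Hypothetical helper: compiles Java source strings to bytecode entirely in memory.
final class InMemoryJavaCompiler {

    // A compilation unit whose text lives in a String instead of a file.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // A class file whose bytes are collected in a buffer instead of being written to disk.
    static final class BytecodeSink extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        BytecodeSink(String className) {
            super(URI.create("mem:///" + className.replace('.', '/') + Kind.CLASS.extension), Kind.CLASS);
        }
        @Override
        public OutputStream openOutputStream() {
            return bytes;
        }
    }

    // Takes class name -> source text, returns class name -> bytecode.
    static Map<String, byte[]> compile(Map<String, String> sources) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available (JRE instead of JDK?)");
        }
        Map<String, BytecodeSink> sinks = new HashMap<>();
        List<JavaFileObject> units = new ArrayList<>();
        sources.forEach((name, code) -> units.add(new StringSource(name, code)));
        // Forward everything to the standard file manager except class-file output.
        try (JavaFileManager fm = new ForwardingJavaFileManager<StandardJavaFileManager>(
                compiler.getStandardFileManager(null, null, null)) {
            @Override
            public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location, String className,
                                                       JavaFileObject.Kind kind, FileObject sibling) {
                BytecodeSink sink = new BytecodeSink(className);
                sinks.put(className, sink);
                return sink;
            }
        }) {
            if (!compiler.getTask(null, fm, null, null, null, units).call()) {
                throw new IllegalArgumentException("Compilation failed");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, byte[]> result = new HashMap<>();
        sinks.forEach((name, sink) -> result.put(name, sink.bytes.toByteArray()));
        return result;
    }

    // Defines the compiled classes in a throwaway class loader and calls a public static no-arg method.
    static Object invokeStatic(Map<String, byte[]> classes, String className, String method) {
        ClassLoader loader = new ClassLoader(InMemoryJavaCompiler.class.getClassLoader()) {
            @Override
            protected Class<?> findClass(String name) throws ClassNotFoundException {
                byte[] b = classes.get(name);
                if (b == null) throw new ClassNotFoundException(name);
                return defineClass(name, b, 0, b.length);
            }
        };
        try {
            return loader.loadClass(className).getMethod(method).invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With something like this, the only missing piece is getting the generated sources as strings in the first place, which is what this issue asks for.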

I don't want to touch the SSD/HDD when it is completely unnecessary. Of course, I could let ANTLR save the files to some directory and then read them back, and you could say that this achieves the end goal of compiling a file. But once we add "without doing unnecessary I/O", the goal becomes unachievable without very dirty workarounds such as using ramfs.

IMHO this would be better fixed on the ANTLR side by providing the needed API.

ericvergnaud commented 2 years ago

Hi,

I just had a quick look at your PR, and I’m not sure your proposed approach is optimal.

Rather than changing the Tool API, I’d suggest subclassing the CodeGenerator class and having the subclass used instead of the default. I believe this only requires a minor change in CodeGenPipeline, and has a better chance of making it in. Antlr4 has been around for 30 years, so nobody controls how it’s being used. Any breaking change is legitimately rejected.

I can’t guarantee that the above will make it to master, but if it doesn’t it will certainly reduce the maintenance work on your side.
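
The general shape of that suggestion can be sketched as follows. This is an illustration only, using stand-in classes rather than ANTLR's real ones: SketchGenerator, InMemorySketchGenerator, openWriter and sources are hypothetical names, and the actual CodeGenerator may expose different hooks and signatures.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a generator that writes its output to disk by default.
class SketchGenerator {
    private final Path outputDir;

    SketchGenerator(Path outputDir) {
        this.outputDir = outputDir;
    }

    // The single overridable hook through which all generated output goes.
    protected Writer openWriter(String fileName) throws IOException {
        return Files.newBufferedWriter(outputDir.resolve(fileName));
    }

    // Writes one generated file through whatever writer the hook provides.
    final void write(String fileName, String code) {
        try (Writer w = openWriter(fileName)) {
            w.write(code);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Subclass that redirects the same output into in-memory buffers.
class InMemorySketchGenerator extends SketchGenerator {
    private final Map<String, StringWriter> buffers = new LinkedHashMap<>();

    InMemorySketchGenerator() {
        super(null); // no output directory is needed, the hook below never touches it
    }

    @Override
    protected Writer openWriter(String fileName) {
        StringWriter w = new StringWriter();
        buffers.put(fileName, w);
        return w;
    }

    // Returns file name -> generated source text, in generation order.
    Map<String, String> sources() {
        Map<String, String> out = new LinkedHashMap<>();
        buffers.forEach((name, w) -> out.put(name, w.toString()));
        return out;
    }
}
```

The point of the design is that callers of write stay unchanged; only the choice of which generator subclass to instantiate decides whether output lands on disk or in memory.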


KvanTTT commented 2 years ago

Antlr4 has been around for 30 years

To be precise, Antlr4 has existed since 2012 if you count from the first release, or since 2010 if you count from the first commit. So 12 years at most.

jimidle commented 2 years ago

Well, we used to do it on a private Perforce server. Predated GitHub by decades ;). I left for Taiwan in 2010 and had been working on ANTLR long before that. And Ter long before me! :)


KvanTTT commented 2 years ago

Well, we used to do it on a private perforce server. Predated GitHub by decades ;). I left for Taiwan in 2010 and had been working on ANTLR long before that. And Ter long before me! :)

I'm not arguing, but it looks like those were previous versions of ANTLR, not 4.

KOLANICH commented 2 years ago

@ericvergnaud, thanks for the advice. It's partly done, but a lot still has to be done to minimize the changes in other parts.