antlr / antlr5

BSD 3-Clause "New" or "Revised" License
57 stars 5 forks source link

Integrate Kotlin runtime #1

Closed ericvergnaud closed 9 months ago

ericvergnaud commented 10 months ago

We will be using Kotlin to generate the WebAssembly runtime. There is a need to integrate it from Strumenta's repo. There is also a need to ensure it supports the same level of testing as the existing Java runtime. (we don't want to embark on WebAssembly without checking that the runtime is 100% compatible)

ericvergnaud commented 10 months ago

@KvanTTT would you be comfortable looking into this ?

KvanTTT commented 10 months ago

Yes, I'd like to do that at the next weekend or later. But probably I need maintainer role to make possible to do commits into the repository (I'm not sure that GitHub correctly supports non-standard PR that contains merge from unrelated histories and I'd like to keep both histories).

ericvergnaud commented 10 months ago

@KvanTTT I've given you 'write' access, let me know if any issue

ericvergnaud commented 10 months ago

Please merge to 'dev' branch

KvanTTT commented 10 months ago

Please merge to 'dev' branch

Ok.

KvanTTT commented 10 months ago

I've investigated Strumenta's kotlin runtime a bit and concluded, that it's unlikely I will integrate the full repository with its history because of the following points:

So, I think the best way is about using J2K converter on the current codebase with copying some files from Strumenta's repository (mostly files related to code-generations, Kotlin.stg and related). Also, we can do it file-by-file without breaking build and tests.

lppedd commented 10 months ago

Addressing the points.

This repo is completely uncovered by ANTLR runtime unit-tests

We can add all those tests, by replicating the infrastructure.
The same kind of work was done by me for integration tests of some grammars (under the antlr-kotlin-tests module), that is, I've replicated the antlr4test-maven-plugin strategy, just using Gradle and the KMP kotlin.Test functionalities.

It obviously is a task that will take time, and I'm not getting paid to do it, so I can definitely look into it - but I can't dedicate hours everyday.

that are not supposed to be in codebase

They had to be placed somewhere as we have no access to the ANTLR4 repo testing infra.
They're not shipped obviously, but they're nice to have to verify the generated code compiles correctly.
We've found two compiler (K1) bugs doing this.

It uses expect/actual declarations (MPP) functionality that is supposed to work with Gradle build system

That may be interpreted as a blocking issue, but my personal suggestion is to replace the Maven infra with Gradle, or try to call Gradle through Maven. Again it's not something that takes a day to solve, but trying to swipe away the Gradle foundation for Multiplatform seems like a major step back.

So, I think the best way is about using J2K converter on the current codebase with copying some files from Strumenta's repository

This is exactly what @ftomassetti did in the beginning (was it in 2017?).
In subsequent PRs starting December 2023 I've cleaned up the source code, but it took weeks and it's still not completely done. Doing the exact same thing again seems like wasting a lot of time.

KvanTTT commented 10 months ago

It obviously is a task that will take time, and I'm not getting paid to do it, so I can definitely look into it - but I can't dedicate hours everyday.

Thanks for that. BTW, I have the same situation.

They had to be placed somewhere as we have no access to the ANTLR4 repo testing infra.

We can set up testing on the whole grammars-v4 repository instead of testing on several grammars. But we can do it later.

We've found two compiler (K1) bugs doing this.

Yes, it's possible. I can't say if we fixed them in K2 because, unfortunately, some behavior remains "buggy" in favor of back-compatibility. But I can take look at them if you share.

That may be interpreted as a blocking issue, but my personal suggestion is to replace the Maven infra with Gradle, or try to call Gradle through Maven.

We definitely can do it but later. But I prefer getting rid of incorporating several big changes into one big step. Instead, I suggest porting Kotlin runtime at first and making sure that all tests pass. Later we can switch to Gradle if it's needed.

This is exactly what @ftomassetti did in the beginning (was it in 2017?). In subsequent PRs starting December 2023 I've cleaned up the source code, but it took weeks and it's still not completely done. Doing the exact same thing again seems like wasting a lot of time.

Makes sense. But I suggest using the official repo codebase as a foundation, but not vise verse.

ftomassetti commented 10 months ago

Thank you @KvanTTT for investigating this. Is great to have someone with your expertise to look into this

I've investigated Strumenta's kotlin runtime a bit and concluded, that it's unlikely I will integrate the full repository with its history because of the following points:

  • This repo is completely uncovered by ANTLR runtime unit-tests (descriptors). It's unclear how many tests are really broken if this repository is ported.

Would it be difficult to check if there are broken tests and how many? How much effort do you think it would take, not to fix the tests but just to check how bad the situation would be?

From the other side, it contains kind of integrational tests on big grammars that are not supposed to be in codebase.

I think removing them after merging/importing/copying would be easy and I can do that

  • It uses expect/actual declarations (MPP) functionality that is supposed to work with Gradle build system. Currently ANTLR relies on Maven. Also, supporting such functionalify needs additional configuration effort.

Now, that is the part that concern me the most. My understanding is that expect and actual are the only mechanism you can use to write platform specific code and accomodate for the very few specificities of the different platforms. In the whole ANTLR Kotlin project there are very few classes that use that mechanism. Exclusively the ones where there were platform specifities we needed to account for. For example, the Console works differently on Java and JS

I only used Kotlin with Gradle so far, so I did not know that such feature was connected to the build system. If I understand correctly you are saying that a) expect/actual work only when compiling with gradle and not when compiling with maven b) ANTLR relies on Maven so we should make compilation work with Maven. If my understanding is correct, then this means you will need to write code that can be compiled on all platforms with no possibilities to deal with the very limited but present quirks of the specific platforms. But I am not sure if that is possible. How would you handle that?

  • IntelliJ's Java to Kotlin converter works quite good. It makes some mistakes which need further correction but not so many.

So, I think the best way is about using J2K converter on the current codebase with copying some files from Strumenta's repository (mostly files related to code-generations, Kotlin.stg and related). Also, we can do it file-by-file without breaking build and tests.

Yes, the J2K converted works great, otherwise the initial version of ANTLR Kotlin would have taken years to be developed :D It still requires some manual fixes and refactor all the usages of the JVM specific classes to support the multiple platforms. On this point I would agree with @lppedd that you may end up redoing some of the work already done (and verified over the years). @lppedd put quite some effort in polishing up the code (besides other contributions, like updating to the latest ANTLR version).

But I understand that a gradual approach may be more convenient. Perhaps you can copy the files from the ANTLR Kotlin project as needed, doing that incrementally and keeping the build running, at least until you compile only to the JVM.

It obviously is a task that will take time, and I'm not getting paid to do it, so I can definitely look into it - but I can't dedicate hours everyday.

Thanks for that. BTW, I have the same situation.

I think we are on the same boat on this, for this reason I think that whatever approach save us the most work is the only sustainable. You obviously know the ANTLR codebase and the Kotlin ecosystem better of anyone so you are in the best position to assess which approach will get us moving while requiring an effort we can sustain

ericvergnaud commented 10 months ago

Thanks for this, very interesting details. I guess my expectation was that we would simply bring in the tool target, the runtime itself, and fill the gap for the antlr test framework to ensure that the runtime behaves the same as the reference java target. The grammars-v4 repo is an extremely useful source of grammars, but it's not as accurate as the antlr test suite, and does not actually test anything, so I would not recommend going that route. The Kotlin-specific tests from the Kotlin repo are an important topic too, but if the Kotlin runtime doesn't even pass the antlr test, then maybe there isn't much point investing time now in having them run with Maven ? Since we'll need to switch to Gradle in order to generate WebAssembly, maybe that can be dealt with in a 2nd phase ? Personally, I'd rather lose the Git history and the Kotlin-specific tests than redo the J2K migration, but maybe I'm missing something ?

lppedd commented 10 months ago

I think all depends on expectations, and on Kotlin's integration with Maven.
Multiplatform is Gradle only and I don't think that will change anytime soon being how complex the Kotlin Gradle Plugin is.

So my questions are:

  1. Do you want to generate WebAssembly exclusively?

    If the answer is yes, then probably you'll manage to do that with Maven, using the plugin's js goal and compiler options such as -Xwasm and -Xwasm-target=wasm-js/wasm-wasi.

    On a personal note, ditching the Multiplatform capabilities seems like the wrong direction, especially if it's taken only to avoid integrating Gradle.

    I'll also add that the IDE usability should be taken into account. Will it work correctly with that Maven setup?

  2. Are you open to restructure the (mono)repository?

    Being that this repo is a fork of antlr4, there might be a lot of stuff that antlr5 doesn't actually need.
    For my little experience what happens with large repositories is once they're actively maintained they just don't scale back: be it because there is a risk of breakage, or just because the once-probably-unnecessary components are in use.

    So while it's always cool to jump into coding and getting a tangible result out, I'd start by identifying what can be removed, and remove as much as possible. At that point one can assess what each piece does, and if it's easy or not to switch to Gradle as the foundational build tool.

But I suggest using the official repo codebase as a foundation

To answer this point, the upgrade to ANTLR 4.13.1 was done manually. I did match, file by file and line by line, the Java runtime, with the obvious syntactical and type nullability differences, which were also taken into account by looking at the Swift runtime from time to time.

So to conclude, I'd check the repository design first, and then once there is an answer to what can or cannot be done (be it because it's not doable at all, or it's too costly in terms of time), I'd start integrating the Kotlin side by merging and deleting what is found to be redundant/unnecessary.

At that point Strumenta's repo might already have all the required runtime tests in place.

lppedd commented 10 months ago

To expand on the questions, I'd add another couple just to better understand the vision.

there is an opportunity to have just 1 runtime, that will run faster with language hosts such as JavaScript or Python. ANTLR 5 is primarily about that: switching to WebAssembly

That implies languages that will want to run the built WASM artifact will have a standard or third party runtime, right? And also that Kotlin will be able to correctly export the API for those runtimes.

and the 1st version of ANTLR5 will only support TypeScript

What does this mean in relation to Kotlin?

ericvergnaud commented 10 months ago

It means we're using Kotlin to generate WebAssembly. As of writing, GraalVM doesn't yet support WasmGC so we won't be able to run the generated runtime in Kotlin/JVM, rather the only available runtime will be TypeScript, to be run on platforms that support WasmGC (Chrome, Deno, Node coming soon). We will expand gradually to more host languages as their support for WasmGC becomes available.

lppedd commented 10 months ago

Thanks! That clarifies it.
So it's not really about TypeScript per-se, but more about the platforms that run TypeScript (and JavaScript).

I think that while the ecosystem is progressing towards WASM, having our Kotlin codebase compile to other targets would be beneficial. We don't really know how fast the other hosts will get support for WASM, and it's also a big bet in terms of adoption. WASM is the new cool kid in town, and it's going to be like that for quite some time.

If the codebase can grow without having a specific target in mind, it allows maintainers to pursue other strategies in the future, if required.

ericvergnaud commented 10 months ago

Sure, but we already have antlr4 for that, we really want antlr5 to focus on wasm.

lppedd commented 10 months ago

Reasonable.

There is another point I'd like to clarify: currently WASM debugging is not supported in IntelliJ/Fleet, so how are you going to do it? And, will debug work correctly with Maven, in general?

KvanTTT commented 10 months ago

Would it be difficult to check if there are broken tests and how many? How much effort do you think it would take, not to fix the tests but just to check how bad the situation would be?

I don't know how to check it without full integration of runtime.

If my understanding is correct, then this means you will need to write code that can be compiled on all platforms with no possibilities to deal with the very limited but present quirks of the specific platforms. But I am not sure if that is possible. How would you handle that?

I see target-specific code in your repository, fortunately, is not so big. I was thinking about porting runtime without expect/actual (single module) and integrating them later.

Also, since Kotlin K2, expect and actual declarations should be declared in different modules.

KvanTTT commented 10 months ago

Sorry, probably I was a little hasty with conclusions. Currently I'm working on repositories merging and I'm trying to reach compilation and tests running.

I'll demonstate the result later, stay tuned.

ericvergnaud commented 10 months ago

Reasonable.

There is another point I'd like to clarify: currently WASM debugging is not supported in IntelliJ/Fleet, so how are you going to do it? And, will debug work correctly with Maven, in general?

WebAssembly debugging is supported in Chrome, and in Node and Deno (using WebStorm), provided that the compiler generates map files. Is that not the case?

lppedd commented 10 months ago

Also, since Kotlin K2, expect and actual declarations should be declared in different modules.

Is there a YT issue for this? Seems like a "devastating" breaking change to me.

WebAssembly debugging is supported in Chrome, and in Node and Deno

Ok this sounds good enough, although I was mostly referring to in-IDE debugging capatibilites to diagnose issues or debug test cases without switching context.

In my multiplatform applications/libraries I tend to debug using the faster targets, which is most often the JVM, and sometimes JS, since I consider a guarantee from the Kotlin language that common definitions will have the same behavior on all platforms.

lppedd commented 10 months ago

Also, I'm not sure WebStorm supports the Kotlin plugin, since you mentioned it. Last I heard it wasn't planned from JB.
So you'd have an half-working environment.

ericvergnaud commented 10 months ago

@lppedd Let me explain how this works:

Runtime production and parser generation:

Parser execution:

Since we generate library-like code, there is no scenario where we need the ability to debug using the Kotlin plugin, apart maybe from unit testing. Users won't be writing Kotlin code, so they don't need the plugin.

The only critical requirement is to have generated map files for the Kotlin code that we have written or that the tool has generated.

(that said, it would be great if IntelliJ had the ability to debug WebAssembly running in Node or Deno)

lppedd commented 10 months ago

Thanks!

apart maybe from unit testing

Yeah this is probably the part I find myself using a lot. I step through the code often while developing it, but if you think it's not going to be a common occurence, then it's something we don't have to think about (not now at least).

There are two issues to follow.
https://youtrack.jetbrains.com/issue/KTIJ-28387 https://youtrack.jetbrains.com/issue/KTIJ-28388

KvanTTT commented 10 months ago

Is there a YT issue for this? Seems like a "devastating" breaking change to me.

To me it's quite logic change since expect/actual concern is releavant only to multi-modules environment, see https://youtrack.jetbrains.com/issue/KT-55177

lppedd commented 10 months ago

Ahhhh I recall seeing it.

But it shouldn't impact the current antlr-kotlin repository, as actualizations are one per compilation. I've compiled with K2 without issues yesterday.

KvanTTT commented 9 months ago

Please take a look at my request with the current status: https://github.com/antlr/antlr5/pull/6 Also, you can try run it locally (configured kotlinc is required for tests).

sschuberth commented 9 months ago

I hope you guys don't mind me asking here: For someone who's interested in generating Kotlin code from ANTLR grammar in the short term, would the recommendation be to use https://github.com/Strumenta/antlr-kotlin currently, or wait until https://github.com/antlr/antlr5/pull/6 is merged? Probably related question: What's the roadmap for ANTLR 5, I could not find any information about it. And is the release of the K2 compiler / Kotlin 2.0 somehow a requirement for Kotlin code generation?

KvanTTT commented 9 months ago

I hope you guys don't mind me asking here:

Sure, don't hesitate.

For someone who's interested in generating Kotlin code from ANTLR grammar in the short term, would the recommendation be to use https://github.com/Strumenta/antlr-kotlin currently, or wait until https://github.com/antlr/antlr5/pull/6 is merged?

I think it's possible to make this (ANTLR 5) runtime production-ready in short term, but currently it has only JVM-target. Other targets (Native, WA) will be implemented later because it requires writing some expect/actual functions and probably migrating to Gradle build system.

What's the roadmap for ANTLR 5, I could find any information about it.

The current version doesn't have any breaking change, but later they probably will be introduced since there are a lot of things to improve. Generally I like the idea of using this runtime ASAP. So, maybe it makes sense to release an intermediate Kotlin-runtime version without breaking changes ASAP in a separated branch and continue working on futher improvements in the main branch.

I could find any information about it. And is the release of the K2 compiler / Kotlin 2.0 somehow a requirement for Kotlin code generation?

Now it requires K2 only for code-generation. But it's possible to use generated code with a lower version (K1).

ericvergnaud commented 9 months ago

Let me add my 2p. Although we definitely want to have a working antlr5 asap, there is still a lot to do before we can release a version that actually adds something to antlr4 or to the Strumenta fork. And we don't yet know if that version will be WASM only or also JVM... You are more than welcome to play with antlr5 if you're comfortable with frequent breaking changes. If OTOH you are looking for a production version on the short term, you're much better going with the Strumenta fork.

sschuberth commented 9 months ago

Thanks for the comments. I'm not necessarily looking for a production ready solution as I'm considering to only do a one-time conversion of the ANTLR grammar to Kotlin code as the grammar barely changes. So I'm mostly interested in the best generated code I can currently get for Kotlin, and I'm totally fine to massage that code manually after the conversion.

ftomassetti commented 9 months ago

Just to avoid any confusion for people looking at this for the first time: I would not call ANTLR Kotlin a fork but simply a target for ANTLR4. It does not change or replace anything in the original project and it instead relies on ANTLR4. Regarding the maturity of the ANTLR Kotlin target, I personally do not use it in production but several companies (behind several contributors) do, so it has some level of stability/maturity

sschuberth commented 9 months ago

I'm considering to only do a one-time conversion of the ANTLR grammar to Kotlin code

Which brings me to a related quesrtion: Is there currently a stand-alone tool available for this, ideally even a hosted web tool that doesn't require any installation? I.e. I paste ANTLR grammar into a form, choose the target language, press a button, and get the generate code for download in a ZIP or so?

lppedd commented 9 months ago

@sschuberth you need a runtime tho. So there is not much point in doing half the work through an online tool, if you have to set up the runtime dependency anyway.

sschuberth commented 9 months ago

you need a runtime tho.

Which runtime? ANTLR runtime, or Kotlin runtime? If the former, then I probably was under the wrong assumption that the generated code would be self-contained.

lppedd commented 9 months ago

The point is not really the generated code, but the ANTLR runtime capabilities.

The antlr-kotlin repository hosts a true multiplatform project, which is what a Kotlin consumer will want to depend on for his own multiplatform project.

ericvergnaud commented 9 months ago

you need a runtime tho.

Which runtime? ANTLR runtime, or Kotlin runtime? If the former, then I probably was under the wrong assumption that the generated code would be self-contained.

The generated code is not self-contained. It relies on a target-specific runtime.

ericvergnaud commented 9 months ago

I'm considering to only do a one-time conversion of the ANTLR grammar to Kotlin code

Which brings me to a related quesrtion: Is there currently a stand-alone tool available for this, ideally even a hosted web tool that doesn't require any installation? I.e. I paste ANTLR grammar into a form, choose the target language, press a button, and get the generate code for download in a ZIP or so?

antlr5 is precisely about this i.e. web first, bit it's very early days....

kaby76 commented 9 months ago

Which brings me to a related quesrtion: Is there currently a stand-alone tool available for this, ideally even a hosted web tool that doesn't require any installation? I.e. I paste ANTLR grammar into a form, choose the target language, press a button, and get the generate code for download in a ZIP or so?

There isn't a requirement specification document for Antlr5. So, in lieu of that, and to not loose the issue in this closed thread, you should probably add an issue somewhere (maybe antlr5-lab, or antlr5, or antlr5-specs) that describes your specific use case. It should state that you want to input a grammar via a website, click a "generate" button, and downloading a zip of the generated recognizer.