Add `parse` package with new AST structure

chrispcampbell commented 11 months ago

As described in the recent PR #381 that introduced more tests for model parsing/reading and code generation, I've been working on refactoring those phases of the compiler so that they work with a generic AST structure instead of directly depending on the antlr4-vensim library. There are a few different motivations for this work:

Removing the direct dependency on antlr4-vensim from the existing model reading and code gen modules makes it much easier to support other input formats, primarily XMILE (as used by Stella Architect).
Making code gen mostly input-format agnostic will make it possible (and easier) to support other output formats besides C. For example, I am interested in adding pure JS as an output format (can rely on much of the existing C code gen for this), which will make it easier to get up and running with SDE without the Emscripten C to WebAssembly compilation step. We have also considered generating WebGPU code, which will be easier to explore once we remove the Vensim dependency in code gen.
Working with the AST can be much easier than implementing reading and code gen using the ANTLR visitor pattern. Currently to add support for a new Vensim function, it is sometimes necessary to spread the implementation over a few different visit methods, which means that class instance-level state is needed to share information between those methods, which can be hard to understand and keep track of. With the AST structure, all of that logic and state can generally be maintained in one place, making the code more readable and easier to maintain.
Using an AST structure also makes it easier to implement compile-time optimizations, such as more aggressive constant folding and dead code elimination. (I implemented a simple version of this optimization in #102, but it could only easily analyze simple expressions. With the AST, I was able to implement a more aggressive algorithm that is capable of reducing a broader set of expressions.)
There are other types of tools (not related to the compiler) that can benefit from the AST structure. For example, the hyperlinked documentation tool I wrote for the En-ROADS Technical Reference guide (see this Model Equations section) uses an earlier version of the parse package that I'm proposing here.

I have a branch available that adds a new @sdeverywhere/parse package (with the new AST types) and updates the existing compiler code to use new code paths (disabled by default) that use the parse package.

To avoid introducing lots of changes all at once, I'm proposing to make the changes in a few steps:

This first issue/PR will add the parse package to the repo (no changes to the compile package yet).
Once the initial work is merged, I will file a second issue/PR that includes the changes to the compile package, but disabled by default (can be enabled with a hidden environment variable to allow for testing). The goal for this new code is that it is 100% compatible with the existing code, i.e., it generates the same code as the old implementation.
Once we are satisfied with the new code, I will file a third issue/PR to make the new code the default and remove the old implementation.

(None of these steps should result in any behavior changes for users of SDEverywhere.)

ToddFincannonEI commented 11 months ago

I'm looking forward to trying this out on EPS in step 2. I will be curious to see how the code generation changes. The need to parse parts of the model repeatedly and to build up the code gen in several parts of the parse tree are the hardest parts of adding new features for me.

It would be great if we could target pure JS and drop Emscripten from the tool chain. We would only be able to do that if there was no significant performance hit. That's becoming a limiting factor for us as the model grows. I remember asm.js that came before Emscripten did all kinds of tricks to get the JS running fast, so I'm thinking a pure JS target won't be simple to implement. But first things first!

chrispcampbell commented 11 months ago

I'm looking forward to trying this out on EPS in step 2. I will be curious to see how the code generation changes.

If the tests are any indication, the new code gen should produce 100% identical code to that of the old (excepting maybe for slight differences in the constant conditional optimization). But we can evaluate that in the next round.

It would be great if we could target pure JS and drop Emscripten from the tool chain. We would only be able to do that if there was no significant performance hit.

Yeah, I see the pure JS target being a good default output format with acceptable performance for smaller models, but likely too slow to use in production for larger models like En-ROADS and EPS. In my initial evaluation, it looks like the JS code gen can reuse most of the C code gen, and the more time consuming part will be porting over the C runtime parts to JS. But like you say, first things first!

climateinteractive / SDEverywhere

Add `parse` package with new AST structure #405