Reconsider AOT vs. compile-time generation of syntax trees

Our current formulation of syntax trees assumes that we'll be able to read the contents of node-types.json files at compile time. This holds only for local development and for files pulled in via pinned Git dependencies; in all other cases, it is officially not expected to work. As a result, any future publishing to Hackage is off the table, even though things work for local development and for our downstream dependent projects.

But even the current situation is far from optimal. For example, although Bazel tends to provide better in-IDE tooling, it doesn't know how to find node-types files in REPLs, and even during standard builds it can't find them without preprocessor trickery.

I think it's time to consider whether generating this code ahead of time is worth exploring. Here are some upsides and downsides of AOT code generation.

Upsides

It removes the need to read node-types.json at compile time, which, as mentioned above, basically only works on cabal due to implementation details of the build/REPL process.
We already do AOT codegen for the Semantic_Proto serialization files. Note that this file, even though it comes out to roughly 8000 SLoC, is well-behaved with respect to compile time and IDE support, in contrast to our code that relies on complicated Template Haskell splices. Indeed, I suspect the authors of proto-lens avoided TH generation because, much like us, they would have had trouble getting TH to find .proto files, and they need to handle massive protobuf definitions.
We also generate code for lingo-haskell.
As mentioned above, our build process can become substantially simpler, and our IDE tooling will work more reliably (because it will never try to run a TH splice).
We don't update the grammars very often, so this shouldn't cause a tremendous amount of code churn.
Better caching (even with Bazel, which is much better at caching than cabal, we still encounter spurious rebuilds).
Better project ergonomics (since the codegen splices are defined in tree-sitter).

Downsides

More code to write.
Less elegant than a pure-TH solution.
It’s an extra step we have to be aware of during the update process.
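To make the "extra step" concrete, here is a minimal sketch of what a standalone AOT generator could look like: a small executable, run by hand whenever a grammar is updated, whose output module is committed to the repository instead of being produced by a TH splice. Everything in it is hypothetical and for illustration only: the `NodeType` type, `renderModule`, the module name `Language.Example.AST`, and the placeholder `Node` type are invented here and are not semantic's actual code. A real generator would parse node-types.json with a JSON library (for example aeson) and compute precise field types; this sketch skips parsing and only handles named nodes with snake_case names.

```haskell
-- Hypothetical AOT generator sketch; not semantic's actual implementation.
module Main (main) where

import Data.Char (toLower, toUpper)
import Data.List (intercalate)

-- | Hypothetical, heavily simplified stand-in for one node-types.json entry.
-- The real file also carries "named", "children", "subtypes", etc.
data NodeType = NodeType
  { nodeName   :: String   -- ^ tree-sitter node type, e.g. "if_statement"
  , nodeFields :: [String] -- ^ field names, e.g. ["condition"]
  }

-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | "if_statement" -> "IfStatement". Assumes snake_case names; anonymous
-- nodes such as "+" or names that clash with keywords are not handled.
camel :: String -> String
camel = concatMap capitalize . splitOn '_'
  where
    capitalize []       = []
    capitalize (c : cs) = toUpper c : cs

lowerFirst :: String -> String
lowerFirst []       = []
lowerFirst (c : cs) = toLower c : cs

-- | Render one data declaration. Every field gets the placeholder type
-- 'Node'; record fields are prefixed with the type name to avoid clashes.
renderDecl :: NodeType -> String
renderDecl (NodeType name fields)
  | null fields = "data " ++ ty ++ " = " ++ ty
  | otherwise =
      "data " ++ ty ++ " = " ++ ty ++ "\n  { "
        ++ intercalate "\n  , " (map renderField fields)
        ++ "\n  }"
  where
    ty = camel name
    renderField f = lowerFirst ty ++ camel f ++ " :: Node"

-- | Render a complete module, ready to be written to disk and committed.
renderModule :: String -> [NodeType] -> String
renderModule modName nodes =
  unlines $
    [ "-- Generated ahead of time. Do not edit by hand."
    , "module " ++ modName ++ " where"
    , ""
    , "data Node -- placeholder for the real child type"
    , ""
    ]
      ++ intercalate [""] (map (lines . renderDecl) nodes)

-- | Example input standing in for a parsed node-types.json.
exampleNodes :: [NodeType]
exampleNodes =
  [ NodeType "if_statement" ["condition", "consequence"]
  , NodeType "comment" []
  ]

main :: IO ()
main = putStr (renderModule "Language.Example.AST" exampleNodes)
```

Because the generated module is ordinary Haskell checked into the tree, cabal, Bazel, REPLs, and IDE tooling would all see plain source and never need to locate node-types.json. The cost is exactly the downside listed above: someone has to remember to rerun the generator when a grammar changes.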

Another approach we could take is to drop cabal support entirely. That would also preclude any Hackage releases, would still need some work to function in a REPL context, and would entail a degree of tedious downstream changes. We could also (shudder) download the grammar definitions in the TH splices themselves, but I hardly think invoking network calls in TH is something we should encourage, even though that's the only way I can envision this possibly working with cabal.

github / semantic

Reconsider AOT vs. compile-time generation of syntax trees #622