github / semantic

Parsing, analyzing, and comparing source code across many languages

Reconsider AOT vs. compile-time generation of syntax trees #622

Closed patrickt closed 3 years ago

patrickt commented 4 years ago

Our current formulation of syntax trees assumes that we’ll be able to read the contents of node-types.json files at compile time. This is only true for local development and for files pulled in via pinned Git dependencies. For all other cases, the official word is that this is not expected to work. This means that any future publishing to Hackage is off the table, though things work for local dev and our downstream dependent projects.

But even the situation as it stands is far from optimal. For example, though Bazel tends to provide better in-IDE tooling, it doesn’t know how to find node-types files in REPLs, and even during standard builds it can’t find them without preprocessor trickery.

I think it’s time to consider whether generation of this code ahead-of-time is worth exploring. Here are some upsides and downsides of AOT code generation.

Upsides

Downsides

Another approach we could take is to drop cabal support entirely, but that would also preclude any Hackage releases, would still need some love to get working in a REPL context, and would entail a degree of tedious downstream changes. We could also (shudder) download the grammar definitions in the TH splices themselves, but I hardly think that invoking network calls in TH is something we should encourage, even though that’s the only way I can envision this possibly working with cabal.

patrickt commented 4 years ago

Good news: with a little elbow grease, we can reuse @aymannadeem’s Template Haskell work here, since it’s possible to run the Q monad from IO. That means that codegen should be as simple as pretty-printing the result of running astDeclarationsForLanguage, with appropriate module headers, imports, and LANGUAGE pragmas. Exciting!
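
To make the idea concrete, here is a minimal sketch of the "run Q from IO, then pretty-print" approach. It does not call astDeclarationsForLanguage. Instead, `exampleDeclarations` is a hypothetical stand-in that builds a single declaration by hand, and the module name, pragma, and import in `renderModule` are illustrative assumptions rather than what the real generator would emit. Note that `runQ` in IO only supports a subset of Q operations (anything that needs `reify`, for instance, will fail at runtime), which is presumably where the elbow grease comes in.

```haskell
module Main (main) where

import Language.Haskell.TH
  ( Bang (..), Con (..), Dec (..), DerivClause (..), Q
  , SourceStrictness (..), SourceUnpackedness (..), Type (..)
  , mkName, pprint, runQ
  )

-- Hypothetical stand-in for the real Template Haskell generator.
-- It produces one declaration, roughly:
--   data Identifier = Identifier Text deriving Show
exampleDeclarations :: Q [Dec]
exampleDeclarations = do
  let name  = mkName "Identifier"
      field = (Bang NoSourceUnpackedness NoSourceStrictness, ConT (mkName "Text"))
  pure
    [ DataD [] name [] Nothing
        [NormalC name [field]]
        [DerivClause Nothing [ConT (mkName "Show")]]
    ]

-- Wrap pretty-printed declarations in a module header.
-- The pragma and import listed here are assumptions for this sketch.
renderModule :: String -> [Dec] -> String
renderModule moduleName decs =
  unlines
    [ "{-# LANGUAGE DeriveGeneric #-}"
    , "module " ++ moduleName ++ " where"
    , ""
    , "import Data.Text (Text)"
    , ""
    , pprint decs
    ]

main :: IO ()
main = do
  -- runQ lets us execute a Q action in IO, outside of a splice.
  decs <- runQ exampleDeclarations
  putStr (renderModule "Language.Example.AST" decs)
```

In a real generator the output would be written to a checked-in source file rather than stdout, and names built inside quotes (as opposed to `mkName`) may be printed with uniqueness suffixes or module qualifiers that need cleaning up before the result compiles.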