alex-pinkus / tree-sitter-swift

A tree-sitter grammar for the Swift programming language.
MIT License
142 stars 37 forks source link

Consider committing `grammar.json` #398

Closed clason closed 6 months ago

clason commented 7 months ago

While keeping parser.c out of version control is reasonable (and in fact recommended by upstream), the generated grammar.json is much smaller and has meaningful diffs. The benefit of keeping this in the repo is that downstream users (like nvim-treesitter) do not require node for building and can just do tree-sitter g src/grammar.json && tree-sitter build -o parser.so (which is what nvim-treesitter is transitioning to for 1.0).

In addition, consider making proper releases including the generated artifacts in the correct repo layout (which is what nvim-treesitter will require for "stable" parsers). Upstream is in the process of adding reusable workflows and tree-sitter support to make this easier.

This will allow you to drop the generated_files branch.

alex-pinkus commented 7 months ago

Thanks for the heads up.

Do you have references to share for "the correct repo layout"? Happy to snap to some sort of spec if there is one.

Requiring PR authors to generate grammar.json seems like it would still lead to merge conflicts and bloat the diff (albeit not as much). I was not aware of upstream making a meaningful distinction between commit-safe generated files and non-commit-safe ones. If there's been new thinking on that then I would love to read through the discussion, if any.

clason commented 7 months ago

It's just the repo layout; the rationale is that downstream does not need to differentiate between a git clone and a release download -- that's how every other source tarball on Github works.

I don't think conflicts are going to be much of an issue if you are not worrying about grammar.js conflicts; same for churn (grammar.json is really small compared to parser.c; almost the same order of magnitude as the JS.)

I'll see if I can find anything on Github; there's been a lot of discussion on Discord/Matrix.

alex-pinkus commented 7 months ago

It's just the repo layout; the rationale is that downstream does not need to differentiate between a git clone and a release download

Got it; so you are asking that, when publishing a GitHub release, I should additionally include a zip file with both the source-controlled contents of the repository, and the generated files in the directory that tree-sitter generate would have put them?

that's how every other source tarball on Github works.

I appreciate your patience in working out the details of something that seems so obvious!

clason commented 7 months ago

Got it; so you are asking that, when publishing a GitHub release, I should additionally include a zip file with both the source-controlled contents of the repository, and the generated files in the directory that tree-sitter generate would have put them?

Yes; basically tree-sitter generate and then zip up everything -- parser, bindings, whatnot; everything needed to use the parser without node or tree-sitter. Check parsers under tree-sitter or tree-sitter-grammars.

clason commented 6 months ago

I see that my comments came across much more unfriendly than they were intended, so let me summarize and explain a bit more.

  1. Usually, a release tarball is an exact snapshot of the repository at the time of release; the rationale is that downstream can use automated archives (that Github generates for every commit and tag) and release sources interchangeably. Of course, releases can add generated files that allow building without advanced developer tooling (insert xz joke here). This is especially important for us since we need to build ~300 parsers with one codepath, and we simply can't make exceptions for individual parsers. This means that we currently simply can't use the release source archive, even if we wanted (which we do, since we want to get away from tracking every commit of every parser, especially for stable parsers like this one).

  2. The idea of using grammar.json for tree-sitter generate is a) avoiding the use of node if the grammar imports other files (especially via npm install) and b) speeding up generation since the Javascript doesn't have to be interpreted. I saw significant benefits on some parsers, but testing this now with the latest CLI don't see much difference on this one; since you don't use any JS shenanigans, either, that point is moot so you can ignore that request.

  3. Regarding upstream support: https://github.com/tree-sitter/workflows has a bunch of workflows for parsers you can just use (these will be expanded as the publish and release support is added to tree-sitter CLI); https://github.com/tree-sitter-grammars/template is a template you can see the use of these workflows in.