Closed clason closed 6 months ago
Thanks for the heads up.
Do you have references to share for "the correct repo layout"? Happy to snap to some sort of spec if there is one.
Requiring PR authors to generate grammar.json
seems like it would still lead to merge conflicts and bloat the diff (albeit not as much). I was not aware of upstream making a meaningful distinction between commit-safe generated files and non-commit-safe ones. If there's been new thinking on that then I would love to read through the discussion, if any.
It's just the repo layout; the rationale is that downstream does not need to differentiate between a git clone and a release download -- that's how every other source tarball on Github works.
I don't think conflicts are going to be much of an issue if you are not worrying about grammar.js conflicts; same for churn (grammar.json is really small compared to parser.c; almost the same order of magnitude as the JS.)
I'll see if I can find anything on Github; there's been a lot of discussion on Discord/Matrix.
It's just the repo layout; the rationale is that downstream does not need to differentiate between a git clone and a release download
Got it; so you are asking that, when publishing a GitHub release, I should additionally include a zip file with both the source-controlled contents of the repository, and the generated files in the directory that tree-sitter generate
would have put them?
that's how every other source tarball on Github works.
I appreciate your patience in working out the details of something that seems so obvious!
Got it; so you are asking that, when publishing a GitHub release, I should additionally include a zip file with both the source-controlled contents of the repository, and the generated files in the directory that tree-sitter generate would have put them?
Yes; basically tree-sitter generate
and then zip up everything -- parser, bindings, whatnot; everything needed to use the parser without node
or tree-sitter
. Check parsers under tree-sitter
or tree-sitter-grammars
.
I see that my comments came across much more unfriendly than they were intended, so let me summarize and explain a bit more.
Usually, a release tarball is an exact snapshot of the repository at the time of release; the rationale is that downstream can use automated archives (that Github generates for every commit and tag) and release sources interchangeably. Of course, releases can add generated files that allow building without advanced developer tooling (insert xz joke here). This is especially important for us since we need to build ~300 parsers with one codepath, and we simply can't make exceptions for individual parsers. This means that we currently simply can't use the release source archive, even if we wanted (which we do, since we want to get away from tracking every commit of every parser, especially for stable parsers like this one).
The idea of using grammar.json
for tree-sitter generate
is a) avoiding the use of node
if the grammar imports other files (especially via npm install
) and b) speeding up generation since the Javascript doesn't have to be interpreted. I saw significant benefits on some parsers, but testing this now with the latest CLI don't see much difference on this one; since you don't use any JS shenanigans, either, that point is moot so you can ignore that request.
Regarding upstream support: https://github.com/tree-sitter/workflows has a bunch of workflows for parsers you can just use (these will be expanded as the publish and release support is added to tree-sitter
CLI); https://github.com/tree-sitter-grammars/template is a template you can see the use of these workflows in.
While keeping
parser.c
out of version control is reasonable (and in fact recommended by upstream), the generatedgrammar.json
is much smaller and has meaningful diffs. The benefit of keeping this in the repo is that downstream users (like nvim-treesitter) do not requirenode
for building and can just dotree-sitter g src/grammar.json && tree-sitter build -o parser.so
(which is what nvim-treesitter is transitioning to for 1.0).In addition, consider making proper releases including the generated artifacts in the correct repo layout (which is what nvim-treesitter will require for "stable" parsers). Upstream is in the process of adding reusable workflows and
tree-sitter
support to make this easier.This will allow you to drop the
generated_files
branch.