CyberShadow / tree-sitter-d

17 stars 6 forks source link

add build files into repository #12

Open takase1121 opened 11 months ago

takase1121 commented 11 months ago

Continuation of #8.

This PR includes most build files into repository.

To address the question from #8:

I don't think it's a great idea to put binaries / artifacts (parser.c) into the source repository, it would probably make more sense to put those into a release archive, e.g. from github actions. Does helix support loading the parser from a release zip file?

It might make sense, but almost all treesitter grammar has artifacts committed into the repository. Here are some:

  1. https://github.com/briot/tree-sitter-ada
  2. https://github.com/tree-sitter/tree-sitter-agda
  3. https://github.com/aheber/tree-sitter-sfapex
  4. https://github.com/tree-sitter/tree-sitter-bash
  5. https://github.com/zwpaper/tree-sitter-beancount
  6. https://github.com/amaanq/tree-sitter-capnp
  7. https://github.com/tree-sitter/tree-sitter-c
  8. https://github.com/tree-sitter/tree-sitter-cpp
  9. https://github.com/tree-sitter/tree-sitter-c-sharp
  10. https://github.com/sogaiu/tree-sitter-clojure
  11. https://github.com/uyha/tree-sitter-cmake
  12. https://github.com/stsewd/tree-sitter-comment

That was the first 12 parsers in https://tree-sitter.github.io/tree-sitter/#parsers, so it is safe to say that a lot of them does this, including the ones from tree-sitter themselves.

The other advantage of this approach is that you don't need Node.js, npm, and tree-sitter-cli just to install a language.

As for binding.gyp, it is needed, because if you read the generated binding.gyp:

{
  "targets": [
    {
      "target_name": "tree_sitter_d_binding",
      "include_dirs": [
        "<!(node -e \"require('nan')\")",
        "src"
      ],
      "sources": [
        "bindings/node/binding.cc",
        "src/parser.c",
        # If your language uses an external scanner, add it here.
      ],
      "cflags_c": [
        "-std=c99",
      ]
    }
  ]
}

You can see that scanner.cc wasn't included!

From https://tree-sitter.github.io/tree-sitter/creating-parsers#external-scanners:

Then, add another C or C++ source file to your project. Currently, its path must be src/scanner.c or src/scanner.cc for the CLI to recognize it. Be sure to add this file to the sources section of your binding.gyp file so that it will be included when your project is compiled by Node.js and uncomment the appropriate block in your bindings/rust/build.rs file so that it will be included in your Rust crate.

CyberShadow commented 11 months ago

Thanks (and also for researching this)!

takase1121 commented 11 months ago

Could this go into the generated branch?

In other words, do the tools that use tree sitter repositories look for stuff on the master branch, or the default branch? Ideally these would go into an automatically updated branch that is not the default branch seen by people who want to read the implementation or contribute to it.

If it has to be master, then we could swap the branches and move generated to master and master to dev or such.

I'm not sure, but I think most tools unfortunately only reads from master. For my part, I'm totally fine with it on a different branch; Neovim has support for this repository (in its current form) already, helix allows the user to supply a rev key so a branch can be used.

Would you mind updating the documentation to integrate a description of this change?

Ah, I didn't know. I'll look into that.

Do you think it would be possible to update these automatically, i.e. update the generated branch when something is pushed to the non-generated branch?

I've made workflows that could open PRs. I'm sure this is possible but it would require the owner to give GHA write permissions.

takase1121 commented 11 months ago

Just tell me if you want to retarget this PR to generated, I can do so.

CyberShadow commented 11 months ago

Yes please, at the very least it should go there.

Note that the documentation is only on master.

takase1121 commented 11 months ago

I was looking at the different branches - I don't think it can be placed in generated, since generated contains the initial grammar files, which needs to be patched (in master) to get the final results.

I think it might be better to create a new branch to host the final files, or to have the generation part "abstracted out" the repo, so in the end it only has source and the final product.

CyberShadow commented 11 months ago

Aah, sorry, I got confused!

or to have the generation part "abstracted out" the repo, so in the end it only has source and the final product.

I don't think we can do that. The reason for the two branches is to use Git merge to port any manual fixes onto new versions of the generated grammar.

CyberShadow commented 11 months ago

OK, so I think this is good. I'm going to check again to make sure I get the exact same files on my end, as we want the process to be deterministic.