benbrandt / text-splitter

Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
MIT License

Use tree-sitter 0.20.2 instead of 0.22.6 #202

Closed by boxbeam 2 months ago

boxbeam commented 2 months ago

Rationale: This is being used for https://github.com/TabbyML/tabby. We have a list of supported languages, for which we use many different tree-sitter-<language> crates. The underlying tree-sitter version we depend on is 0.20.2, and some of the language crates we depend on do not support 0.22.6. For example, tree-sitter-java supports only up to 0.21.*.

benbrandt commented 2 months ago

@boxbeam thanks for reaching out. I'm a little confused, because it seems to me that tree-sitter-java works fine with 0.22: https://github.com/tree-sitter/tree-sitter-java/blob/master/Cargo.toml#L24

I am using tree-sitter-rust with the same version bounds and it works fine.

Have you tried upgrading and run into issues?

The reason I ask is that the depth calculation you added may become quite expensive if there are a lot of nodes, and I already think the performance is slower than I'd like. I could go back to the manual depth calculation I had in an earlier version, but I first want to verify that 0.22 really doesn't work with grammar packages that target 0.21.
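For illustration, here is a minimal sketch of the trade-off described above. It does not use the tree-sitter API or any code from this crate: the `Tree` type and the names `from_parents`, `depth_by_parent_walk`, and `depths_tracked` are hypothetical stand-ins on a toy arena tree. It only shows why computing depth per node by climbing to the root can get expensive on large trees, while carrying the depth along during a single traversal stays linear.

```rust
/// Hypothetical arena-style tree: node `i` has parent `parent[i]`
/// (`None` for the root). This is a toy model, not a tree-sitter type.
struct Tree {
    parent: Vec<Option<usize>>,
    children: Vec<Vec<usize>>,
}

impl Tree {
    /// Builds the child lists from a list of parent indices.
    fn from_parents(parent: Vec<Option<usize>>) -> Self {
        let mut children = vec![Vec::new(); parent.len()];
        for (i, p) in parent.iter().enumerate() {
            if let Some(p) = *p {
                children[p].push(i);
            }
        }
        Tree { parent, children }
    }

    /// Per-node approach: follow parent links up to the root.
    /// Each call costs O(depth), so calling it for every node
    /// costs O(n * depth) overall.
    fn depth_by_parent_walk(&self, mut node: usize) -> usize {
        let mut depth = 0;
        while let Some(p) = self.parent[node] {
            depth += 1;
            node = p;
        }
        depth
    }

    /// Manual tracking: a single traversal that carries the current
    /// depth along, so all depths are found in O(n) total.
    fn depths_tracked(&self, root: usize) -> Vec<usize> {
        let mut depths = vec![0; self.parent.len()];
        let mut stack = vec![(root, 0usize)];
        while let Some((node, depth)) = stack.pop() {
            depths[node] = depth;
            for &child in &self.children[node] {
                stack.push((child, depth + 1));
            }
        }
        depths
    }
}
```

Both functions return the same depths. The difference is only in how much work is repeated when the depth of every node in a large tree is needed.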

boxbeam commented 2 months ago

It may be that an update came through since I last tried. I'll give upgrading everything another shot.

benbrandt commented 2 months ago

Thanks @boxbeam. I can also try the Java grammar on my end, but I'm sure you have the more complicated use of tree-sitter. If it doesn't work, I may feature-flag specific versions of tree-sitter to help unblock you.

boxbeam commented 2 months ago

I was able to get this working without reverting the tree-sitter version. Thanks for working with us on this, even though we didn't need anything in the end!

benbrandt commented 2 months ago

Awesome! And not a problem! Please let me know how the quality is. I'm trying to do some testing myself before release, but I can also cut a release if that helps you with testing.

benbrandt commented 2 months ago

@boxbeam I made a new release last night with the latest code. Hopefully this makes it easier to test on your end with an actual release. Thanks for testing it out!

boxbeam commented 2 months ago

Thank you, we'll be using it!