Textualize / textual

The lean application framework for Python. Build sophisticated user interfaces with a simple Python API. Run your apps in the terminal and a web browser.
https://textual.textualize.io/
MIT License

Upgrade Textual's tree-sitter version to 0.22.x #4845

Open prurph opened 1 month ago

prurph commented 1 month ago

I discussed this briefly with @darrenburns on Discord: Textual currently uses tree-sitter 0.20.4, and it would be nice to upgrade it, notably for the matches method, which returns a dict of matches and makes navigation easier when a query with sub-matches has several matching instances.
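To illustrate why grouped matches are easier to navigate, here is a minimal sketch using stand-in data. It assumes that matches yields one (pattern index, captures dict) pair per match, which is my understanding of the newer py-tree-sitter API rather than something confirmed in this thread. The capture names and the helper `bodies_by_name` are hypothetical, and plain strings stand in for tree-sitter nodes.

```python
from typing import Dict, List, Tuple

# Stand-in data: plain strings replace tree-sitter Node objects so the sketch
# runs without tree-sitter installed. Capture names are hypothetical.
Match = Tuple[int, Dict[str, str]]

matches: List[Match] = [
    (0, {"function.name": "parse", "function.body": "return tree"}),
    (0, {"function.name": "render", "function.body": "pass"}),
]


def bodies_by_name(found: List[Match]) -> Dict[str, str]:
    """Pair each function name with its body, one query match at a time."""
    return {caps["function.name"]: caps["function.body"] for _, caps in found}


print(bodies_by_name(matches))
```

Because each match carries its own captures dict, sub-matches that belong together stay together, with no need to re-pair entries from a flat list of captures by position.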

There are two main concerns with upgrading to the latest version, 0.22.x:

  1. It requires Python >=3.9, but Textual is >=3.8. tree-sitter 0.21.x does support 3.8, but since the source of the other issue is a breaking API change in 0.22.x, it's probably better to just jump to the latest version if at all possible. Since tree-sitter is an optional dependency of Textual, I think pyproject.toml lets you specify that an optional dependency has a subset of supported Python versions so maybe that's an option too.
  2. Textual uses tree-sitter-languages to add grammars for use. It is unmaintained and incompatible with tree-sitter 0.22: you can no longer instantiate a language by path to the compiled grammar, and that's how tree-sitter-languages' get_language works.

    Further, tree-sitter recommends that grammar authors release them directly and individually to PyPI, npm, and cargo, rather than having other projects attempt to bundle together the binaries for many languages. Tree-sitter offers GitHub workflows to facilitate this, but AFAICT these are newer introductions, and a few grammars do not use them or do not release Python versions.
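For the first concern above, a rough sketch of gating the newer tree-sitter on the interpreter version. The 3.9 floor is taken from this issue. The requirement string is a hypothetical PEP 508 line, not something from Textual's actual pyproject.toml, and `can_use_tree_sitter_022` is an illustrative helper.

```python
import sys
from typing import Tuple

# From the issue: tree-sitter 0.22.x requires Python >= 3.9.
MIN_PYTHON_FOR_TREE_SITTER_022 = (3, 9)

# Hypothetical requirement line using a PEP 508 environment marker; a packaging
# tool would skip this dependency on interpreters older than 3.9.
SYNTAX_EXTRA_REQUIREMENT = 'tree-sitter>=0.22,<0.23; python_version >= "3.9"'


def can_use_tree_sitter_022(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if the given (major, minor) version meets the 3.9 floor."""
    return version >= MIN_PYTHON_FOR_TREE_SITTER_022


if __name__ == "__main__":
    print("tree-sitter 0.22 usable here:", can_use_tree_sitter_022())
```

With a marker like this, Python 3.8 users could still install Textual itself, just without the newer optional dependency.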

There are two paths to upgrading that I see:

  1. Use the unofficial replacement for py-tree-sitter-languages, tree-sitter-language-pack. This attempts to provide grammars in bulk, but loads them in a way compatible with the newer tree-sitter API.

    • This could be a benefit as it can provide grammars that aren't released on PyPI, for example Kotlin and SQL (see below)
    • The downside is reliance on a single, new package, and one that aims to duplicate/circumvent the "official" way to release a grammar
  2. Use languages with grammars that are installable by pip. Here's a list of Textual's built-in languages and their status in that regard:

Language      Can pip install?    Other Notes
Bash
CSS
Go
HTML
Java
Javascript
JSON
Kotlin                            Recent open issue to add
Markdown                          Installable via git url; release may be inflight. See this comment
Python
Regex
Rust
SQL                               I opened an issue and maintainers responded very quickly!
TOML
YAML
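A quick way to see which grammars are already pip-installed in a given environment is to ask the import system. This is only a sketch: the module names assume the tree_sitter_<language> naming pattern, which should be verified for each grammar, and `installed_grammars` is a hypothetical helper.

```python
from importlib.util import find_spec
from typing import Dict

# Assumed module names, following the tree_sitter_<language> pattern.
# Check each grammar's own packaging before relying on these.
GRAMMAR_MODULES: Dict[str, str] = {
    "bash": "tree_sitter_bash",
    "python": "tree_sitter_python",
    "rust": "tree_sitter_rust",
}


def installed_grammars(modules: Dict[str, str]) -> Dict[str, bool]:
    """Map each language to whether its grammar module can be imported."""
    return {lang: find_spec(mod) is not None for lang, mod in modules.items()}


if __name__ == "__main__":
    for lang, available in installed_grammars(GRAMMAR_MODULES).items():
        print(f"{lang}: {'installed' if available else 'missing'}")
```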

I'm hoping to find some time to see what issues crop up trying to upgrade tree-sitter but thought I would share my findings so far and more importantly ask what the Python version support plan/philosophy is for Textual. Thanks!

merriam commented 1 month ago

I have noticed these issues:

My usage of TreeSitter in Textual has been limited to self-inspection, e.g., I pull out the DEFAULT_CSS values, comments, and some other items. I'm still working on getting it into the documentation build system. Even for this, I cheat and use the Documents code to actually parse the tree, since that worked first and it has not been worth the effort to make anything else work.

So far the use cases for TreeSitter are niche: tools and TextArea. It might be better to remove TreeSitter entirely and have an example or documentation note on how to call it when needed. This is only my opinion.

prurph commented 1 month ago

Hmmm, I think the syntax highlighting in text areas is nice, and, moreover, the ability to parse the content into an arbitrary AST is extremely useful for interacting with the text the user types.

I do agree the availability of the grammars varies by language (since they are independently maintained), but have found the major ones to be well-maintained, and I haven't had issues with the Python bindings when using them with my own parser inside of Textual.

I definitely think keeping tree-sitter as an optional dependency of Textual is the way to go, but removing it entirely would shut out a lot of the TextArea functionality, both natively with out-of-the-box syntax highlighting, and for Textual projects using TextArea. Perhaps it should be "bring your own parser" with instructions on how to pip install the available ones?

My understanding of the recent API changes is that they are motivated at least in part by moving towards grammars installed as dependencies and away from the past of loading the binary .so file directly, hence why languages can no longer be loaded from a file and instead are expected to be imported as modules/crates/etc.
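A minimal sketch of what "imported as modules" could look like from the consuming side. It assumes a grammar package exposes a `language()` function whose result is then handed to tree-sitter. The helper `load_grammar` is hypothetical, and how the result is wrapped differs between tree-sitter releases, so that step is left out.

```python
import importlib
from typing import Any, Optional


def load_grammar(module_name: str) -> Optional[Any]:
    """Return the grammar handle from an installed grammar package, or None.

    Assumes the package exposes a callable named `language`; returns None when
    the package is missing or lacks that callable, so callers can fall back to
    an unhighlighted view.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    factory = getattr(module, "language", None)
    return factory() if callable(factory) else None


if __name__ == "__main__":
    handle = load_grammar("tree_sitter_python")
    print("grammar available" if handle is not None else "grammar not installed")
```

Returning None instead of raising keeps the grammar optional, which fits the "bring your own parser" idea: an application can check the result and simply skip highlighting when the user has not installed a grammar.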

merriam commented 1 month ago

Do you believe TextArea could be equally useful without the built-in TreeSitter, but with detailed instructions?

Can you see a point where Tree-sitter is not a niche capability of Textual? Some killer app?

darrenburns commented 1 month ago

Thanks for investigating @prurph, this is a super helpful write up. I think it answers all of my questions before I asked them! I think the 3.9+ requirement is going to be a deal breaker though, at least for now. We generally support Python versions a little beyond end-of-life.

I think the "language pack" may be the way to go in the future too, as a replacement for the current py-tree-sitter-languages module.


@merriam

> Do you believe TextArea could be equally useful without the built-in TreeSitter, but with detailed instructions?

Tree-sitter is an optional extra that's only required if you want syntax highlighting in the TextArea. You don't need to install it to use TextArea without highlighting.

> Deployments on Tree-sitter often avoid PyPI, and its scrutiny

Maybe I'm misunderstanding this, but Textual is pinned to use 0.20.*, which comes from PyPI. All of Textual's dependencies, including tree-sitter, are available on PyPI.

> Tree Sitter is used only for the TextEdit widget. Pygments, a mature regular expression syntax coloring system, is used for the CodeBrowser demonstration.

We don't use Pygments in the TextArea widget because it's too slow.

> Can you see a point where Tree-sitter is not a niche capability of Textual? Some killer app?

Is syntax highlighting "niche"? I can say with a pretty high degree of certainty that if we removed it, we'd have a lot of disappointed users. I use it in a couple of my own apps and know that if it weren't already integrated with Textual it'd be a real pain.

> I have been unable to get the current tree sitter installations to allow a .tcss language definition.

Textual is using an older tree-sitter version. You may have been reading docs for a newer version of tree-sitter.

prurph commented 1 month ago

> Thanks for investigating @prurph, this is a super helpful write up. I think it answers all of my questions before I asked them! I think the 3.9+ requirement is going to be a deal breaker though, at least for now. We generally support Python versions a little beyond end-of-life.

Sure thing @darrenburns. That sounds reasonable--3.8 is EOL in October so it makes sense to revisit then. This will also give some time to see if the "replacement" for tree-sitter-languages is still going then, and/or if it makes sense to instead let users bring their own parsers (typically just pip installing them).

prurph commented 1 month ago

Looks like SQL is now available on PyPI! https://github.com/DerekStride/tree-sitter-sql/issues/269

Huge thanks to them for accommodating my request very quickly! 🎉