cedricrupb / code_tokenize

Fast tokenization and structural analysis of any programming language
MIT License

Unexpectedly recent/incompatible version of tree-sitter #6

Open penguinland opened 2 days ago

penguinland commented 2 days ago

I installed this with pip install code-tokenize, but when I tried using it I got an AttributeError because tree_sitter.Language has no build_library:

>>> import code_tokenize as ctok
>>> text = "print('hello world')"
>>> tokens = ctok.tokenize(text, lang="python")
WARNING:root:Autoloading AST parser for python: Start download from Github.
WARNING:root:Start cloning the parser definition from Github.
WARNING:root:Compiling language for python
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/alan/visual-diff/venv/lib/python3.12/site-packages/code_tokenize/__init__.py", line 65, in tokenize
    parser = ASTParser(config.lang)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alan/visual-diff/venv/lib/python3.12/site-packages/code_ast/parsers.py", line 88, in __init__
    self.lang    = load_language(lang)
                   ^^^^^^^^^^^^^^^^^^^
  File "/home/alan/visual-diff/venv/lib/python3.12/site-packages/code_ast/parsers.py", line 62, in load_language
    return load_language(lang)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/alan/visual-diff/venv/lib/python3.12/site-packages/code_ast/parsers.py", line 57, in load_language
    _compile_lang(source_lang_path, compiled_lang_path)
  File "/home/alan/visual-diff/venv/lib/python3.12/site-packages/code_ast/parsers.py", line 191, in _compile_lang
    Language.build_library(
    ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: type object 'tree_sitter.Language' has no attribute 'build_library'

I think this is because tree-sitter removed build_library entirely in version 0.22, earlier this year. This repo requires tree_sitter 0.19.0, but pip installed tree_sitter 0.23.1 instead.

I got things to work by explicitly running pip install tree_sitter==0.21.3 and pip install setuptools, but I still get a FutureWarning that Language(path, name) is deprecated.
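For anyone hitting the same error, here is a rough sketch of a check that could be run before calling code_tokenize. It assumes, as suggested above, that build_library is only available in tree_sitter releases older than 0.22. The helper names (parse_version, has_build_library, installed_tree_sitter_version) are made up for illustration and are not part of code_tokenize or tree_sitter.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a string like '0.21.3' into (0, 21, 3), ignoring non-numeric suffixes."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_build_library(tree_sitter_version):
    """Assumption from this thread: Language.build_library was removed in 0.22."""
    return parse_version(tree_sitter_version) < (0, 22)


def installed_tree_sitter_version():
    """Return the installed tree-sitter version string, or None if it is missing."""
    try:
        return version("tree-sitter")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_tree_sitter_version()
    if found is None:
        print("tree-sitter is not installed")
    elif has_build_library(found):
        print(f"tree-sitter {found} should still provide build_library")
    else:
        print(f"tree-sitter {found} is too new; try pip install tree_sitter==0.21.3")
```

This only reports the mismatch; it does not change which version pip resolves.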

Is there a way to update PyPI so it installs compatible versions of tree_sitter and setuptools automatically? More broadly, I wonder whether this can be updated so it works with more recent versions of tree-sitter, without deprecation warnings.

Thanks for taking a look, and please let me know if I can help more!

cedricrupb commented 1 day ago

This is actually related to https://github.com/cedricrupb/code_ast/pull/3 (which automatically installs the right version)

Ultimately, I would like to support the newer tree_sitter versions. Therefore, I am currently a bit hesitant to pin tree_sitter to an older version.

If you are interested, there is already a development version of code_ast (which code_tokenize uses under the hood) that supports the newer versions of tree_sitter. However, you would have to install the language bindings yourself.
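As a rough illustration of what "installing the language bindings yourself" can look like with recent tree_sitter releases, the sketch below uses tree-sitter's own Python API directly, not the code_ast development version (whose interface is not shown in this thread). It assumes tree_sitter 0.23 or newer plus the separately installed tree_sitter_python package; parse_python is a hypothetical helper name.

```python
def parse_python(source):
    """Parse Python source with a prebuilt grammar; return None if bindings are missing.

    Assumes tree_sitter >= 0.23 and the tree_sitter_python package
    (for example via: pip install tree_sitter tree_sitter_python).
    """
    try:
        import tree_sitter_python as tspython
        from tree_sitter import Language, Parser
    except ImportError:
        return None

    # The grammar ships precompiled in tree_sitter_python, so no build_library call is needed.
    language = Language(tspython.language())
    parser = Parser(language)
    return parser.parse(source.encode("utf-8"))


if __name__ == "__main__":
    tree = parse_python("print('hello world')")
    if tree is None:
        print("tree_sitter or tree_sitter_python is not installed")
    else:
        print(tree.root_node.type)
```

Because the grammar comes from a pip-installable package, nothing has to be cloned or compiled at runtime, which is the step that failed in the traceback above.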

penguinland commented 1 day ago

Thanks so much! I'm not sure whether I should close this ticket right now or wait for that PR to be merged first, but I appreciate your work!