Open ominfowave opened 2 years ago
We currently offer tokenizers versions compatible with the following transformers releases:
transformers==4.15.0
transformers==2.11.0
Both of these are currently only available for Python 3.8. To change the Python version of your app, see here.
As you can see from its setup.py file, indic-punct pins all of its requirements to specific versions. With packages that do this, it's sometimes possible to get them working by specifying whatever is the closest version available in the Chaquopy repository:
install "indic-punct"
install "torch==1.8.1"
install "torchvision==0.9.1"
install "transformers==4.15.0"
install "tokenizers==0.7.0"
In this case I've used the closest newer version of each requirement, but sometimes you might need to use the closest older one.
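The "closest newer" versus "closest older" choice described above can be sketched in Python. This is only an illustration: the function names are made up here, the list of available builds is hypothetical, and it handles plain dotted numeric versions only.

```python
def parse_version(text):
    """Turn '1.8.1' into (1, 8, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def closest_version(pinned, available, prefer="newer"):
    """Return the available version nearest to `pinned`, or None.

    prefer="newer" picks the smallest available version >= pinned;
    prefer="older" picks the largest available version <= pinned.
    """
    target = parse_version(pinned)
    ordered = sorted(available, key=parse_version)
    if prefer == "newer":
        candidates = [v for v in ordered if parse_version(v) >= target]
        return candidates[0] if candidates else None
    candidates = [v for v in ordered if parse_version(v) <= target]
    return candidates[-1] if candidates else None


# Hypothetical data: the pinned version and the available builds are
# invented for illustration, not taken from any real repository listing.
available_builds = ["1.4.0", "1.8.1"]
print(closest_version("1.7.0", available_builds, prefer="newer"))
print(closest_version("1.7.0", available_builds, prefer="older"))
```

If neither direction yields a candidate, the function returns None, which corresponds to the case where no substitute version exists at all.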
Unfortunately, the current version of indic-punct (2.1.4) also has a native requirement which Chaquopy doesn't support at all (pynini). It's possible that one of the older versions of indic-punct doesn't have this requirement, but the release history is confusing (8 releases in one day, and no tags on GitHub), so that's something you'd have to look into by yourself.
See also #608.
We're not planning to update this package in the near future, but if you'd like to try building the new version yourself, follow the instructions here. However, our package build tool doesn't currently have working support for Rust – see #1030 for details.
If anyone else needs a newer version of tokenizers, please click the thumbs up button above, and post a comment explaining why you need it.
Hello, I am trying to use some recent models from transformers which require a more recent tokenizers version (transformers 4.23.1 or higher, which requires tokenizers!=0.11.3,<0.14,>=0.11.1), but as I saw in #608 it seems to be a bit complicated because of Rust. I would like to know whether an update of the tokenizers library is planned soon.
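To see which tokenizers versions a specifier such as the one quoted above would accept, here is a simplified sketch. It assumes plain dotted numeric versions and only the six comparison operators; the real rules (PEP 440, as implemented by the `packaging` library) cover more cases. The helper names are made up for illustration.

```python
import operator
import re

OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text):
    """Turn '0.13.3' into (0, 13, 3)."""
    return tuple(int(part) for part in text.split("."))


def padded(a, b):
    """Pad the shorter tuple with zeros so '0.14' compares equal to '0.14.0'."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause."""
    for clause in specifier.split(","):
        match = re.fullmatch(r"\s*(>=|<=|==|!=|<|>)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        left, right = padded(parse_version(version), parse_version(bound))
        if not OPERATORS[op](left, right):
            return False
    return True


spec = "!=0.11.3,<0.14,>=0.11.1"  # specifier quoted in the comment above
for candidate in ["0.7.0", "0.10.3", "0.11.3", "0.13.3"]:
    print(candidate, satisfies(candidate, spec))
```

Under this sketch, a version below 0.11.1 fails the `>=0.11.1` clause, which is why pip refuses to resolve the requirement against an older build.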
Sorry, we have no update planned in the near future. But if you'd like to try updating it yourself, see the links in my previous comment.
Our current tokenizers versions are listed in my comment above. If none of those would work for your project, please post a comment explaining why.
Looks like I also need an updated version of the tokenizers package to work with manga-ocr (requires transformers >= 4.25.0):
Failed to install tokenizers<0.15,>=0.14 from https://files.pythonhosted.org/packages/b2/b9/bf025d763bbdd333cb88bedb23426f932c5b4a6ce6f033c498517fad5b90/tokenizers-0.14.1.tar.gz#sha256=ea3b3f8908a9a5b9d6fc632b5f012ece7240031c44c6d4764809f33736534166 (from transformers>=4.25.0->manga-ocr).
I've added my thumbs up and might look at the instructions to build it myself later if I have time.
Thanks – I haven't checked, but you may be able to work around this by using an older version of manga-ocr.
In my case I need version 0.13.3 because it is a requirement of faster-whisper. In case it helps others I have made some progress updating it myself by:
However, I am blocked because the build-wheel.sh script sets
env["_PYTHON_HOST_PLATFORM"] = f"linux_{ABIS[self.abi].uname_machine}"
which makes sysconfig.get_platform() return a value without a dash, which in turn causes setuptools_rust.build.get_dylib_ext_path to crash.
I wonder if someone knows the reasoning for setting that env variable, and/or the consequences of unsetting it or setting it to a different value that conforms to the usual {osname}-{release}-{machine} format.
I don't remember exactly why we added that variable; you can probably find out from the Git history. But going by the sysconfig.get_platform documentation, I agree it should use a dash rather than an underscore, but without a version number on Linux.
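A minimal sketch of why the underscore form can break a consumer, and what the dashed form would look like. Note the assumptions: `arch_from_platform` is a hypothetical stand-in for code that takes the text after the last dash as the machine name (it is not copied from setuptools_rust), and "aarch64" is an assumed machine value.

```python
def arch_from_platform(platform_tag):
    """Hypothetical consumer: treat the text after the last dash as the machine."""
    return platform_tag.rsplit("-", 1)[1]


def make_platform_tag(uname_machine):
    """Build a '{osname}-{machine}' tag, with no release component on Linux."""
    return f"linux-{uname_machine}"


machine = "aarch64"  # assumed uname machine for a 64-bit ARM ABI

try:
    # Underscore form, as produced by the line quoted in the thread.
    arch_from_platform(f"linux_{machine}")
except IndexError:
    print("no dash: cannot split out the machine")

# Dashed form, as suggested in the reply above.
print(arch_from_platform(make_platform_tag(machine)))
```

Whether changing the variable has side effects elsewhere in the build is not something this sketch can show; that would still need checking against the Git history mentioned above.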
I needed a module in a more recent version of transformers, which requires tokenizers>=0.14.
I tried building a wheel for tokenizers==0.15.2 following this README and met this error:
Not sure how to proceed from here. Any help is appreciated.
This appears to be caused by the --target option, which is unnecessary because the target is already encoded into the compiler launcher. You'd have to examine the build system to work out how to remove the option, but unfortunately I don't know any more than that.
@mhsmith Thanks for the hint. I have now switched to building inside a Docker container, and I'm getting a different error: build-wheel: Error: /workdir/chaquopy/server/pypi/packages/tokenizers/build/0.14.1/cp38-cp38-android_21_arm64_v8a/fix_wheel/tokenizers/tokenizers.so is linked against unknown library 'libstdc++.so.6'.
Here's the Dockerfile and docker-compose.yaml I used, and some other changes to help reproduce the error:
I also found this comment suggesting adding the -stdlib=libstdc++ option, but I'm not sure where to add that.
Hope you can help me solve this error. Thanks!
Sorry, I don't have time to look into this in any detail. But libstdc++.so.6 is a Linux library name which should never appear in an Android build, so this is probably caused by the build using a mixture of Android and Linux elements.
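As a rough way to spot this kind of mix-up early, the sketch below flags library names that carry a numeric suffix after ".so". This is a heuristic assumed for illustration, not a documented rule or the check that build-wheel performs, and the list of needed libraries is hypothetical (in practice it could be read from the NEEDED entries printed by `readelf -d`).

```python
import re

# Heuristic (an assumption): a numeric suffix after ".so", as in
# "libstdc++.so.6", is treated as a Linux-style versioned library name.
VERSIONED_SONAME = re.compile(r"\.so\.\d+(\.\d+)*$")


def suspicious_libraries(needed):
    """Return the names that look like Linux-style versioned libraries."""
    return [name for name in needed if VERSIONED_SONAME.search(name)]


# Hypothetical dependency list for a built extension module.
needed = ["libc.so", "libm.so", "libstdc++.so.6"]
print(suspicious_libraries(needed))
```

A non-empty result would suggest that some part of the build picked up the host Linux toolchain rather than the Android one.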
Hey, hope you are doing well. I am facing issues while trying to pip install anthropic, which has a dependency on tokenizer>=0.13. I tried with version 0.13, but I get the attached errors. Could you please advise how we can work around this issue? Regards, Divyansh tokenizer.log
You could try using an older version of anthropic. Looking back through the blame of anthropic's pyproject.toml, the last version which didn't require such a new version of tokenizers was anthropic 0.2.10. That came out less than a year ago, but this is obviously a fast-moving package, so I don't know if that would be acceptable for you.
Luckily, tokenizers version 0.10.3 has worked with the latest anthropic package so far. I decided to test it regardless of the incompatibility error, both at build time and at run time, and it worked. Older versions of anthropic are not available to newer users according to their API docs, because of the large changes and improvements in their latest offering, "opus". So far so good.
Please add tokenizers version 0.11.1; it is a requirement for some of the latest Python modules, such as indic-punct.