To avoid this problem, use the gensim branch.
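For reference, a minimal sketch of installing from that branch (the repository URL is assumed from the project name; adjust it if it differs):
git clone https://github.com/bitextor/bicleaner-ai.git   # assumed repository URL
cd bicleaner-ai
git checkout gensim
pip install .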
Ahhh, ok, thank you! Is it intended to merge this branch into master, or will the branches be kept separate?
Right now Gensim is producing much lower quality embeddings; we will merge the branch when decent quality is reached.
Even using the gensim branch, the installation fails with python==3.10 (I have tested python==3.9 as well and it seems to work fine):
Processing /home/cgarcia/Documentos/bicleaner-ai
Preparing metadata (setup.py) ... done
Collecting scikit-learn>=0.22.1
Using cached scikit_learn-1.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.5 MB)
Requirement already satisfied: PyYAML>=5.1.2 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from bicleaner-ai==1.0.1) (6.0)
Requirement already satisfied: numpy in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from bicleaner-ai==1.0.1) (1.22.3)
Collecting pytest
Using cached pytest-7.1.1-py3-none-any.whl (297 kB)
Collecting toolwrapper
Using cached toolwrapper-2.1.0-py3-none-any.whl
Requirement already satisfied: joblib in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from bicleaner-ai==1.0.1) (1.1.0)
Requirement already satisfied: sacremoses in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from bicleaner-ai==1.0.1) (0.0.49)
Collecting bicleaner-hardrules>=2.0
Using cached bicleaner_hardrules-2.0-py3-none-any.whl (34 kB)
Collecting sentencepiece
Using cached sentencepiece-0.1.96-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Collecting tensorflow>=2.3.2
Using cached tensorflow-2.8.0-cp310-cp310-manylinux2010_x86_64.whl (497.6 MB)
Collecting fuzzywuzzy
Using cached fuzzywuzzy-0.18.0-py2.py3-none-any.whl (18 kB)
Collecting python-Levenshtein
Using cached python_Levenshtein-0.12.2-cp310-cp310-linux_x86_64.whl
Collecting transformers==4.10.3
Using cached transformers-4.10.3-py3-none-any.whl (2.8 MB)
Collecting psutil
Using cached psutil-5.9.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (281 kB)
Requirement already satisfied: gensim>=4 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from bicleaner-ai==1.0.1) (4.1.2)
Requirement already satisfied: tqdm>=4.27 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from transformers==4.10.3->bicleaner-ai==1.0.1) (4.63.1)
Collecting requests
Using cached requests-2.27.1-py2.py3-none-any.whl (63 kB)
Collecting filelock
Using cached filelock-3.6.0-py3-none-any.whl (10.0 kB)
Collecting packaging
Using cached packaging-21.3-py3-none-any.whl (40 kB)
Collecting huggingface-hub>=0.0.12
Using cached huggingface_hub-0.4.0-py3-none-any.whl (67 kB)
Collecting tokenizers<0.11,>=0.10.1
Using cached tokenizers-0.10.3.tar.gz (212 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: regex!=2019.12.17 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from transformers==4.10.3->bicleaner-ai==1.0.1) (2022.3.15)
Requirement already satisfied: fastspell in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from bicleaner-hardrules>=2.0->bicleaner-ai==1.0.1) (0.1.5)
Requirement already satisfied: fasttext in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from bicleaner-hardrules>=2.0->bicleaner-ai==1.0.1) (0.9.2)
Requirement already satisfied: smart-open>=1.8.1 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from gensim>=4->bicleaner-ai==1.0.1) (5.2.1)
Requirement already satisfied: scipy>=0.18.1 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from gensim>=4->bicleaner-ai==1.0.1) (1.8.0)
Collecting threadpoolctl>=2.0.0
Using cached threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Collecting libclang>=9.0.1
Using cached libclang-13.0.0-py2.py3-none-manylinux1_x86_64.whl (14.5 MB)
Collecting tf-estimator-nightly==2.8.0.dev2021122109
Using cached tf_estimator_nightly-2.8.0.dev2021122109-py2.py3-none-any.whl (462 kB)
Collecting keras<2.9,>=2.8.0rc0
Using cached keras-2.8.0-py2.py3-none-any.whl (1.4 MB)
Collecting wrapt>=1.11.0
Using cached wrapt-1.14.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (77 kB)
Requirement already satisfied: six>=1.12.0 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from tensorflow>=2.3.2->bicleaner-ai==1.0.1) (1.16.0)
Collecting opt-einsum>=2.3.2
Using cached opt_einsum-3.3.0-py3-none-any.whl (65 kB)
Collecting protobuf>=3.9.2
Using cached protobuf-3.19.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
Collecting typing-extensions>=3.6.6
Using cached typing_extensions-4.1.1-py3-none-any.whl (26 kB)
Requirement already satisfied: setuptools in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from tensorflow>=2.3.2->bicleaner-ai==1.0.1) (61.2.0)
Collecting tensorboard<2.9,>=2.8
Using cached tensorboard-2.8.0-py3-none-any.whl (5.8 MB)
Collecting grpcio<2.0,>=1.24.3
Using cached grpcio-1.44.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
Collecting termcolor>=1.1.0
Using cached termcolor-1.1.0-py3-none-any.whl
Collecting flatbuffers>=1.12
Using cached flatbuffers-2.0-py2.py3-none-any.whl (26 kB)
Collecting google-pasta>=0.1.1
Using cached google_pasta-0.2.0-py3-none-any.whl (57 kB)
Collecting keras-preprocessing>=1.1.1
Using cached Keras_Preprocessing-1.1.2-py2.py3-none-any.whl (42 kB)
Collecting astunparse>=1.6.0
Using cached astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting gast>=0.2.1
Using cached gast-0.5.3-py3-none-any.whl (19 kB)
Collecting absl-py>=0.4.0
Using cached absl_py-1.0.0-py3-none-any.whl (126 kB)
Collecting h5py>=2.9.0
Using cached h5py-3.6.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (4.5 MB)
Collecting tensorflow-io-gcs-filesystem>=0.23.1
Using cached tensorflow_io_gcs_filesystem-0.24.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.1 MB)
Collecting tomli>=1.0.0
Using cached tomli-2.0.1-py3-none-any.whl (12 kB)
Collecting py>=1.8.2
Using cached py-1.11.0-py2.py3-none-any.whl (98 kB)
Collecting iniconfig
Using cached iniconfig-1.1.1-py2.py3-none-any.whl (5.0 kB)
Collecting pluggy<2.0,>=0.12
Using cached pluggy-1.0.0-py2.py3-none-any.whl (13 kB)
Collecting attrs>=19.2.0
Using cached attrs-21.4.0-py2.py3-none-any.whl (60 kB)
Requirement already satisfied: click in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from sacremoses->bicleaner-ai==1.0.1) (8.0.4)
Requirement already satisfied: wheel<1.0,>=0.23.0 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from astunparse>=1.6.0->tensorflow>=2.3.2->bicleaner-ai==1.0.1) (0.37.1)
Collecting pyparsing!=3.0.5,>=2.0.2
Using cached pyparsing-3.0.7-py3-none-any.whl (98 kB)
Collecting markdown>=2.6.8
Using cached Markdown-3.3.6-py3-none-any.whl (97 kB)
Collecting tensorboard-plugin-wit>=1.6.0
Using cached tensorboard_plugin_wit-1.8.1-py3-none-any.whl (781 kB)
Collecting tensorboard-data-server<0.7.0,>=0.6.0
Using cached tensorboard_data_server-0.6.1-py3-none-manylinux2010_x86_64.whl (4.9 MB)
Collecting google-auth-oauthlib<0.5,>=0.4.1
Using cached google_auth_oauthlib-0.4.6-py2.py3-none-any.whl (18 kB)
Collecting google-auth<3,>=1.6.3
Using cached google_auth-2.6.2-py2.py3-none-any.whl (156 kB)
Collecting werkzeug>=0.11.15
Using cached Werkzeug-2.0.3-py3-none-any.whl (289 kB)
Collecting idna<4,>=2.5
Using cached idna-3.3-py3-none-any.whl (61 kB)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from requests->transformers==4.10.3->bicleaner-ai==1.0.1) (1.26.9)
Requirement already satisfied: certifi>=2017.4.17 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from requests->transformers==4.10.3->bicleaner-ai==1.0.1) (2021.10.8)
Collecting charset-normalizer~=2.0.0
Using cached charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
Requirement already satisfied: hunspell in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from fastspell->bicleaner-hardrules>=2.0->bicleaner-ai==1.0.1) (0.5.5)
Requirement already satisfied: pybind11>=2.2 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from fasttext->bicleaner-hardrules>=2.0->bicleaner-ai==1.0.1) (2.9.1)
Collecting cachetools<6.0,>=2.0.0
Using cached cachetools-5.0.0-py3-none-any.whl (9.1 kB)
Collecting rsa<5,>=3.1.4
Using cached rsa-4.8-py3-none-any.whl (39 kB)
Collecting pyasn1-modules>=0.2.1
Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting requests-oauthlib>=0.7.0
Using cached requests_oauthlib-1.3.1-py2.py3-none-any.whl (23 kB)
Collecting pyasn1<0.5.0,>=0.4.6
Using cached pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
Collecting oauthlib>=3.0.0
Using cached oauthlib-3.2.0-py3-none-any.whl (151 kB)
Building wheels for collected packages: bicleaner-ai, tokenizers
Building wheel for bicleaner-ai (setup.py) ... done
Created wheel for bicleaner-ai: filename=bicleaner_ai-1.0.1-py3-none-any.whl size=54794 sha256=3f1a9abb239729cabfb545ab511bae8f6138ef785db846c0d18902788427a7fe
Stored in directory: /home/cgarcia/.cache/pip/wheels/99/de/a6/8387dd68f2bf77fc1db659727c6ced28aca8212a3a24b198ed
Building wheel for tokenizers (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for tokenizers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [51 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.10
creating build/lib.linux-x86_64-3.10/tokenizers
copying py_src/tokenizers/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers
creating build/lib.linux-x86_64-3.10/tokenizers/models
copying py_src/tokenizers/models/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/models
creating build/lib.linux-x86_64-3.10/tokenizers/decoders
copying py_src/tokenizers/decoders/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/decoders
creating build/lib.linux-x86_64-3.10/tokenizers/normalizers
copying py_src/tokenizers/normalizers/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/normalizers
creating build/lib.linux-x86_64-3.10/tokenizers/pre_tokenizers
copying py_src/tokenizers/pre_tokenizers/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/pre_tokenizers
creating build/lib.linux-x86_64-3.10/tokenizers/processors
copying py_src/tokenizers/processors/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/processors
creating build/lib.linux-x86_64-3.10/tokenizers/trainers
copying py_src/tokenizers/trainers/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/trainers
creating build/lib.linux-x86_64-3.10/tokenizers/implementations
copying py_src/tokenizers/implementations/bert_wordpiece.py -> build/lib.linux-x86_64-3.10/tokenizers/implementations
copying py_src/tokenizers/implementations/byte_level_bpe.py -> build/lib.linux-x86_64-3.10/tokenizers/implementations
copying py_src/tokenizers/implementations/sentencepiece_unigram.py -> build/lib.linux-x86_64-3.10/tokenizers/implementations
copying py_src/tokenizers/implementations/sentencepiece_bpe.py -> build/lib.linux-x86_64-3.10/tokenizers/implementations
copying py_src/tokenizers/implementations/char_level_bpe.py -> build/lib.linux-x86_64-3.10/tokenizers/implementations
copying py_src/tokenizers/implementations/base_tokenizer.py -> build/lib.linux-x86_64-3.10/tokenizers/implementations
copying py_src/tokenizers/implementations/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/implementations
creating build/lib.linux-x86_64-3.10/tokenizers/tools
copying py_src/tokenizers/tools/visualizer.py -> build/lib.linux-x86_64-3.10/tokenizers/tools
copying py_src/tokenizers/tools/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/tools
copying py_src/tokenizers/__init__.pyi -> build/lib.linux-x86_64-3.10/tokenizers
copying py_src/tokenizers/models/__init__.pyi -> build/lib.linux-x86_64-3.10/tokenizers/models
copying py_src/tokenizers/decoders/__init__.pyi -> build/lib.linux-x86_64-3.10/tokenizers/decoders
copying py_src/tokenizers/normalizers/__init__.pyi -> build/lib.linux-x86_64-3.10/tokenizers/normalizers
copying py_src/tokenizers/pre_tokenizers/__init__.pyi -> build/lib.linux-x86_64-3.10/tokenizers/pre_tokenizers
copying py_src/tokenizers/processors/__init__.pyi -> build/lib.linux-x86_64-3.10/tokenizers/processors
copying py_src/tokenizers/trainers/__init__.pyi -> build/lib.linux-x86_64-3.10/tokenizers/trainers
copying py_src/tokenizers/tools/visualizer-styles.css -> build/lib.linux-x86_64-3.10/tokenizers/tools
running build_ext
running build_rust
error: can't find Rust compiler
If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.
To update pip, run:
pip install --upgrade pip
and then retry package installation.
If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tokenizers
Successfully built bicleaner-ai
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
Have you tried to install a Rust compiler?
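For reference, a minimal sketch of installing the Rust toolchain with rustup, as the build error itself suggests (a system or conda package manager works too):
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh   # official rustup installer
# make sure ~/.cargo/bin is on PATH before retrying the pip install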
I have tried that, but the Rust compiler is not the only missing dependency. The problem is related to Hugging Face tokenizers, which now uses Rust. I needed to run the following commands to be able to build the Bicleaner AI gensim branch (in a conda build, not a conda environment, but they should be needed there as well):
conda install -c conda-forge rust # rust compiler
pip install setuptools_rust
The dependency setuptools_rust can't be added directly to the requirements.txt file, since it needs to be installed before setup.py is executed in order to install the Bicleaner AI dependencies (the same situation as setuptools, which is automatically installed in every Python/conda environment).
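Putting it together, the whole workaround in a fresh conda environment might look like this (the environment name and the local checkout path are arbitrary examples):
conda create -n bicleaner-ai-env python=3.10   # hypothetical environment name
conda activate bicleaner-ai-env
conda install -c conda-forge rust              # Rust compiler, needed to build tokenizers from source
pip install setuptools_rust                    # must be present before setup.py runs
pip install /path/to/bicleaner-ai              # local checkout of the gensim branch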
If I'm not wrong, these dependencies should be added to the documentation.
This problem should be temporary, since the current bicleaner-ai pins the HF Transformers version at 4.10 (which predates Python 3.10), and that's why a precompiled tokenizers wheel is missing. Once Transformers is updated, the problem shouldn't appear.
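As a quick way to check whether a prebuilt tokenizers wheel exists for a given interpreter, pip can be told to refuse source builds, for example:
pip install 'tokenizers>=0.10.1,<0.11' --only-binary :all:   # fails fast if no matching wheel is published for your Python version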
Ahhh, understood! Hadn't thought about that.
Installation:
Log:
The problem seems to be related to glove-python-binary and the Python version. I have tried with Python 3.7, 3.8, 3.9 and 3.10. It seems that glove-python-binary can't be installed with python>=3.9.
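A minimal workaround sketch, assuming Python 3.8 (one of the versions reported above to work; the environment name is just an example):
conda create -n glove-env python=3.8   # hypothetical environment name; 3.7 also reportedly works
conda activate glove-env
pip install glove-python-binary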