bitextor / bicleaner-ai

Bicleaner fork that uses neural networks
GNU General Public License v3.0

Installation error (glove-python-binary) #11

Closed by cgr71ii 2 years ago

cgr71ii commented 2 years ago

Installation:

python -m pip install ./bicleaner-ai

Log:

Processing ./bicleaner-ai
  Preparing metadata (setup.py) ... done
Collecting scikit-learn>=0.22.1
  Using cached scikit_learn-1.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.5 MB)
Requirement already satisfied: PyYAML>=5.1.2 in /home/cgarcia/miniconda3/envs/glove-python/lib/python3.10/site-packages (from bicleaner-ai==1.0.1) (6.0)
Requirement already satisfied: numpy in /home/cgarcia/miniconda3/envs/glove-python/lib/python3.10/site-packages (from bicleaner-ai==1.0.1) (1.22.3)
Collecting pytest
  Using cached pytest-7.1.1-py3-none-any.whl (297 kB)
Collecting toolwrapper
  Using cached toolwrapper-2.1.0.tar.gz (3.2 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: joblib in /home/cgarcia/miniconda3/envs/glove-python/lib/python3.10/site-packages (from bicleaner-ai==1.0.1) (1.1.0)
Requirement already satisfied: sacremoses in /home/cgarcia/miniconda3/envs/glove-python/lib/python3.10/site-packages (from bicleaner-ai==1.0.1) (0.0.49)
Collecting bicleaner-hardrules>=2.0
  Using cached bicleaner_hardrules-2.0-py3-none-any.whl (34 kB)
Collecting sentencepiece
  Using cached sentencepiece-0.1.96-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Collecting tensorflow>=2.3.2
  Using cached tensorflow-2.8.0-cp310-cp310-manylinux2010_x86_64.whl (497.6 MB)
ERROR: Could not find a version that satisfies the requirement glove-python-binary==0.2.0 (from bicleaner-ai) (from versions: none)
ERROR: No matching distribution found for glove-python-binary==0.2.0

The problem seems to be related to glove-python-binary and the Python version. I have tried Python 3.7, 3.8, 3.9, and 3.10, and it seems that glove-python-binary can't be installed with Python >= 3.9.
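A possible workaround, given that the package seems to install on Python < 3.9, is to pin the environment to an older interpreter (a sketch; the environment name is hypothetical):

conda create -n bicleaner-ai python=3.8
conda activate bicleaner-ai
python -m pip install ./bicleaner-ai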

ZJaume commented 2 years ago

To avoid this problem, use the gensim branch.
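For example (a sketch, assuming the branch lives in this repository):

git clone --branch gensim https://github.com/bitextor/bicleaner-ai.git
python -m pip install ./bicleaner-ai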

cgr71ii commented 2 years ago

Ahh, ok, thank you! Is this branch intended to be merged into master, or will they be kept separate?

ZJaume commented 2 years ago

Right now Gensim is producing much lower-quality embeddings; it will be merged once decent quality is reached.

cgr71ii commented 2 years ago

Even using the gensim branch, the installation fails with Python 3.10 (I have tested Python 3.9 as well and it seems to work fine):

Processing /home/cgarcia/Documentos/bicleaner-ai
  Preparing metadata (setup.py) ... done
Collecting scikit-learn>=0.22.1
  Using cached scikit_learn-1.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.5 MB)
Requirement already satisfied: PyYAML>=5.1.2 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from bicleaner-ai==1.0.1) (6.0)
Requirement already satisfied: numpy in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from bicleaner-ai==1.0.1) (1.22.3)
Collecting pytest
  Using cached pytest-7.1.1-py3-none-any.whl (297 kB)
Collecting toolwrapper
  Using cached toolwrapper-2.1.0-py3-none-any.whl
Requirement already satisfied: joblib in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from bicleaner-ai==1.0.1) (1.1.0)
Requirement already satisfied: sacremoses in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from bicleaner-ai==1.0.1) (0.0.49)
Collecting bicleaner-hardrules>=2.0
  Using cached bicleaner_hardrules-2.0-py3-none-any.whl (34 kB)
Collecting sentencepiece
  Using cached sentencepiece-0.1.96-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Collecting tensorflow>=2.3.2
  Using cached tensorflow-2.8.0-cp310-cp310-manylinux2010_x86_64.whl (497.6 MB)
Collecting fuzzywuzzy
  Using cached fuzzywuzzy-0.18.0-py2.py3-none-any.whl (18 kB)
Collecting python-Levenshtein
  Using cached python_Levenshtein-0.12.2-cp310-cp310-linux_x86_64.whl
Collecting transformers==4.10.3
  Using cached transformers-4.10.3-py3-none-any.whl (2.8 MB)
Collecting psutil
  Using cached psutil-5.9.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (281 kB)
Requirement already satisfied: gensim>=4 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from bicleaner-ai==1.0.1) (4.1.2)
Requirement already satisfied: tqdm>=4.27 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from transformers==4.10.3->bicleaner-ai==1.0.1) (4.63.1)
Collecting requests
  Using cached requests-2.27.1-py2.py3-none-any.whl (63 kB)
Collecting filelock
  Using cached filelock-3.6.0-py3-none-any.whl (10.0 kB)
Collecting packaging
  Using cached packaging-21.3-py3-none-any.whl (40 kB)
Collecting huggingface-hub>=0.0.12
  Using cached huggingface_hub-0.4.0-py3-none-any.whl (67 kB)
Collecting tokenizers<0.11,>=0.10.1
  Using cached tokenizers-0.10.3.tar.gz (212 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: regex!=2019.12.17 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from transformers==4.10.3->bicleaner-ai==1.0.1) (2022.3.15)
Requirement already satisfied: fastspell in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from bicleaner-hardrules>=2.0->bicleaner-ai==1.0.1) (0.1.5)
Requirement already satisfied: fasttext in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from bicleaner-hardrules>=2.0->bicleaner-ai==1.0.1) (0.9.2)
Requirement already satisfied: smart-open>=1.8.1 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from gensim>=4->bicleaner-ai==1.0.1) (5.2.1)
Requirement already satisfied: scipy>=0.18.1 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from gensim>=4->bicleaner-ai==1.0.1) (1.8.0)
Collecting threadpoolctl>=2.0.0
  Using cached threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Collecting libclang>=9.0.1
  Using cached libclang-13.0.0-py2.py3-none-manylinux1_x86_64.whl (14.5 MB)
Collecting tf-estimator-nightly==2.8.0.dev2021122109
  Using cached tf_estimator_nightly-2.8.0.dev2021122109-py2.py3-none-any.whl (462 kB)
Collecting keras<2.9,>=2.8.0rc0
  Using cached keras-2.8.0-py2.py3-none-any.whl (1.4 MB)
Collecting wrapt>=1.11.0
  Using cached wrapt-1.14.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (77 kB)
Requirement already satisfied: six>=1.12.0 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from tensorflow>=2.3.2->bicleaner-ai==1.0.1) (1.16.0)
Collecting opt-einsum>=2.3.2
  Using cached opt_einsum-3.3.0-py3-none-any.whl (65 kB)
Collecting protobuf>=3.9.2
  Using cached protobuf-3.19.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
Collecting typing-extensions>=3.6.6
  Using cached typing_extensions-4.1.1-py3-none-any.whl (26 kB)
Requirement already satisfied: setuptools in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from tensorflow>=2.3.2->bicleaner-ai==1.0.1) (61.2.0)
Collecting tensorboard<2.9,>=2.8
  Using cached tensorboard-2.8.0-py3-none-any.whl (5.8 MB)
Collecting grpcio<2.0,>=1.24.3
  Using cached grpcio-1.44.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
Collecting termcolor>=1.1.0
  Using cached termcolor-1.1.0-py3-none-any.whl
Collecting flatbuffers>=1.12
  Using cached flatbuffers-2.0-py2.py3-none-any.whl (26 kB)
Collecting google-pasta>=0.1.1
  Using cached google_pasta-0.2.0-py3-none-any.whl (57 kB)
Collecting keras-preprocessing>=1.1.1
  Using cached Keras_Preprocessing-1.1.2-py2.py3-none-any.whl (42 kB)
Collecting astunparse>=1.6.0
  Using cached astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting gast>=0.2.1
  Using cached gast-0.5.3-py3-none-any.whl (19 kB)
Collecting absl-py>=0.4.0
  Using cached absl_py-1.0.0-py3-none-any.whl (126 kB)
Collecting h5py>=2.9.0
  Using cached h5py-3.6.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (4.5 MB)
Collecting tensorflow-io-gcs-filesystem>=0.23.1
  Using cached tensorflow_io_gcs_filesystem-0.24.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.1 MB)
Collecting tomli>=1.0.0
  Using cached tomli-2.0.1-py3-none-any.whl (12 kB)
Collecting py>=1.8.2
  Using cached py-1.11.0-py2.py3-none-any.whl (98 kB)
Collecting iniconfig
  Using cached iniconfig-1.1.1-py2.py3-none-any.whl (5.0 kB)
Collecting pluggy<2.0,>=0.12
  Using cached pluggy-1.0.0-py2.py3-none-any.whl (13 kB)
Collecting attrs>=19.2.0
  Using cached attrs-21.4.0-py2.py3-none-any.whl (60 kB)
Requirement already satisfied: click in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from sacremoses->bicleaner-ai==1.0.1) (8.0.4)
Requirement already satisfied: wheel<1.0,>=0.23.0 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from astunparse>=1.6.0->tensorflow>=2.3.2->bicleaner-ai==1.0.1) (0.37.1)
Collecting pyparsing!=3.0.5,>=2.0.2
  Using cached pyparsing-3.0.7-py3-none-any.whl (98 kB)
Collecting markdown>=2.6.8
  Using cached Markdown-3.3.6-py3-none-any.whl (97 kB)
Collecting tensorboard-plugin-wit>=1.6.0
  Using cached tensorboard_plugin_wit-1.8.1-py3-none-any.whl (781 kB)
Collecting tensorboard-data-server<0.7.0,>=0.6.0
  Using cached tensorboard_data_server-0.6.1-py3-none-manylinux2010_x86_64.whl (4.9 MB)
Collecting google-auth-oauthlib<0.5,>=0.4.1
  Using cached google_auth_oauthlib-0.4.6-py2.py3-none-any.whl (18 kB)
Collecting google-auth<3,>=1.6.3
  Using cached google_auth-2.6.2-py2.py3-none-any.whl (156 kB)
Collecting werkzeug>=0.11.15
  Using cached Werkzeug-2.0.3-py3-none-any.whl (289 kB)
Collecting idna<4,>=2.5
  Using cached idna-3.3-py3-none-any.whl (61 kB)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from requests->transformers==4.10.3->bicleaner-ai==1.0.1) (1.26.9)
Requirement already satisfied: certifi>=2017.4.17 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from requests->transformers==4.10.3->bicleaner-ai==1.0.1) (2021.10.8)
Collecting charset-normalizer~=2.0.0
  Using cached charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
Requirement already satisfied: hunspell in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from fastspell->bicleaner-hardrules>=2.0->bicleaner-ai==1.0.1) (0.5.5)
Requirement already satisfied: pybind11>=2.2 in /home/cgarcia/miniconda3/envs/python/lib/python3.10/site-packages (from fasttext->bicleaner-hardrules>=2.0->bicleaner-ai==1.0.1) (2.9.1)
Collecting cachetools<6.0,>=2.0.0
  Using cached cachetools-5.0.0-py3-none-any.whl (9.1 kB)
Collecting rsa<5,>=3.1.4
  Using cached rsa-4.8-py3-none-any.whl (39 kB)
Collecting pyasn1-modules>=0.2.1
  Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting requests-oauthlib>=0.7.0
  Using cached requests_oauthlib-1.3.1-py2.py3-none-any.whl (23 kB)
Collecting pyasn1<0.5.0,>=0.4.6
  Using cached pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
Collecting oauthlib>=3.0.0
  Using cached oauthlib-3.2.0-py3-none-any.whl (151 kB)
Building wheels for collected packages: bicleaner-ai, tokenizers
  Building wheel for bicleaner-ai (setup.py) ... done
  Created wheel for bicleaner-ai: filename=bicleaner_ai-1.0.1-py3-none-any.whl size=54794 sha256=3f1a9abb239729cabfb545ab511bae8f6138ef785db846c0d18902788427a7fe
  Stored in directory: /home/cgarcia/.cache/pip/wheels/99/de/a6/8387dd68f2bf77fc1db659727c6ced28aca8212a3a24b198ed
  Building wheel for tokenizers (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for tokenizers (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [51 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.10
      creating build/lib.linux-x86_64-3.10/tokenizers
      copying py_src/tokenizers/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers
      creating build/lib.linux-x86_64-3.10/tokenizers/models
      copying py_src/tokenizers/models/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/models
      creating build/lib.linux-x86_64-3.10/tokenizers/decoders
      copying py_src/tokenizers/decoders/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/decoders
      creating build/lib.linux-x86_64-3.10/tokenizers/normalizers
      copying py_src/tokenizers/normalizers/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/normalizers
      creating build/lib.linux-x86_64-3.10/tokenizers/pre_tokenizers
      copying py_src/tokenizers/pre_tokenizers/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/pre_tokenizers
      creating build/lib.linux-x86_64-3.10/tokenizers/processors
      copying py_src/tokenizers/processors/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/processors
      creating build/lib.linux-x86_64-3.10/tokenizers/trainers
      copying py_src/tokenizers/trainers/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/trainers
      creating build/lib.linux-x86_64-3.10/tokenizers/implementations
      copying py_src/tokenizers/implementations/bert_wordpiece.py -> build/lib.linux-x86_64-3.10/tokenizers/implementations
      copying py_src/tokenizers/implementations/byte_level_bpe.py -> build/lib.linux-x86_64-3.10/tokenizers/implementations
      copying py_src/tokenizers/implementations/sentencepiece_unigram.py -> build/lib.linux-x86_64-3.10/tokenizers/implementations
      copying py_src/tokenizers/implementations/sentencepiece_bpe.py -> build/lib.linux-x86_64-3.10/tokenizers/implementations
      copying py_src/tokenizers/implementations/char_level_bpe.py -> build/lib.linux-x86_64-3.10/tokenizers/implementations
      copying py_src/tokenizers/implementations/base_tokenizer.py -> build/lib.linux-x86_64-3.10/tokenizers/implementations
      copying py_src/tokenizers/implementations/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/implementations
      creating build/lib.linux-x86_64-3.10/tokenizers/tools
      copying py_src/tokenizers/tools/visualizer.py -> build/lib.linux-x86_64-3.10/tokenizers/tools
      copying py_src/tokenizers/tools/__init__.py -> build/lib.linux-x86_64-3.10/tokenizers/tools
      copying py_src/tokenizers/__init__.pyi -> build/lib.linux-x86_64-3.10/tokenizers
      copying py_src/tokenizers/models/__init__.pyi -> build/lib.linux-x86_64-3.10/tokenizers/models
      copying py_src/tokenizers/decoders/__init__.pyi -> build/lib.linux-x86_64-3.10/tokenizers/decoders
      copying py_src/tokenizers/normalizers/__init__.pyi -> build/lib.linux-x86_64-3.10/tokenizers/normalizers
      copying py_src/tokenizers/pre_tokenizers/__init__.pyi -> build/lib.linux-x86_64-3.10/tokenizers/pre_tokenizers
      copying py_src/tokenizers/processors/__init__.pyi -> build/lib.linux-x86_64-3.10/tokenizers/processors
      copying py_src/tokenizers/trainers/__init__.pyi -> build/lib.linux-x86_64-3.10/tokenizers/trainers
      copying py_src/tokenizers/tools/visualizer-styles.css -> build/lib.linux-x86_64-3.10/tokenizers/tools
      running build_ext
      running build_rust
      error: can't find Rust compiler

      If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.

      To update pip, run:

          pip install --upgrade pip

      and then retry package installation.

      If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for tokenizers
Successfully built bicleaner-ai
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

ZJaume commented 2 years ago

Have you tried installing a Rust compiler?

cgr71ii commented 2 years ago

I have tried that, but the Rust compiler is not the only missing dependency. The problem is related to the Hugging Face tokenizers package, which now uses Rust. I needed to run the following commands to be able to build the Bicleaner AI gensim branch (this was in a conda build, not a conda environment, but it should be needed there as well):

conda install -c conda-forge rust  # Rust compiler
pip install setuptools_rust        # build-time dependency of tokenizers, needed before setup.py runs

The dependency setuptools_rust can't be added directly to the requirements.txt file, since it needs to be installed before setup.py is executed in order to install the Bicleaner AI dependencies (the same situation as setuptools itself, which is automatically installed in every Python/conda environment).

If I'm not mistaken, these dependencies should be added to the documentation.
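For reference, a consolidated sketch of the workaround in a fresh Python 3.10 environment (the environment name is hypothetical; rustup, as suggested by the build error above, should also work instead of conda's rust package):

conda create -n bicleaner-ai python=3.10
conda activate bicleaner-ai
conda install -c conda-forge rust   # puts the Rust compiler on PATH
pip install setuptools_rust         # must be available before setup.py runs
python -m pip install ./bicleaner-ai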

ZJaume commented 2 years ago

This problem should be temporary: the current bicleaner-ai has the HF Transformers version frozen at 4.10, which predates Python 3.10, and that's why the precompiled tokenizers wheel is missing. Once Transformers is updated, the problem shouldn't appear.
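One way to confirm that the missing prebuilt wheel is the cause is to ask pip for a binary-only download of the pinned tokenizers range under each interpreter version (a sketch; /tmp/wheels is an arbitrary destination):

# succeeds: cp39 wheels of tokenizers 0.10.x exist on PyPI
pip download 'tokenizers<0.11,>=0.10.1' --no-deps --only-binary :all: --python-version 3.9 -d /tmp/wheels
# fails with "no matching distribution": no cp310 wheel was published for 0.10.x
pip download 'tokenizers<0.11,>=0.10.1' --no-deps --only-binary :all: --python-version 3.10 -d /tmp/wheels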

cgr71ii commented 2 years ago

Ahh, understood! I hadn't thought about that.