UKPLab / gpl

Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Apache License 2.0

Problem with tensorflow when installing GPL in python environment #1

Open Matthieu-Tinycoaching opened 2 years ago

Matthieu-Tinycoaching commented 2 years ago

Hi,

When creating a conda environment with python==3.8.8 and trying to install GPL within it using pip install gpl, the installation loops endlessly, collecting successively older versions of tensorflow:

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting gpl
  Using cached gpl-0.0.9-py3-none-any.whl (24 kB)
Collecting beir
  Using cached beir-0.2.3.tar.gz (52 kB)
Collecting easy-elasticsearch>=0.0.7
  Using cached easy_elasticsearch-0.0.7-py3-none-any.whl (12 kB)
Collecting elasticsearch==7.12.1
  Using cached elasticsearch-7.12.1-py2.py3-none-any.whl (339 kB)
Collecting requests
  Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
     |████████████████████████████████| 63 kB 1.5 MB/s 
Collecting tqdm
  Using cached tqdm-4.62.3-py2.py3-none-any.whl (76 kB)
Requirement already satisfied: certifi in ./anaconda3/envs/gpl_fresh/lib/python3.8/site-packages (from elasticsearch==7.12.1->easy-elasticsearch>=0.0.7->gpl) (2021.10.8)
Collecting urllib3<2,>=1.21.1
  Downloading urllib3-1.26.8-py2.py3-none-any.whl (138 kB)
     |████████████████████████████████| 138 kB 12.7 MB/s 
Collecting sentence-transformers
  Using cached sentence_transformers-2.1.0-py3-none-any.whl
Collecting pytrec_eval
  Using cached pytrec_eval-0.5.tar.gz (15 kB)
Collecting faiss_cpu
  Using cached faiss_cpu-1.7.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.6 MB)
Collecting tensorflow>=2.2.0
  Using cached tensorflow-2.8.0-cp38-cp38-manylinux2010_x86_64.whl (497.6 MB)
Collecting tensorflow-text
  Using cached tensorflow_text-2.7.3-cp38-cp38-manylinux2010_x86_64.whl (4.9 MB)
Collecting tensorflow-hub
  Using cached tensorflow_hub-0.12.0-py2.py3-none-any.whl (108 kB)
Requirement already satisfied: setuptools in ./anaconda3/envs/gpl_fresh/lib/python3.8/site-packages (from tensorflow>=2.2.0->beir->gpl) (58.0.4)
Collecting grpcio<2.0,>=1.24.3
  Downloading grpcio-1.43.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.1 MB)
     |████████████████████████████████| 4.1 MB 2.2 MB/s 
Collecting typing-extensions>=3.6.6
  Downloading typing_extensions-4.0.1-py3-none-any.whl (22 kB)
Collecting keras-preprocessing>=1.1.1
  Using cached Keras_Preprocessing-1.1.2-py2.py3-none-any.whl (42 kB)
Collecting wrapt>=1.11.0
  Downloading wrapt-1.13.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (84 kB)
     |████████████████████████████████| 84 kB 9.6 MB/s 
Collecting tf-estimator-nightly==2.8.0.dev2021122109
  Using cached tf_estimator_nightly-2.8.0.dev2021122109-py2.py3-none-any.whl (462 kB)
Collecting tensorboard<2.9,>=2.8
  Using cached tensorboard-2.8.0-py3-none-any.whl (5.8 MB)
Collecting google-pasta>=0.1.1
  Using cached google_pasta-0.2.0-py3-none-any.whl (57 kB)
Collecting six>=1.12.0
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting absl-py>=0.4.0
  Using cached absl_py-1.0.0-py3-none-any.whl (126 kB)
Collecting opt-einsum>=2.3.2
  Using cached opt_einsum-3.3.0-py3-none-any.whl (65 kB)
Collecting protobuf>=3.9.2
  Downloading protobuf-3.19.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
     |████████████████████████████████| 1.1 MB 25.7 MB/s 
Collecting libclang>=9.0.1
  Using cached libclang-13.0.0-py2.py3-none-manylinux1_x86_64.whl (14.5 MB)
Collecting keras<2.9,>=2.8.0rc0
  Using cached keras-2.8.0-py2.py3-none-any.whl (1.4 MB)
Collecting tensorflow-io-gcs-filesystem>=0.23.1
  Using cached tensorflow_io_gcs_filesystem-0.23.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.1 MB)
Collecting termcolor>=1.1.0
  Using cached termcolor-1.1.0-py3-none-any.whl
Collecting h5py>=2.9.0
  Using cached h5py-3.6.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (4.5 MB)
Requirement already satisfied: numpy>=1.20 in ./anaconda3/envs/gpl_fresh/lib/python3.8/site-packages (from tensorflow>=2.2.0->beir->gpl) (1.21.2)
Collecting astunparse>=1.6.0
  Using cached astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting gast>=0.2.1
  Using cached gast-0.5.3-py3-none-any.whl (19 kB)
Collecting flatbuffers>=1.12
  Using cached flatbuffers-2.0-py2.py3-none-any.whl (26 kB)
Requirement already satisfied: wheel<1.0,>=0.23.0 in ./anaconda3/envs/gpl_fresh/lib/python3.8/site-packages (from astunparse>=1.6.0->tensorflow>=2.2.0->beir->gpl) (0.37.1)
Collecting google-auth-oauthlib<0.5,>=0.4.1
  Downloading google_auth_oauthlib-0.4.6-py2.py3-none-any.whl (18 kB)
Collecting google-auth<3,>=1.6.3
  Downloading google_auth-2.6.0-py2.py3-none-any.whl (156 kB)
     |████████████████████████████████| 156 kB 17.3 MB/s 
Collecting tensorboard-plugin-wit>=1.6.0
  Downloading tensorboard_plugin_wit-1.8.1-py3-none-any.whl (781 kB)
     |████████████████████████████████| 781 kB 27.3 MB/s 
Collecting werkzeug>=0.11.15
  Using cached Werkzeug-2.0.2-py3-none-any.whl (288 kB)
Collecting markdown>=2.6.8
  Downloading Markdown-3.3.6-py3-none-any.whl (97 kB)
     |████████████████████████████████| 97 kB 3.8 MB/s 
Collecting tensorboard-data-server<0.7.0,>=0.6.0
  Using cached tensorboard_data_server-0.6.1-py3-none-manylinux2010_x86_64.whl (4.9 MB)
Collecting rsa<5,>=3.1.4
  Downloading rsa-4.8-py3-none-any.whl (39 kB)
Collecting pyasn1-modules>=0.2.1
  Using cached pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
Collecting cachetools<6.0,>=2.0.0
  Downloading cachetools-5.0.0-py3-none-any.whl (9.1 kB)
Collecting requests-oauthlib>=0.7.0
  Downloading requests_oauthlib-1.3.1-py2.py3-none-any.whl (23 kB)
Collecting importlib-metadata>=4.4
  Downloading importlib_metadata-4.10.1-py3-none-any.whl (17 kB)
Collecting zipp>=0.5
  Downloading zipp-3.7.0-py3-none-any.whl (5.3 kB)
Collecting pyasn1<0.5.0,>=0.4.6
  Using cached pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
Collecting charset-normalizer~=2.0.0
  Downloading charset_normalizer-2.0.11-py3-none-any.whl (39 kB)
Collecting idna<4,>=2.5
  Using cached idna-3.3-py3-none-any.whl (61 kB)
Collecting oauthlib>=3.0.0
  Downloading oauthlib-3.2.0-py3-none-any.whl (151 kB)
     |████████████████████████████████| 151 kB 26.8 MB/s 
Collecting scipy
  Using cached scipy-1.7.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (39.3 MB)
Collecting scikit-learn
  Downloading scikit_learn-1.0.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.7 MB)
     |████████████████████████████████| 26.7 MB 2.9 MB/s 
Collecting torchvision
  Using cached torchvision-0.11.3-cp38-cp38-manylinux1_x86_64.whl (23.2 MB)
Collecting nltk
  Downloading nltk-3.6.7-py3-none-any.whl (1.5 MB)
     |████████████████████████████████| 1.5 MB 7.0 MB/s 
Collecting tokenizers>=0.10.3
  Downloading tokenizers-0.11.4-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.8 MB)
     |████████████████████████████████| 6.8 MB 2.5 MB/s 
Collecting huggingface-hub
  Downloading huggingface_hub-0.4.0-py3-none-any.whl (67 kB)
     |████████████████████████████████| 67 kB 2.9 MB/s 
Collecting torch>=1.6.0
  Downloading torch-1.10.2-cp38-cp38-manylinux1_x86_64.whl (881.9 MB)
     |████████████████████████████████| 881.9 MB 5.9 kB/s 
Collecting sentencepiece
  Downloading sentencepiece-0.1.96-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
     |████████████████████████████████| 1.2 MB 5.4 MB/s 
Collecting transformers<5.0.0,>=4.6.0
  Downloading transformers-4.16.2-py3-none-any.whl (3.5 MB)
     |████████████████████████████████| 3.5 MB 2.2 MB/s 
Collecting filelock
  Downloading filelock-3.4.2-py3-none-any.whl (9.9 kB)
Collecting sacremoses
  Downloading sacremoses-0.0.47-py2.py3-none-any.whl (895 kB)
     |████████████████████████████████| 895 kB 9.6 MB/s 
Requirement already satisfied: pyyaml>=5.1 in ./.local/lib/python3.8/site-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers->beir->gpl) (5.4.1)
Collecting packaging>=20.0
  Downloading packaging-21.3-py3-none-any.whl (40 kB)
     |████████████████████████████████| 40 kB 2.4 MB/s 
Collecting regex!=2019.12.17
  Downloading regex-2022.1.18-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (764 kB)
     |████████████████████████████████| 764 kB 26.7 MB/s 
Collecting pyparsing!=3.0.5,>=2.0.2
  Downloading pyparsing-3.0.7-py3-none-any.whl (98 kB)
     |████████████████████████████████| 98 kB 4.1 MB/s 
Collecting joblib
  Using cached joblib-1.1.0-py2.py3-none-any.whl (306 kB)
Collecting click
  Using cached click-8.0.3-py3-none-any.whl (97 kB)
Collecting threadpoolctl>=2.0.0
  Downloading threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Collecting tensorflow>=2.2.0
  Using cached tensorflow-2.7.1-cp38-cp38-manylinux2010_x86_64.whl (495.1 MB)
Collecting gast>=0.2.1
  Using cached gast-0.4.0-py3-none-any.whl (9.8 kB)
Collecting tensorflow>=2.2.0
  Using cached tensorflow-2.7.0-cp38-cp38-manylinux2010_x86_64.whl (489.6 MB)
INFO: pip is looking at multiple versions of tensorflow-text to determine which version is compatible with other requirements. This could take a while.
Collecting tensorflow-text
  Using cached tensorflow_text-2.7.0-cp38-cp38-manylinux2010_x86_64.whl (4.9 MB)
  Using cached tensorflow_text-2.6.0-cp38-cp38-manylinux1_x86_64.whl (4.4 MB)
Collecting tensorflow>=2.2.0
  Using cached tensorflow-2.6.3-cp38-cp38-manylinux2010_x86_64.whl (463.9 MB)
Collecting six>=1.12.0
  Using cached six-1.15.0-py2.py3-none-any.whl (10 kB)
Collecting h5py>=2.9.0
  Downloading h5py-3.1.0-cp38-cp38-manylinux1_x86_64.whl (4.4 MB)
     |████████████████████████████████| 4.4 MB 2.4 MB/s 
Collecting tensorflow>=2.2.0
  Using cached tensorflow-2.6.2-cp38-cp38-manylinux2010_x86_64.whl (458.4 MB)
  Using cached tensorflow-2.6.1-cp38-cp38-manylinux2010_x86_64.whl (458.4 MB)
  Using cached tensorflow-2.6.0-cp38-cp38-manylinux2010_x86_64.whl (458.4 MB)
Collecting tensorflow-estimator~=2.6
  Using cached tensorflow_estimator-2.8.0-py2.py3-none-any.whl (462 kB)
Collecting tensorflow-text
  Using cached tensorflow_text-2.5.0-cp38-cp38-manylinux1_x86_64.whl (4.3 MB)
Collecting tensorflow>=2.2.0
  Using cached tensorflow-2.5.3-cp38-cp38-manylinux2010_x86_64.whl (460.4 MB)
  Downloading tensorflow-2.5.2-cp38-cp38-manylinux2010_x86_64.whl (454.5 MB)
     |████████████████████████████████| 454.5 MB 24 kB/s 
  Downloading tensorflow-2.5.1-cp38-cp38-manylinux2010_x86_64.whl (454.5 MB)
     |▍                               | 6.0 MB 266 kB/s eta 0:28:03^C
ERROR: Operation cancelled by user

Is there a way to fix this never-ending tensorflow installation, and is it possible to install GPU versions of PyTorch and TensorFlow?

kwang2049 commented 2 years ago

Hi @Matthieu-Tinycoaching, sorry for the delay.

This issue is related to the dependencies of BeIR, so I invite @NThakur20 to answer your question here.

But as a quick remedy, I think one can first install tensorflow manually and then run pip install gpl to bypass the tensorflow installation.
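
In commands, that would be roughly the following (the tensorflow version is deliberately left unpinned; pick whichever build works for your setup):

pip install tensorflow   # install a tensorflow wheel first, so the requirement is already satisfied
pip install gpl          # then install gpl on top; pip should not backtrack through tensorflow versions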

thakur-nandan commented 2 years ago

Hi @Matthieu-Tinycoaching,

The issue is occurring while installing TensorFlow for beir. In the next master and pip release of beir, the dependency on TF will go away. If you check the development branch here (https://github.com/UKPLab/beir/blob/development/setup.py), it is already listed as an optional requirement.
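
If you want to try that before the release, installing beir directly from that branch should also work; a rough sketch (the branch name is taken from the link above):

pip install git+https://github.com/UKPLab/beir.git@development   # beir with TF as an optional extra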

Otherwise, I would suggest following what @kwang2049 suggested.

Kind Regards, Nandan Thakur

Matthieu-Tinycoaching commented 2 years ago

Thanks @kwang2049 @NThakur20, this indeed worked.

However, I would like to make use of the GPU in my local machine, and it seems that pip install gpl did not install GPU builds of torch (v1.10.2) and tensorflow (v2.8.0) that work with my CUDA version, which is 11.0. Since installing CUDA drivers is very painful and time-consuming, would it be possible for the GPL installation to pull GPU builds of both frameworks that work with CUDA 11.0?
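
Concretely, would it be possible for pip install gpl to do something like the following automatically (the +cu113 tag is only an example taken from the PyTorch previous-versions page; I would need the right build for CUDA 11.0)?

pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html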

Thanks!

behrica commented 2 years ago

I can confirm that a GPU has a huge impact on training speed: a factor of 30 in my experiments on my single GPU (Tesla P40, 24 GB). I have a Dockerfile which sets up CUDA and everything needed for GPU / gpl to work.

I could contribute it here, if useful.
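
As a quick sanity check that the GPU is actually visible from the environment (this assumes the NVIDIA driver and, for Docker, the NVIDIA container toolkit are already set up):

nvidia-smi                                                      # the Tesla P40 should be listed here
python -c "import torch; print(torch.cuda.is_available())"      # should print True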

kwang2049 commented 2 years ago

Thanks for reporting this issue. I found that BeIR has just excluded tensorflow from its requirements. This means that, in theory, gpl could also work without it. I will dig into this issue and make it tensorflow-free :).

junebug-junie commented 2 years ago

I went through the entire TF mess (12 hours!) and was somehow able to avoid it with the following env on my new M1 :)

name: finetune_hs
channels:
  - defaults
dependencies:
  - ca-certificates=2022.4.26=hca03da5_0
  - certifi=2022.6.15=py39hca03da5_0
  - libcxx=12.0.0=hf6beb65_1
  - libffi=3.4.2=hc377ac9_4
  - ncurses=6.3=h1a28f6b_2
  - openssl=1.1.1o=h1a28f6b_0
  - pip=22.1.2=py39hca03da5_0
  - python=3.9.12=hbdb9e5c_1
  - readline=8.1.2=h1a28f6b_1
  - sqlite=3.38.5=h1058600_0
  - tk=8.6.12=hb8d0fd4_0
  - tzdata=2022a=hda174b7_0
  - wheel=0.37.1=pyhd3eb1b0_0
  - xz=5.2.5=h1a28f6b_1
  - zlib=1.2.12=h5a0b063_2
  - pip:
    - anyio==3.6.1
    - appnope==0.1.3
    - argon2-cffi==21.3.0
    - argon2-cffi-bindings==21.2.0
    - asttokens==2.0.5
    - attrs==21.4.0
    - babel==2.10.3
    - backcall==0.2.0
    - beautifulsoup4==4.11.1
    - beir==1.0.0
    - bleach==5.0.1
    - cffi==1.15.0
    - charset-normalizer==2.0.12
    - click==8.1.3
    - debugpy==1.6.0
    - decorator==5.1.1
    - defusedxml==0.7.1
    - e==1.4.5
    - easy-elasticsearch==0.0.7
    - easyprocess==1.1
    - elasticsearch==7.9.1
    - entrypoint2==1.1
    - entrypoints==0.4
    - executing==0.8.3
    - faiss-cpu==1.7.2
    - fastjsonschema==2.15.3
    - filelock==3.7.1
    - gpl==0.1.1
    - huggingface-hub==0.8.1
    - idna==3.3
    - importlib-metadata==4.12.0
    - iprogress==0.4
    - ipykernel==6.15.0
    - ipython==8.4.0
    - ipython-genutils==0.2.0
    - jedi==0.18.1
    - jinja2==3.1.2
    - joblib==1.1.0
    - json5==0.9.8
    - jsonschema==4.6.0
    - jupyter-client==7.3.4
    - jupyter-core==4.10.0
    - jupyter-server==1.18.0
    - jupyterlab==3.4.3
    - jupyterlab-pygments==0.2.2
    - jupyterlab-server==2.14.0
    - markupsafe==2.1.1
    - matplotlib-inline==0.1.3
    - mistune==0.8.4
    - nbclassic==0.3.7
    - nbclient==0.6.4
    - nbconvert==6.5.0
    - nbformat==5.4.0
    - nest-asyncio==1.5.5
    - nltk==3.7
    - notebook==6.4.12
    - notebook-shim==0.1.0
    - numpy==1.23.0
    - packaging==21.3
    - pandas==1.4.3
    - pandocfilters==1.5.0
    - parso==0.8.3
    - pexpect==4.8.0
    - pickleshare==0.7.5
    - pillow==9.1.1
    - prometheus-client==0.14.1
    - prompt-toolkit==3.0.30
    - protobuf==3.20.0
    - psutil==5.9.1
    - ptyprocess==0.7.0
    - pure-eval==0.2.2
    - pycparser==2.21
    - pygments==2.12.0
    - pyparsing==3.0.9
    - pyrsistent==0.18.1
    - python-dateutil==2.8.2
    - pytrec-eval==0.5
    - pytz==2022.1
    - pyyaml==6.0
    - pyzmq==23.2.0
    - rarfile==4.0
    - regex==2022.6.2
    - requests==2.28.0
    - scikit-learn==1.1.1
    - scipy==1.8.1
    - send2trash==1.8.0
    - sentence-transformers==2.2.2
    - sentencepiece==0.1.97
    - setuptools==62.6.0
    - six==1.16.0
    - sniffio==1.2.0
    - soupsieve==2.3.2.post1
    - stack-data==0.3.0
    - terminado==0.15.0
    - threadpoolctl==3.1.0
    - tinycss2==1.1.1
    - tokenizers==0.12.1
    - torch==1.11.0
    - torchvision==0.12.0
    - tornado==6.1
    - tqdm==4.64.0
    - traitlets==5.3.0
    - transformers==4.20.1
    - typing-extensions==4.2.0
    - urllib3==1.26.9
    - wcwidth==0.2.5
    - webencodings==0.5.1
    - websocket-client==1.3.3
    - zipp==3.8.0
prefix: /opt/homebrew/Caskroom/miniconda/base/envs/finetune_hs
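
To recreate it, assuming the spec above is saved as finetune_hs.yml (the filename is only an example):

conda env create -f finetune_hs.yml
conda activate finetune_hs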