amberT15 / LLM_eval

Code repository for the study "Evaluating the representational power of pre-trained DNA language models for regulatory genomics"
MIT License

Environment YAML File Fails to Build Using Conda (and naming confusion) #2

Open hongruhu opened 4 months ago

hongruhu commented 4 months ago

Hi Ziqi,

Thank you and your team for providing this valuable evaluation framework. However, I encountered some issues while trying to build the environment.

For example, the GPN environment YAML file fails to build with conda (error details below). Either something is missing from the file, or the problem is specific to gpn. I assume the cause is that gpn was originally installed with pip install git+https://github.com/songlab-cal/gpn.git, so pip cannot find a gpn==0.2 release to download.

conda env create -f gpn_requirments.yml 
Channels:
 - conda-forge
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: / Ran pip subprocess with arguments:
['/anaconda/envs/gpn_env/bin/python', '-m', 'pip', 'install', '-U', '-r', '/mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt', '--exists-action=b']
Pip subprocess output:
Collecting accelerate==0.19.0 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 1))
  Using cached accelerate-0.19.0-py3-none-any.whl (219 kB)
Collecting aiohttp==3.8.4 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 2))
  Using cached aiohttp-3.8.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting aiosignal==1.3.1 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 3))
  Using cached aiosignal-1.3.1-py3-none-any.whl (7.6 kB)
Collecting appdirs==1.4.4 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 4))
  Using cached appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Collecting async-timeout==4.0.2 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 5))
  Using cached async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting attrs==23.1.0 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 6))
  Using cached attrs-23.1.0-py3-none-any.whl (61 kB)
Collecting bioframe==0.4.1 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 7))
  Using cached bioframe-0.4.1-py2.py3-none-any.whl (114 kB)
Collecting biopython==1.81 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 8))
  Using cached biopython-1.81-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Collecting certifi==2022.12.7 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 9))
  Using cached certifi-2022.12.7-py3-none-any.whl (155 kB)
Collecting charset-normalizer==2.1.1 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 10))
  Using cached charset_normalizer-2.1.1-py3-none-any.whl (39 kB)
Collecting click==8.1.3 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 11))
  Using cached click-8.1.3-py3-none-any.whl (96 kB)
Collecting cmake==3.25.0 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 12))
  Using cached cmake-3.25.0-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23.7 MB)
Collecting contourpy==1.0.7 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 13))
  Using cached contourpy-1.0.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (299 kB)
Collecting cycler==0.11.0 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 14))
  Using cached cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting datasets==2.12.0 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 15))
  Using cached datasets-2.12.0-py3-none-any.whl (474 kB)
Collecting dill==0.3.6 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 16))
  Using cached dill-0.3.6-py3-none-any.whl (110 kB)
Collecting docker-pycreds==0.4.0 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 17))
  Using cached docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)
Collecting einops==0.6.1 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 18))
  Using cached einops-0.6.1-py3-none-any.whl (42 kB)
Collecting filelock==3.9.0 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 19))
  Using cached filelock-3.9.0-py3-none-any.whl (9.7 kB)
Collecting fonttools==4.39.4 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 20))
  Using cached fonttools-4.39.4-py3-none-any.whl (1.0 MB)
Collecting frozenlist==1.3.3 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 21))
  Using cached frozenlist-1.3.3-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (158 kB)
Collecting fsspec==2023.5.0 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 22))
  Using cached fsspec-2023.5.0-py3-none-any.whl (160 kB)
Collecting gitdb==4.0.10 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 23))
  Using cached gitdb-4.0.10-py3-none-any.whl (62 kB)
Collecting gitpython==3.1.31 (from -r /mnt/batch/tasks/shared/LS_root/mounts/clusters/sc-ar4232131/code/Users/sc-ar423213/LLM_eval/condaenv.wmsqazw1.requirements.txt (line 24))
  Using cached GitPython-3.1.31-py3-none-any.whl (184 kB)

Pip subprocess error:
ERROR: Could not find a version that satisfies the requirement gpn==0.2 (from versions: none)
ERROR: No matching distribution found for gpn==0.2

failed

CondaEnvException: Pip failed
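Because pip reports no published versions of gpn at all ("from versions: none"), one possible workaround is to point the pip entry at the GitHub repository with a PEP 508 direct reference instead of a version pin. The minimal sketch below rewrites that line in the YAML text. The helper name pin_gpn_to_git is hypothetical, and the sketch assumes the current upstream gpn code is still compatible with the 0.2 build the environment was originally created with.

```python
import re

# Assumption: upstream gpn at this URL is compatible with the original gpn 0.2.
GPN_GIT_URL = "git+https://github.com/songlab-cal/gpn.git"

# Matches a pip-list entry such as "      - gpn==0.2" on its own line.
# [ \t]* is used instead of \s* so the pattern never spans a newline.
GPN_PIN = re.compile(r"^([ \t]*-[ \t]*)gpn==\S+[ \t]*$", re.MULTILINE)

def pin_gpn_to_git(env_yaml: str, url: str = GPN_GIT_URL) -> str:
    """Hypothetical helper: replace '- gpn==X' with '- gpn @ <url>', keeping indentation."""
    return GPN_PIN.sub(lambda m: f"{m.group(1)}gpn @ {url}", env_yaml)
```

Applied to the environment file, the pip entry becomes "- gpn @ git+https://github.com/songlab-cal/gpn.git", which pip resolves by cloning the repository rather than searching PyPI.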

After removing the gpn line, I get this error instead:

Pip subprocess error:
ERROR: Ignored the following versions that require a different python version: 0.52.0 Requires-Python >=3.6,<3.9; 0.52.0rc3 Requires-Python >=3.6,<3.9; 1.14.0 Requires-Python >=3.10; 1.14.0rc1 Requires-Python >=3.10; 1.14.0rc2 Requires-Python >=3.10; 3.3 Requires-Python >=3.10; 3.3rc0 Requires-Python >=3.10
ERROR: Could not find a version that satisfies the requirement torch==2.0.1+cu118 (from versions: 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1)
ERROR: No matching distribution found for torch==2.0.1+cu118
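The +cu118 suffix is a PEP 440 local version label. PyTorch publishes these CUDA-specific wheels on its own index (https://download.pytorch.org/whl/cu118) rather than on PyPI, which would explain why pip only lists plain versions such as 2.0.1 above. A minimal sketch, using a hypothetical helper name, that derives the extra index URL a +cuXXX pin needs:

```python
import re

# "name==version+cuXXX", e.g. "torch==2.0.1+cu118"; the part after "+" is the
# local version label that selects a CUDA-specific wheel.
LOCAL_CUDA_PIN = re.compile(
    r"^(?P<name>[A-Za-z0-9_.-]+)==(?P<ver>[^+\s]+)\+(?P<cuda>cu\d+)$"
)

def pytorch_index_for(requirement: str):
    """Hypothetical helper: return the PyTorch wheel index URL a '+cuXXX' pin needs, or None."""
    match = LOCAL_CUDA_PIN.match(requirement.strip())
    if match is None:
        return None  # plain pins resolve from PyPI
    return f"https://download.pytorch.org/whl/{match.group('cuda')}"
```

The resulting URL could then be added as its own entry in the pip section of the YAML file (for example "- --extra-index-url https://download.pytorch.org/whl/cu118"). Conda copies those entries into the temporary requirements file seen in the log (condaenv.*.requirements.txt), and pip accepts --extra-index-url as an option line in requirements files.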

Additionally, the torch YAML file uses the same environment name as the GPN one. Are these two files intended for the same purpose?
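When two YAML files share one name: field, creating the second environment collides with the one created from the first. Passing -n overrides the name, e.g. conda env create -f torch_requirements.yml -n torch_env (torch_env is only an illustrative name). To spot such collisions across all of the YAML files, here is a small sketch; the helper is hypothetical, and it reads only the top-level name: line with a regex rather than a full YAML parser.

```python
import re
from collections import defaultdict

# Matches a top-level "name: <env>" line; a regex stands in for a YAML parser.
NAME_LINE = re.compile(r"^name:[ \t]*(\S+)[ \t]*$", re.MULTILINE)

def find_duplicate_env_names(env_files):
    """Given {filename: yaml_text}, return {env_name: [filenames]} for names declared more than once."""
    owners = defaultdict(list)
    for filename, text in env_files.items():
        match = NAME_LINE.search(text)
        if match:
            owners[match.group(1)].append(filename)
    # Keep only names claimed by two or more files.
    return {name: files for name, files in owners.items() if len(files) > 1}
```

For a checkout of the repository, the input could be built with {p.name: p.read_text() for p in pathlib.Path(".").glob("*.yml")}.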

Could you please provide updated versions of the YAML files in the near future, or any additional guidance on how to resolve these issues?

Thank you very much for your assistance.

deysanjoy33 commented 1 month ago

Hi,

Thanks for providing this pipeline. I am also getting an error when installing tf_requierments.yml, which is required to run lentiMPRA/representation_perf.ipynb. The error is:

ERROR: No matching distribution found for transformers==4.36.0.dev0
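The .dev0 suffix marks a PEP 440 development release. Such transformers builds come from installing the GitHub source tree and are generally not published on PyPI, so pip cannot find them. One option is to pin the corresponding final release instead, assuming it behaves the same as the development snapshot the notebooks were run with. A hypothetical helper that strips the dev segment from a pin:

```python
import re

# "name==X.Y.Z.devN" -> captures the package name and the X.Y.Z release part.
DEV_PIN = re.compile(
    r"^(?P<name>[A-Za-z0-9_.-]+)==(?P<release>\d+(?:\.\d+)*)\.dev\d+$"
)

def release_pin(requirement: str) -> str:
    """Hypothetical helper: turn 'pkg==X.Y.Z.devN' into 'pkg==X.Y.Z'; other pins pass through."""
    req = requirement.strip()
    match = DEV_PIN.match(req)
    if match is None:
        return req
    return f"{match.group('name')}=={match.group('release')}"
```

Pinning the final release keeps the environment installable from PyPI; if exact reproducibility matters, installing from the matching commit of the transformers repository would be the closer match.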

I tried using the transformers version from the other YAML file (torch_requirements.yml) instead, but running the following code still fails:

model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True).to('cuda')
hidden_states = model(input_ids,attention_mask)[0].cpu().detach().numpy()

I got the following error:

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/d064dece8a8b41d9fb8729fbe3435278786931f1/bert_layers.py:609, in BertModel.forward(self, input_ids, token_type_ids, attention_mask, position_ids, output_all_encoded_layers, masked_tokens_mask, **kwargs)
...
--> 781 assert q.is_cuda and k.is_cuda and v.is_cuda
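The BertModel.forward signature in the traceback lists token_type_ids before attention_mask, so the positional call model(input_ids,attention_mask) passes the mask as token_type_ids. This is not necessarily what triggers the q.is_cuda assertion, but it is worth ruling out. The sketch below applies inspect.signature to a stub with the same parameter order to show how each call binds; the stub and its default values are assumptions, and only the parameter order is taken from the traceback.

```python
import inspect

# Stub mirroring the parameter order printed in the traceback for
# BertModel.forward in DNABERT-2's bert_layers.py. Defaults are assumed.
def forward(self, input_ids, token_type_ids=None, attention_mask=None,
            position_ids=None, output_all_encoded_layers=False,
            masked_tokens_mask=None, **kwargs):
    pass

def bound_arguments(*args, **kwargs):
    """Return the parameter each argument binds to (None stands in for self)."""
    return dict(inspect.signature(forward).bind(None, *args, **kwargs).arguments)

# Positional call, as in model(input_ids, attention_mask): the mask binds to token_type_ids.
print(bound_arguments("IDS", "MASK"))
# Keyword call, as in model(input_ids, attention_mask=attention_mask): the mask binds to attention_mask.
print(bound_arguments("IDS", attention_mask="MASK"))
```

Passing the mask by keyword removes this ambiguity regardless of how the remote model code orders its parameters.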