Wenhao-Jin / HydRA

A deep-learning model for predicting RNA-binding capacity from protein interaction association context and protein sequence

ProteinBERT-RBP model not posted? #1

Closed: umasstr closed this issue 1 year ago

umasstr commented 1 year ago

Hi @Wenhao-Jin,

I am wondering if there is a model posted in release assets that I am not seeing? I cannot find "ProteinBERT_TrainWithWholeProteinSet_defaultSetting_ModelFile.pkl" in this or the proteinBERT repo.

If I am reading correctly, I can use your model without training my own?

Seemingly unrelated: HydRa2_predict may be looking for a file outside of your environment, "/home/wjin/projects/RBP_pred/RBP_identification/Data/protVec_100d_3grams.csv":

HydRa2_predict --seq_dir sequences --proteinBERT_modelfile ModelFile.pkl --outdir out -n H01 --no-PIA --no-PPA

  File "/opt/conda/envs/HydRa/bin/HydRa2_predict", line 5, in <module>
    from HydRa.HydRa2_0_predict import call_main
  File "/opt/conda/envs/HydRa/lib/python3.8/site-packages/HydRa/__init__.py", line 1, in <module>
    from .models import *
  File "/opt/conda/envs/HydRa/lib/python3.8/site-packages/HydRa/models/__init__.py", line 2, in <module>
    from .Sequence_class import Protein_Sequence_Input5, Protein_Sequence_Input5_2, Protein_Sequence_Input5_noSS, Protein_Sequence_Input5_2_noSS
  File "/opt/conda/envs/HydRa/lib/python3.8/site-packages/HydRa/models/Sequence_class.py", line 13, in <module>
    BioVec_weights=pd.read_table('/home/wjin/projects/RBP_pred/RBP_identification/Data/protVec_100d_3grams.csv', sep='\t', header=None, index_col=0)
  File "/opt/conda/envs/HydRa/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/HydRa/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 777, in read_table
    return _read(filepath_or_buffer, kwds)
  File "/opt/conda/envs/HydRa/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/opt/conda/envs/HydRa/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 932, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/opt/conda/envs/HydRa/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1216, in _make_engine
    self.handles = get_handle(  # type: ignore[call-overload]
  File "/opt/conda/envs/HydRa/lib/python3.8/site-packages/pandas/io/common.py", line 786, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '/home/wjin/projects/RBP_pred/RBP_identification/Data/protVec_100d_3grams.csv'

Thanks!
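
Note on the traceback above: the import fails because Sequence_class.py reads the embedding table from an absolute path that exists only on the author's machine. The sketch below shows one common way to avoid this, by resolving the file relative to the installed package. It is not HydRA's actual fix; the function name resolve_data_file, the "data" subfolder, and the HYDRA_DATA_DIR environment variable are all hypothetical names chosen for illustration.

```python
import os
from pathlib import Path


def resolve_data_file(filename, package_dir=None, env_var="HYDRA_DATA_DIR"):
    """Return the path of a bundled data file, or raise a descriptive error.

    Search order: the directory named by env_var (if set), then
    <package_dir>/data. Both names are hypothetical, not part of HydRA.
    """
    if package_dir is None:
        # Directory containing this module, wherever the package is installed.
        package_dir = Path(__file__).resolve().parent

    candidates = []
    override = os.environ.get(env_var)
    if override:
        candidates.append(Path(override) / filename)
    candidates.append(Path(package_dir) / "data" / filename)

    for candidate in candidates:
        if candidate.is_file():
            return candidate

    searched = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"{filename} not found; searched: {searched}")


# Usage (assumes the CSV ships inside the package's "data" folder):
#   path = resolve_data_file("protVec_100d_3grams.csv")
#   weights = pd.read_table(path, sep="\t", header=None, index_col=0)
```

Resolving the path at call time rather than at import time also means a missing file produces an error that names every location searched, instead of a path from someone else's home directory.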

Wenhao-Jin commented 1 year ago

Hey @umasstr, thank you for reporting these bugs. Appreciate it!

(1) The model file for the ProteinBERT-RBP component (ProteinBERT_TrainWithWholeProteinSet_defaultSetting_ModelFile.pkl.gz) can be downloaded from here. You may need to gunzip it before use. Yes, you can use this model without any training: it has already been trained on our training set, which is composed of human RBPs and other proteins.

(2) The FileNotFoundError has been fixed. You could try reinstalling the HydRA package and re-running the prediction:

python3 -m pip install --index-url https://test.pypi.org/simple/ --no-deps --upgrade hydra-rbp

Thank you! And let me know if it works on your side.
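
For anyone without the gunzip command available, the decompression step mentioned above can also be done from Python's standard library. This is a minimal sketch; the helper name gunzip_file is made up for illustration and is not part of HydRA.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(gz_path, out_path=None):
    """Decompress a .gz file and return the path of the decompressed copy."""
    gz_path = Path(gz_path)
    # Default output name: drop the trailing ".gz" (model.pkl.gz -> model.pkl).
    out_path = Path(out_path) if out_path is not None else gz_path.with_suffix("")
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        # Stream in chunks so a large model file is never held fully in memory.
        shutil.copyfileobj(src, dst)
    return out_path


# Usage (file name taken from the reply above):
#   gunzip_file("ProteinBERT_TrainWithWholeProteinSet_defaultSetting_ModelFile.pkl.gz")
```

The resulting .pkl path is what gets passed to --proteinBERT_modelfile. The original .gz file is left in place.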

umasstr commented 1 year ago

Hi @Wenhao-Jin, thanks for the response. Unfortunately, this doesn't run either. I ran this in a container (umasstr/hydra:latest), installing hydra from scratch, so there shouldn't be any dependency issues.

(HydRa) root@809c3ccff36d:/DATA# HydRa2_predict --seq_dir sequences --proteinBERT_modelfile ProteinBERT_TrainWithWholeProteinSet_defaultSetting_ModelFile.pkl --outdir out -n H01 --no-PIA --no-PPA
2023-01-14 16:58:02.627641: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-01-14 16:58:02.628262: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "/opt/conda/envs/HydRa/bin/HydRa2_predict", line 5, in <module>
    from HydRa.HydRa2_0_predict import call_main
  File "/opt/conda/envs/HydRa/lib/python3.8/site-packages/HydRa/__init__.py", line 1, in <module>
    from .models import *
  File "/opt/conda/envs/HydRa/lib/python3.8/site-packages/HydRa/models/__init__.py", line 2, in <module>
    from .Sequence_class import Protein_Sequence_Input5, Protein_Sequence_Input5_2, Protein_Sequence_Input5_noSS, Protein_Sequence_Input5_2_noSS
  File "/opt/conda/envs/HydRa/lib/python3.8/site-packages/HydRa/models/Sequence_class.py", line 727, in <module>
    class Protein_Sequence_Input5_bk:
  File "/opt/conda/envs/HydRa/lib/python3.8/site-packages/HydRa/models/Sequence_class.py", line 731, in Protein_Sequence_Input5_bk
    def __init__(self, files, class_labels, BioVec_name_dict=BioVec_name_dict, max_seqlen=1500):
NameError: name 'BioVec_name_dict' is not defined

Wenhao-Jin commented 1 year ago

Hi @umasstr, sorry for the inconvenience, and thank you for reporting this! I just fixed this new error and will update the Python package on PyPI/TestPyPI later today, so you can re-install it using the same command as above:

python3 -m pip install --index-url https://test.pypi.org/simple/ --no-deps --upgrade hydra-rbp

Alternatively, if you want to try it now, you could open your "/opt/conda/envs/HydRa/lib/python3.8/site-packages/HydRa/models/Sequence_class.py" file and replace BioVec_name_dict=BioVec_name_dict with BioVec_name_dict on line 731. Please let me know if it still doesn't work on your side. Thank you!
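
Some background on why the NameError above appears at import time rather than when the class is used: Python evaluates default argument values once, when the def statement runs, so a default that refers to a missing module-level name fails as soon as the module is imported. The sketch below reproduces this in miniature and shows one alternative pattern, a None sentinel resolved at call time. It is an illustration only and differs from the fix described above, which turns the parameter into a required one. All names here (Demo, SafeDemo, DEFAULT_NAME_DICT, undefined_name_dict) are hypothetical.

```python
def define_class_with_missing_default():
    """Reproduce the definition-time NameError and return its message."""
    try:
        class Demo:
            # The default expression is evaluated when "def" executes,
            # so the missing name fails here, before any instance exists.
            def __init__(self, files, name_dict=undefined_name_dict):  # noqa: F821
                self.name_dict = name_dict
    except NameError as exc:
        return str(exc)
    return None


# Hypothetical stand-in for a module-level lookup table.
DEFAULT_NAME_DICT = {"AAA": 0}


class SafeDemo:
    def __init__(self, files, name_dict=None, max_seqlen=1500):
        # Resolve the default at call time, so a missing global
        # cannot break the import of the whole module.
        self.files = files
        self.name_dict = DEFAULT_NAME_DICT if name_dict is None else name_dict
        self.max_seqlen = max_seqlen
```

Calling define_class_with_missing_default() returns the error message instead of raising, which makes the timing easy to inspect; SafeDemo(["a.fasta"]) falls back to DEFAULT_NAME_DICT when no dictionary is passed.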

Wenhao-Jin commented 1 year ago

Hi @umasstr, the package on TestPyPI is updated now. You can use the same command to upgrade HydRA:

python3 -m pip install --index-url https://test.pypi.org/simple/ --no-deps --upgrade hydra-rbp

It should work now.

umasstr commented 1 year ago

Hey @Wenhao-Jin, I rebuilt the container and everything looks good! predict ran to completion on a small dataset. I'll give occlusion_map a try shortly.

I pushed umasstr/hydra:latest to Docker. Feel free to retag and post it if this is useful to anyone. I forgot to activate the conda env in the build file, so you need to run the command below before use, but otherwise it works well:

conda activate HydRa

Thank you again for responding to the issue--you'll get your beer in the mail!

Wenhao-Jin commented 1 year ago

Hey @umasstr, thank you so much for reporting the bugs and making the docker image! Really appreciate it!!