KosinskiLab / AlphaPulldown

https://doi.org/10.1093/bioinformatics/btac749
GNU General Public License v3.0

GPU does not seem to be used when running run_multimer_jobs.py #358

Closed HUST6324 closed 3 months ago

HUST6324 commented 3 months ago

Hello! I am trying to run the run_multimer_jobs.py script, but I get some warnings. It seems to be running on the CPU, because when I checked with nvidia-smi, no GPU memory was being used. What is wrong with my setup? My CUDA version is 11.8, cuDNN is 8.6, and TensorRT is 8.5.3.1.

Output log below:

/home/spuser/.conda/envs/AlphaPulldown/lib/python3.10/site-packages/Bio/Data/SCOPData.py:18: BiopythonDeprecationWarning: The 'Bio.Data.SCOPData' module will be deprecated in a future release of Biopython in favor of 'Bio.Data.PDBData.
warnings.warn(
2024-06-06 21:50:57.537469: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-06-06 21:50:58.126378: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
I0606 21:50:58.126525 139679499224896 utils.py:271] checking if output_dir exists /data/LuLab/Sorl_Pulldown
I0606 21:50:58.127097 139679499224896 run_multimer_jobs.py:229] All pickle files have been found
I0606 21:51:00.748160 139679499224896 run_multimer_jobs.py:236] done creating multimer SORL1_MOUSE_and_Scl_GABA_transporter_3_Mouse
I0606 21:51:00.748390 139679499224896 run_multimer_jobs.py:387] object: SORL1_MOUSE_and_Scl_GABA_transporter_3_Mouse
I0606 21:51:00.748438 139679499224896 run_multimer_jobs.py:389] Modeling new interaction for /data/LuLab/Sorl_Pulldown/SORL1_MOUSE_and_Scl_GABA_transporter_3_Mouse
I0606 21:51:01.177516 139679499224896 xla_bridge.py:660] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA
I0606 21:51:01.178307 139679499224896 xla_bridge.py:660] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
I0606 21:51:03.512243 139679499224896 utils.py:378] Model model_1_multimer_v3 is running 0 prediction with default MSA depth
I0606 21:51:04.067755 139679499224896 utils.py:378] Model model_2_multimer_v3 is running 0 prediction with default MSA depth
I0606 21:51:04.616162 139679499224896 utils.py:378] Model model_3_multimer_v3 is running 0 prediction with default MSA depth
I0606 21:51:05.157576 139679499224896 utils.py:378] Model model_4_multimer_v3 is running 0 prediction with default MSA depth
I0606 21:51:05.698115 139679499224896 utils.py:378] Model model_5_multimer_v3 is running 0 prediction with default MSA depth
I0606 21:51:05.698261 139679499224896 utils.py:384] Using random seed 933239592531623975 for the data pipeline
I0606 21:51:05.698946 139679499224896 run_multimer_jobs.py:323] now running prediction on SORL1_MOUSE_and_Scl_GABA_transporter_3_Mouse
I0606 21:51:05.699002 139679499224896 run_multimer_jobs.py:324] output_path is /data/LuLab/Sorl_Pulldown/SORL1_MOUSE_and_Scl_GABA_transporter_3_Mouse
I0606 21:51:05.699060 139679499224896 predict_structure.py:125] Checking for existing results
I0606 21:51:05.699198 139679499224896 predict_structure.py:139] Running model model_1_multimer_v3_pred_0 on SORL1_MOUSE_and_Scl_GABA_transporter_3_Mouse
I0606 21:51:05.699748 139679499224896 model.py:165] Running predict with shape(feat) = {'aatype': (2842,), 'residue_index': (2842,), 'seq_length': (), 'msa': (4095, 2842), 'num_alignments': (), 'template_aatype': (4, 2842), 'template_all_atom_mask': (4, 2842, 37), 'template_all_atom_positions': (4, 2842, 37, 3), 'asym_id': (2842,), 'sym_id': (2842,), 'entity_id': (2842,), 'deletion_matrix': (4095, 2842), 'deletion_mean': (2842,), 'all_atom_mask': (2842, 37), 'all_atom_positions': (2842, 37, 3), 'assembly_num_chains': (), 'entity_mask': (2842,), 'num_templates': (), 'cluster_bias_mask': (4095,), 'bert_mask': (4095, 2842), 'seq_mask': (2842,), 'msa_mask': (4095, 2842)}

dingquanyu commented 3 months ago

Hi @HUST6324,

Sorry that this error occurred. AlphaPulldown uses TensorFlow 2.14.0, which, according to TensorFlow's tested build configurations, needs CUDA 11.8 and cuDNN 8.7 (source: https://www.tensorflow.org/install/source#tested_build_configurations). However, I am not 100% sure whether using cuDNN 8.6 caused your issue. If you have permission to install cuDNN yourself, could you try installing cuDNN 8.7?

In addition, in your current conda environment, could you run the following:

import tensorflow
tensorflow.config.list_physical_devices('GPU')

and let me know whether it returns an empty list.
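
For anyone repeating this check, here is a slightly more defensive version of the two lines above, written as a minimal sketch. It distinguishes "TensorFlow cannot be imported at all" from "TensorFlow imports but registers no GPU", which is the distinction that mattered in this thread. The function name `gpu_report` is hypothetical and not part of AlphaPulldown; only `tensorflow.config.list_physical_devices` comes from the original comment.

```python
import importlib


def gpu_report(module_name="tensorflow"):
    """Return (status, device_names) for the given TensorFlow-like module.

    `gpu_report` is an illustrative helper, not an AlphaPulldown function.
    """
    try:
        tf = importlib.import_module(module_name)
    except ImportError as exc:
        # The package is missing or its installation is broken.
        return ("not importable: " + str(exc), [])

    # Same call as in the comment above: an empty list means no GPU was registered.
    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return ("imported, but no GPU registered", [])
    return ("ok", [gpu.name for gpu in gpus])


if __name__ == "__main__":
    status, devices = gpu_report()
    print("TensorFlow GPU check:", status, devices)
```

Run it inside the same conda environment that runs run_multimer_jobs.py; any status other than "ok" suggests the TensorFlow installation or its CUDA libraries need attention.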

Yours Dingquan

HUST6324 commented 3 months ago

Thanks for the reply, @dingquanyu! When I ran the check above, I found that TensorFlow wasn't installed properly. My bad! After I reinstalled TensorFlow 2.14.0, run_multimer_jobs.py worked properly. Thanks a lot!

dingquanyu commented 3 months ago

@HUST6324 thank you for the update! Glad it is solved. Could you please post the reinstallation command here as well? I think @polya18 had the same issue in #339, and I may need to update the installation instructions in the README.

Yours Dingquan

HUST6324 commented 3 months ago

@dingquanyu I reinstalled TensorFlow with the following command:

conda install tensorflow-gpu==2.14.0 -c conda-forge
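
After a reinstall like this, it may be worth confirming which TensorFlow version the environment actually resolves to. The sketch below uses only the standard library. The helper names are hypothetical, and the distribution name `tensorflow` is an assumption: the name recorded in the environment's metadata may differ depending on how the package was built.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_expected(found, expected="2.14.0"):
    """Compare versions, ignoring any local build suffix such as '+cuda'."""
    return found is not None and found.split("+")[0] == expected


if __name__ == "__main__":
    # "tensorflow" is an assumed distribution name; adjust if your environment differs.
    found = installed_version("tensorflow")
    print("tensorflow version:", found, "| matches 2.14.0:", matches_expected(found))
```

A version match alone does not prove the GPU is usable, so it is best combined with the `list_physical_devices('GPU')` check suggested earlier in the thread.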

dingquanyu commented 3 months ago

Hi @polya18, could you reinstall TensorFlow with the command above? I hope it helps.