RosettaCommons / RFdiffusion

Code for running RFdiffusion

Issue when running the basic script #165

Open apinedalucena opened 9 months ago

apinedalucena commented 9 months ago

Hi,

I am fairly new to the Ubuntu world. I installed RFdiffusion following all the steps in the package's GitHub repository, and I did not get any errors.

However, when running:

./scripts/run_inference.py 'contigmap.contigs=[150-150]' inference.output_prefix=test_outputs/test inference.num_designs=10

I got the following error:

(SE3nv) apinedal@ubuntu:~/opt/RFdiffusion$ ./scripts/run_inference.py 'contigmap.contigs=[150-150]' inference.output_prefix=test_outputs/test inference.num_designs=10
DGL backend not selected or invalid. Assuming PyTorch for now.
Setting the default backend to "pytorch". You can change it in the ~/.dgl/config.json file or export the DGLBACKEND environment variable. Valid options are: pytorch, mxnet, tensorflow (all lowercase)
[2023-12-06 13:03:34,473][__main__][INFO] - ////////////////////////////////////////////////
[2023-12-06 13:03:34,473][__main__][INFO] - ///// NO GPU DETECTED! Falling back to CPU /////
[2023-12-06 13:03:34,473][__main__][INFO] - ////////////////////////////////////////////////
Reading models from /home/apinedal/opt/RFdiffusion/rfdiffusion/inference/../../models
[2023-12-06 13:03:34,474][rfdiffusion.inference.model_runners][INFO] - Reading checkpoint from /home/apinedal/opt/RFdiffusion/rfdiffusion/inference/../../models/Base_ckpt.pt
This is inf_conf.ckpt_path /home/apinedal/opt/RFdiffusion/rfdiffusion/inference/../../models/Base_ckpt.pt
Assembling -model, -diffuser and -preprocess configs from checkpoint
USING MODEL CONFIG: self._conf[model][n_extra_block] = 4
USING MODEL CONFIG: self._conf[model][n_main_block] = 32
USING MODEL CONFIG: self._conf[model][n_ref_block] = 4
USING MODEL CONFIG: self._conf[model][d_msa] = 256
USING MODEL CONFIG: self._conf[model][d_msa_full] = 64
USING MODEL CONFIG: self._conf[model][d_pair] = 128
USING MODEL CONFIG: self._conf[model][d_templ] = 64
USING MODEL CONFIG: self._conf[model][n_head_msa] = 8
USING MODEL CONFIG: self._conf[model][n_head_pair] = 4
USING MODEL CONFIG: self._conf[model][n_head_templ] = 4
USING MODEL CONFIG: self._conf[model][d_hidden] = 32
USING MODEL CONFIG: self._conf[model][d_hidden_templ] = 32
USING MODEL CONFIG: self._conf[model][p_drop] = 0.15
USING MODEL CONFIG: self._conf[model][SE3_param_full] = {'num_layers': 1, 'num_channels': 32, 'num_degrees': 2, 'n_heads': 4, 'div': 4, 'l0_in_features': 8, 'l0_out_features': 8, 'l1_in_features': 3, 'l1_out_features': 2, 'num_edge_features': 32}
USING MODEL CONFIG: self._conf[model][SE3_param_topk] = {'num_layers': 1, 'num_channels': 32, 'num_degrees': 2, 'n_heads': 4, 'div': 4, 'l0_in_features': 64, 'l0_out_features': 64, 'l1_in_features': 3, 'l1_out_features': 2, 'num_edge_features': 64}
USING MODEL CONFIG: self._conf[model][freeze_track_motif] = False
USING MODEL CONFIG: self._conf[model][use_motif_timestep] = True
USING MODEL CONFIG: self._conf[diffuser][T] = 50
USING MODEL CONFIG: self._conf[diffuser][b_0] = 0.01
USING MODEL CONFIG: self._conf[diffuser][b_T] = 0.07
USING MODEL CONFIG: self._conf[diffuser][schedule_type] = linear
USING MODEL CONFIG: self._conf[diffuser][so3_type] = igso3
USING MODEL CONFIG: self._conf[diffuser][crd_scale] = 0.25
USING MODEL CONFIG: self._conf[diffuser][so3_schedule_type] = linear
USING MODEL CONFIG: self._conf[diffuser][min_b] = 1.5
USING MODEL CONFIG: self._conf[diffuser][max_b] = 2.5
USING MODEL CONFIG: self._conf[diffuser][min_sigma] = 0.02
USING MODEL CONFIG: self._conf[diffuser][max_sigma] = 1.5
USING MODEL CONFIG: self._conf[preprocess][sidechain_input] = False
USING MODEL CONFIG: self._conf[preprocess][motif_sidechain_input] = True
USING MODEL CONFIG: self._conf[preprocess][d_t1d] = 22
USING MODEL CONFIG: self._conf[preprocess][d_t2d] = 44
USING MODEL CONFIG: self._conf[preprocess][prob_self_cond] = 0.5
USING MODEL CONFIG: self._conf[preprocess][str_self_cond] = True
USING MODEL CONFIG: self._conf[preprocess][predict_previous] = False
[2023-12-06 13:03:35,240][rfdiffusion.inference.model_runners][INFO] - Loading checkpoint.
[2023-12-06 13:03:36,615][rfdiffusion.diffusion][INFO] - Calculating IGSO3.
Successful diffuser init
[2023-12-06 13:03:40,589][__main__][INFO] - Making design test_outputs/test_0
[2023-12-06 13:03:40,591][rfdiffusion.inference.model_runners][INFO] - Using contig: ['150-150']
With this beta schedule (linear schedule, beta_0 = 0.04, beta_T = 0.28), alpha_bar_T = 0.00013696048699785024
[2023-12-06 13:03:40,599][rfdiffusion.inference.model_runners][INFO] - Sequence init: ------------------------------------------------------------------------------------------------------------------------------------------------------
Error executing job with overrides: ['contigmap.contigs=[150-150]', 'inference.output_prefix=test_outputs/test', 'inference.num_designs=10']
Traceback (most recent call last):
  File "/home/apinedal/opt/RFdiffusion/./scripts/run_inference.py", line 94, in main
    px0, x_t, seq_t, plddt = sampler.sample_step(
  File "/home/apinedal/opt/RFdiffusion/rfdiffusion/inference/model_runners.py", line 664, in sample_step
    msa_prev, pair_prev, px0, state_prev, alpha, logits, plddt = self.model(msa_masked,
  File "/home/apinedal/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/apinedal/opt/RFdiffusion/rfdiffusion/RoseTTAFoldModel.py", line 102, in forward
    msa, pair, R, T, alpha_s, state = self.simulator(seq, msa_latent, msa_full, pair, xyz[:,:,:3],
  File "/home/apinedal/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/apinedal/opt/RFdiffusion/rfdiffusion/Track_module.py", line 420, in forward
    msa_full, pair, R_in, T_in, state, alpha = self.extra_block[i_m](msa_full,
  File "/home/apinedal/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/apinedal/opt/RFdiffusion/rfdiffusion/Track_module.py", line 332, in forward
    R, T, state, alpha = self.str2str(msa, pair, R_in, T_in, xyz, state, idx, motif_mask=motif_mask, top_k=0)
  File "/home/apinedal/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/apinedal/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/cuda/amp/autocast_mode.py", line 141, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/apinedal/opt/RFdiffusion/rfdiffusion/Track_module.py", line 266, in forward
    shift = self.se3(G, node.reshape(B*L, -1, 1), l1_feats, edge_feats)
  File "/home/apinedal/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/apinedal/opt/RFdiffusion/rfdiffusion/SE3_network.py", line 83, in forward
    return self.se3(G, node_features, edge_features)
  File "/home/apinedal/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/apinedal/miniconda3/envs/SE3nv/lib/python3.9/site-packages/se3_transformer-1.0.0-py3.9.egg/se3_transformer/model/transformer.py", line 140, in forward
    basis = basis or get_basis(graph.edata['rel_pos'], max_degree=self.max_degree, compute_gradients=False,
  File "/home/apinedal/miniconda3/envs/SE3nv/lib/python3.9/site-packages/se3_transformer-1.0.0-py3.9.egg/se3_transformer/model/basis.py", line 166, in get_basis
    with nvtx_range('spherical harmonics'):
  File "/home/apinedal/miniconda3/envs/SE3nv/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/home/apinedal/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/cuda/nvtx.py", line 59, in range
    range_push(msg.format(*args, **kwargs))
  File "/home/apinedal/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/cuda/nvtx.py", line 28, in range_push
    return _nvtx.rangePushA(msg)
  File "/home/apinedal/miniconda3/envs/SE3nv/lib/python3.9/site-packages/torch/cuda/nvtx.py", line 9, in _fail
    raise RuntimeError("NVTX functions not installed. Are you sure you have a CUDA build?")
RuntimeError: NVTX functions not installed. Are you sure you have a CUDA build?

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

...

Somehow it does not detect that I have a GPU, and I also get the other errors above.

I would appreciate any suggestions on how to solve these issues.

Best,

Antonio

roccomoretti commented 9 months ago

I've seen this most often when, for some reason, the CPU version of PyTorch gets installed. You may need to reinstall PyTorch with conda, making sure to use the pytorch channel. Something like conda install -c pytorch pytorch=1.9 may work.
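One way to confirm that you are in this situation is to ask PyTorch directly from inside the activated SE3nv environment. The sketch below is an illustration, not part of RFdiffusion: the helper name `describe_torch_build` is hypothetical, and it relies only on the standard `torch.version.cuda` attribute (which is `None` on CPU-only builds) and on `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_build() -> dict:
    """Report whether PyTorch is installed, whether it is a CUDA build, and whether a GPU is usable."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None,
                "cuda_build": False, "gpu_available": False}

    import torch

    # torch.version.cuda is None on CPU-only builds; such builds cannot use
    # CUDA features at all, including the NVTX ranges used by se3_transformer.
    cuda_build = torch.version.cuda is not None
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_build": cuda_build,
        # False on CPU-only builds, and also when no working GPU/driver is found.
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_build())
```

If `cuda_build` comes back `False`, the installed package is a CPU build, which matches both the NVTX error and the "NO GPU DETECTED!" message above, and reinstalling from the pytorch channel is the suggested fix. If it is `True` but `gpu_available` is `False`, the cause probably lies elsewhere, for example in the NVIDIA driver.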

apinedalucena commented 9 months ago

Thank you very much, Rocco, it works!

Best,

Antonio

LoloRoters commented 8 months ago

Hi! I had the same error. I ran conda install -c pytorch pytorch=1.9, and now the calculations have started. However, the output still shows the message 'NO GPU DETECTED! Falling back to CPU'. Is this okay? My CPU is 100% loaded. :( I do have a GPU:

~nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

roccomoretti commented 8 months ago

The 'NO GPU DETECTED!' message can indeed appear when the CPU version of PyTorch has been installed, even though a GPU is available. You can confirm this with the command conda list: find the line for pytorch and check whether the third column (the build string) contains "cpu" or "cuda". (If the package was installed from a different channel, the fourth column should also show which channel it came from.)
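The `conda list` check described above can be scripted. The sketch below is a hypothetical helper (`classify_pytorch_build`) that assumes the usual whitespace-separated `conda list` layout of name, version, build string and optional channel, and simply looks for "cuda" or "cpu" in the build string of the `pytorch` line. The sample text in the demo is made up for illustration, not taken from this thread.

```python
def classify_pytorch_build(conda_list_output: str) -> str:
    """Return 'cuda', 'cpu', 'unknown', or 'not installed' for the pytorch package.

    Assumes the usual `conda list` layout: name, version, build string,
    and an optional channel column.
    """
    for line in conda_list_output.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split()
        # Match the "pytorch" package exactly, not e.g. "pytorch-mutex".
        if len(fields) >= 3 and fields[0] == "pytorch":
            build = fields[2].lower()
            if "cuda" in build:
                return "cuda"
            if "cpu" in build:
                return "cpu"
            return "unknown"
    return "not installed"


if __name__ == "__main__":
    # Illustrative sample only; real build strings vary by channel and version.
    sample = """\
# Name                    Version                   Build  Channel
pytorch                   1.9.0           py3.9_cpu_0    pytorch
"""
    print(classify_pytorch_build(sample))  # prints: cpu
```

You could feed it the captured output of `conda list`, for example `subprocess.run(["conda", "list"], capture_output=True, text=True).stdout`. Build strings that contain neither marker are reported as "unknown" rather than guessed.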

I'm not sure why reinstalling from the pytorch channel isn't working for you. Did you also install the matching cudatoolkit package? If the conda list line still doesn't show "cuda", you may need to debug your PyTorch conda installation, which would probably be better handled in a PyTorch-specific venue.

sherryliu987 commented 7 months ago

If you're struggling to install RFdiffusion locally, feel free to try https://www.tamarind.bio/rf-diffusion, a website which offers a no-code interface for bioinformatics tools including protein design with RFdiffusion for free.

tydingcw commented 3 months ago

I ran into this issue and could only get the CUDA version of PyTorch installed after first uninstalling the existing PyTorch package.

ccalia commented 2 months ago

I had the same problem. This fix seems to work:

I modified SE3nv.yml to specify the channel for pytorch and cudatoolkit:

name: SE3nv
channels:
  - defaults
  - conda-forge
  - pytorch
  - dglteam
  - nvidia
dependencies:
  - python=3.9
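  - pytorch::pytorch=1.9          # "channel::package": take pytorch only from the pytorch channel
  - torchaudio
  - torchvision
  - nvidia::cudatoolkit=11.1      # likewise, take cudatoolkit only from the nvidia channel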
  - dgl-cuda11.1
  - pip
  - pip:
    - hydra-core
    - pyrsistent

Got the idea from: https://stackoverflow.com/questions/69180740/cant-install-gpu-enabled-pytorch-in-conda-environment-from-environment-yml
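The `channel::package` prefix used above tells conda that a package must come from one named channel, rather than letting it choose among all the channels in the `channels:` list (here `defaults` and `conda-forge` are listed before `pytorch`). A plausible reading of the fix is that this stops conda from picking a CPU-only PyTorch from an earlier channel. As a rough sanity check for your own environment file, the hypothetical helpers below (`parse_spec` and `unpinned`) split simple `channel::name=version` entries and report which required packages lack a channel prefix. They cover only that simple form, not conda's full match-spec syntax.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Spec(NamedTuple):
    channel: Optional[str]  # e.g. "pytorch" in "pytorch::pytorch=1.9"
    name: str
    version: Optional[str]


def parse_spec(spec: str) -> Spec:
    """Split a simple conda spec of the form [channel::]name[=version]."""
    channel: Optional[str] = None
    if "::" in spec:
        channel, spec = spec.split("::", 1)
    name, sep, version = spec.partition("=")
    return Spec(channel or None, name.strip(), version.strip() if sep else None)


def unpinned(deps: Iterable[str], required: Set[str]) -> List[str]:
    """Return the names in `required` that appear in `deps` without a channel prefix."""
    missing = []
    for dep in deps:
        spec = parse_spec(dep)
        if spec.name in required and spec.channel is None:
            missing.append(spec.name)
    return missing


if __name__ == "__main__":
    # Conda dependencies from the modified SE3nv.yml above (the pip sub-list is omitted).
    fixed = ["python=3.9", "pytorch::pytorch=1.9", "torchaudio", "torchvision",
             "nvidia::cudatoolkit=11.1", "dgl-cuda11.1", "pip"]
    print(unpinned(fixed, {"pytorch", "cudatoolkit"}))  # prints: []
```

Run against the original, unprefixed entries (`pytorch=1.9`, `cudatoolkit=11.1`), the same check would report both packages.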