Open tmsincomb opened 1 year ago
I also ran across this issue and your solution seems to work (at least on the examples I've tried). Thanks!
I got it too, and for me this worked:
conda (or mamba) update --all -c pytorch
OS: Fedora 36, GPU: GTX 1080 Ti
I have CUDA 11.8, but your solution worked after I modified the SE3nv.yml
to have:
- cudatoolkit=11.7
- dgl-cuda11.7
Note that I had to go one version lower on the installed toolkit because there is currently no dgl-cuda11.8 package.
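If it helps anyone, here is a rough smoke test I'd run (just a sketch, assuming torch and dgl are importable from the SE3nv env) to confirm the downgraded cudatoolkit/dgl pair can actually reach the GPU:

```python
# Rough smoke test (not from the RFdiffusion repo) to confirm the
# cudatoolkit=11.7 / dgl-cuda11.7 pair can talk to the GPU.
import torch
import dgl

print(torch.__version__, torch.version.cuda)   # should report a CUDA build, e.g. 11.7
print(torch.cuda.is_available())               # should be True

# Build a tiny graph and move it (plus a feature tensor) onto the GPU.
g = dgl.graph(([0, 1], [1, 2])).to("cuda")
x = torch.randn(3, 4, device="cuda")
print(g.device, x.device)                      # both should say cuda:0
```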
The solution $ pip3 install --force-reinstall torch torchvision torchaudio worked for me too, on an RTX 4090 with Ubuntu 22.04. I was getting a slightly different error, though:
Traceback (most recent call last):
  File "/big18TB/apps/RF/RFdiffusion/./scripts/run_inference.py", line 94, in main
    px0, x_t, seq_t, plddt = sampler.sample_step(
  File "/big18TB/apps/RF/RFdiffusion/rfdiffusion/inference/model_runners.py", line 664, in sample_step
    msa_prev, pair_prev, px0, state_prev, alpha, logits, plddt = self.model(msa_masked,
  File "/home/bulat/anaconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/big18TB/apps/RF/RFdiffusion/rfdiffusion/RoseTTAFoldModel.py", line 102, in forward
    msa, pair, R, T, alpha_s, state = self.simulator(seq, msa_latent, msa_full, pair, xyz[:,:,:3],
  File "/home/bulat/anaconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/big18TB/apps/RF/RFdiffusion/rfdiffusion/Track_module.py", line 420, in forward
    msa_full, pair, R_in, T_in, state, alpha = self.extra_block[i_m](msa_full,
  File "/home/bulat/anaconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/big18TB/apps/RF/RFdiffusion/rfdiffusion/Track_module.py", line 332, in forward
    R, T, state, alpha = self.str2str(msa, pair, R_in, T_in, xyz, state, idx, motif_mask=motif_mask, top_k=0)
  File "/home/bulat/anaconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bulat/anaconda3/envs/SE3nv/lib/python3.9/site-packages/torch/cuda/amp/autocast_mode.py", line 141, in decorate_autocast
    return func(*args, **kwargs)
  File "/big18TB/apps/RF/RFdiffusion/rfdiffusion/Track_module.py", line 266, in forward
    shift = self.se3(G, node.reshape(B*L, -1, 1), l1_feats, edge_feats)
  File "/home/bulat/anaconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/big18TB/apps/RF/RFdiffusion/rfdiffusion/SE3_network.py", line 83, in forward
    return self.se3(G, node_features, edge_features)
  File "/home/bulat/anaconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bulat/anaconda3/envs/SE3nv/lib/python3.9/site-packages/se3_transformer-1.0.0-py3.9.egg/se3_transformer/model/transformer.py", line 140, in forward
    basis = basis or get_basis(graph.edata['rel_pos'], max_degree=self.max_degree, compute_gradients=False,
  File "/home/bulat/anaconda3/envs/SE3nv/lib/python3.9/site-packages/se3_transformer-1.0.0-py3.9.egg/se3_transformer/model/basis.py", line 167, in get_basis
    spherical_harmonics = get_spherical_harmonics(relative_pos, max_degree)
  File "/home/bulat/anaconda3/envs/SE3nv/lib/python3.9/site-packages/se3_transformer-1.0.0-py3.9.egg/se3_transformer/model/basis.py", line 58, in get_spherical_harmonics
    sh = o3.spherical_harmonics(all_degrees, relative_pos, normalize=True)
  File "/home/bulat/anaconda3/envs/SE3nv/lib/python3.9/site-packages/e3nn/o3/_spherical_harmonics.py", line 180, in spherical_harmonics
    return sh(x)
  File "/home/bulat/anaconda3/envs/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/bulat/anaconda3/envs/SE3nv/lib/python3.9/site-packages/e3nn/o3/_spherical_harmonics.py", line 82, in forward
    sh = _spherical_harmonics(self._lmax, x[..., 0], x[..., 1], x[..., 2])
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

template
template

extern "C" __global__ void fused_pow_pow_pow_su_9196483836509741110(float* tz_1, float* ty_1, float* tx_1, float* aten_mul, float* aten_mul_1, float* aten_mul_2, float* aten_sub, float* aten_add, float* aten_mul_3, float* aten_pow) {
{
  if (512 * blockIdx.x + threadIdx.x < 22350 ? 1 : 0) {
    float ty_1_1 = __ldg(ty_1 + 3 * (512 * blockIdx.x + threadIdx.x));
    aten_pow[512 * blockIdx.x + threadIdx.x] = ty_1_1 * ty_1_1;
    float tz_1_1 = __ldg(tz_1 + 3 * (512 * blockIdx.x + threadIdx.x));
    float tx_1_1 = __ldg(tx_1 + 3 * (512 * blockIdx.x + threadIdx.x));
    aten_mul_3[512 * blockIdx.x + threadIdx.x] = (float)((double)(tz_1_1 * tz_1_1 - tx_1_1 * tx_1_1) * 0.8660254037844386);
    aten_add[512 * blockIdx.x + threadIdx.x] = tx_1_1 * tx_1_1 + tz_1_1 * tz_1_1;
    aten_sub[512 * blockIdx.x + threadIdx.x] = ty_1_1 * ty_1_1 - (float)((double)(tx_1_1 * tx_1_1 + tz_1_1 * tz_1_1) * 0.5);
    aten_mul_2[512 * blockIdx.x + threadIdx.x] = (float)((double)(ty_1_1) * 1.732050807568877) * tz_1_1;
    aten_mul_1[512 * blockIdx.x + threadIdx.x] = (float)((double)(tx_1_1) * 1.732050807568877) * ty_1_1;
    aten_mul[512 * blockIdx.x + threadIdx.x] = (float)((double)(tx_1_1) * 1.732050807568877) * tz_1_1;
  }
}
}
I also ran into this problem. In my case it was because the installed PyTorch was the CPU-only build, so running conda install -c pytorch pytorch
solved it.
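A quick way to confirm that is what happened before reinstalling (just a sketch, nothing project-specific):

```python
# Rough check for a CPU-only PyTorch build.
import torch

print(torch.__version__)          # CPU-only pip wheels are tagged like "1.12.1+cpu"
print(torch.version.cuda)         # None on a CPU-only build
print(torch.cuda.is_available())  # False here means the reinstall above is worth trying
```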
Device
OS: CentOS Linux 7, GPU: GTX 1080
Issue
Hi! I get the following error when running any of the example scripts.
When using the current SE3nv.yml I get the following versions:
Solution
I did a clean install running
pip3 install --force-reinstall torch torchvision torchaudio
With that, every example seems to run without an issue. I've run into issues before with conda installs of PyTorch when not using the most recent version. Is there a known issue keeping RFdiffusion from moving to PyTorch 2.0?
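For what it's worth, since the failure happens inside e3nn's spherical harmonics, a reinstall can be sanity-checked without a full design run. A minimal sketch (degrees and tensor sizes below are arbitrary choices, not RFdiffusion's):

```python
# Minimal sketch of the failing call path (e3nn spherical harmonics on GPU),
# useful for checking a reinstall without launching run_inference.py.
import torch
from e3nn import o3

rel_pos = torch.randn(128, 3, device="cuda")
sh = o3.spherical_harmonics([0, 1, 2, 3], rel_pos, normalize=True)
print(sh.shape)  # torch.Size([128, 16]) -> 1 + 3 + 5 + 7 components
```

If that compiles and runs on the GPU, the nvrtc/arch problem should be gone.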