google-deepmind / alphafold

Open source code for AlphaFold 2.
Apache License 2.0

Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch) #692

Closed: MarkusRainerSchmidt closed this issue 1 year ago

MarkusRainerSchmidt commented 1 year ago

Hi,

We are trying to install AlphaFold on a cluster where we have neither sudo privileges nor permission to use Docker. Hence, we are following the non-Docker setup described here: https://github.com/kalininalab/alphafold_non_docker

AlphaFold runs until it reaches the restraining step, where it cannot find a compatible CUDA device. This happens even though the GPU was found (and apparently used) in an earlier step.

Our GPU is an NVIDIA RTX A5000 with computeCapability: 8.6. We tried both with CUDA 11.3 and 12.0.

Issue #55 might suggest that we need CUDA 11.1 instead. However, the error reported there differs from ours.

Even though this is not a standard installation, do you know whether the combination of CUDA 11.3/12.0 with an NVIDIA RTX A5000 GPU causes an issue with AlphaFold? Do you think this could be solved by installing CUDA 11.1? Would a Docker installation solve it?

Here is the error log:

2023-01-28 12:26:29.226660: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
I0128 12:26:33.817948 47964907163328 templates.py:857] Using precomputed obsolete pdbs /work/project/ladlad_008/databases/pdb_mmcif/obsolete.dat.
2023-01-28 12:26:34.179997: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:171] XLA service 0x56228003c700 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-01-28 12:26:34.180241: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:179]   StreamExecutor device (0): NVIDIA RTX A5000, Compute Capability 8.6
2023-01-28 12:26:34.180682: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/gpu_device.cc:301] Using BFC allocator.
2023-01-28 12:26:34.180832: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/gpu_device.cc:260] XLA backend allocating 101702434816 bytes on device 0 for BFCAllocator.
2023-01-28 12:26:34.191034: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/tfrt_cpu_pjrt_client.cc:160] TfrtCpuClient created.
I0128 12:26:39.254686 47964907163328 run_alphafold.py:376] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I0128 12:26:39.255098 47964907163328 run_alphafold.py:393] Using random seed 660002192156415805 for the data pipeline
I0128 12:26:39.255512 47964907163328 run_alphafold.py:161] Predicting 1433
I0128 12:26:39.256622 47964907163328 jackhmmer.py:133] Launching subprocess "/home/mheimhalt/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmp6pzs9qwq/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /work/project/ladlad_008/sec_inst/fasta_in/1433.fasta /work/project/ladlad_008/databases/uniref90/uniref90.fasta"
I0128 12:26:39.272459 47964907163328 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0128 12:31:01.699572 47964907163328 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 262.427 seconds
I0128 12:31:01.831821 47964907163328 jackhmmer.py:133] Launching subprocess "/home/mheimhalt/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpg67ao65l/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /work/project/ladlad_008/sec_inst/fasta_in/1433.fasta /work/project/ladlad_008/databases/mgnify/mgy_clusters_2022_05.fa"
I0128 12:31:01.847216 47964907163328 utils.py:36] Started Jackhmmer (mgy_clusters_2022_05.fa) query
I0128 12:40:03.972252 47964907163328 utils.py:40] Finished Jackhmmer (mgy_clusters_2022_05.fa) query in 542.125 seconds
I0128 12:40:05.249914 47964907163328 hhsearch.py:85] Launching subprocess "/home/mheimhalt/.conda/envs/alphafold/bin/hhsearch -i /tmp/tmpqpxusk99/query.a3m -o /tmp/tmpqpxusk99/output.hhr -maxseq 1000000 -d /work/project/ladlad_008/databases/pdb70/pdb70"
I0128 12:40:05.267063 47964907163328 utils.py:36] Started HHsearch query
I0128 12:40:35.346482 47964907163328 utils.py:40] Finished HHsearch query in 30.079 seconds
I0128 12:40:35.994158 47964907163328 hhblits.py:128] Launching subprocess "/home/mheimhalt/.conda/envs/alphafold/bin/hhblits -i /work/project/ladlad_008/sec_inst/fasta_in/1433.fasta -cpu 4 -oa3m /tmp/tmp33kj0het/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /work/project/ladlad_008/databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /work/project/ladlad_008/sec_inst/databases/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0128 12:40:36.010200 47964907163328 utils.py:36] Started HHblits query
I0128 12:45:47.980059 47964907163328 utils.py:40] Finished HHblits query in 311.969 seconds
I0128 12:45:48.017507 47964907163328 templates.py:878] Searching for template for: MEKTELIQKAKLAEQAERYDDMATCMKAVTEQGAELSNEERNLLSVAYKNVVGGRRSAWRVISSIEQKTDTSDKKLQLIKDYREKVESELRSICTTVLELLDKYLIANATNPESKVFYLKMKGDYFRYLAEVACGDDRKQTIDNSQGAYQEAFDISKKEMQPTHPIRLGLALNFSVFYYEILNNPELACTLAKTAFDEAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDSAGEECDAAEGAEN
I0128 12:45:48.192010 47964907163328 templates.py:267] Found an exact template match 3iqu_A.

[the last line above repeats in a similar fashion]

I0128 12:45:52.200458 47964907163328 pipeline.py:234] Uniref90 MSA size: 6547 sequences.
I0128 12:45:52.200723 47964907163328 pipeline.py:235] BFD MSA size: 1681 sequences.
I0128 12:45:52.200868 47964907163328 pipeline.py:236] MGnify MSA size: 501 sequences.
I0128 12:45:52.201004 47964907163328 pipeline.py:237] Final (deduplicated) MSA size: 8552 sequences.
I0128 12:45:52.201264 47964907163328 pipeline.py:239] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 20.
I0128 12:45:52.217904 47964907163328 run_alphafold.py:190] Running model model_1_pred_0 on 1433
2023-01-28 12:45:54.053234: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2023-01-28 12:45:54.053673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:17:00.0 name: NVIDIA RTX A5000 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 64 deviceMemorySize: 23.68GiB deviceMemoryBandwidth: 715.34GiB/s
2023-01-28 12:45:54.053796: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0

[the last 2 lines above repeat in a similar fashion]

2023-01-28 12:45:54.076987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-01-28 12:45:54.117571: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-28 12:45:54.120336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:17:00.0 name: NVIDIA RTX A5000 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 64 deviceMemorySize: 23.68GiB deviceMemoryBandwidth: 715.34GiB/s
2023-01-28 12:45:54.120701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-01-28 12:45:54.120825: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2023-01-28 12:45:55.049979: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-01-28 12:45:55.050129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2023-01-28 12:45:55.050237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2023-01-28 12:45:55.050914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 20372 MB memory) -> physical GPU (device: 0, name: NVIDIA RTX A5000, pci bus id: 0000:17:00.0, compute capability: 8.6)
2023-01-28 12:45:55.111213: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2900000000 Hz
I0128 12:45:55.890499 47964907163328 model.py:165] Running predict with shape(feat) = {'aatype': (4, 245), 'residue_index': (4, 245), 'seq_length': (4,), 'template_aatype': (4, 4, 245), 'template_all_atom_masks': (4, 4, 245, 37), 'template_all_atom_positions': (4, 4, 245, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 245), 'msa_mask': (4, 508, 245), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 245, 3), 'template_pseudo_beta_mask': (4, 4, 245), 'atom14_atom_exists': (4, 245, 14), 'residx_atom14_to_atom37': (4, 245, 14), 'residx_atom37_to_atom14': (4, 245, 37), 'atom37_atom_exists': (4, 245, 37), 'extra_msa': (4, 5120, 245), 'extra_msa_mask': (4, 5120, 245), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 245), 'true_msa': (4, 508, 245), 'extra_has_deletion': (4, 5120, 245), 'extra_deletion_value': (4, 5120, 245), 'msa_feat': (4, 508, 245, 49), 'target_feat': (4, 245, 22)}
2023-01-28 12:47:20.197797: I external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:328] ptxas warning : Registers are spilled to local memory in function 'fusion_1306', 8 bytes spill stores, 8 bytes spill loads

[the last line above repeats in a similar fashion]

I0128 12:47:55.741957 47964907163328 model.py:175] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (245, 245, 64)}, 'experimentally_resolved': {'logits': (245, 37)}, 'masked_msa': {'logits': (508, 245, 23)}, 'predicted_lddt': {'logits': (245, 50)}, 'structure_module': {'final_atom_mask': (245, 37), 'final_atom_positions': (245, 37, 3)}, 'plddt': (245,), 'ranking_confidence': ()}
I0128 12:47:55.742465 47964907163328 run_alphafold.py:202] Total JAX model model_1_pred_0 on 1433 predict time (includes compilation time, see --benchmark): 119.9s
I0128 12:48:01.739979 47964907163328 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 244 (ASN) of chain 0>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0128 12:48:02.464479 47964907163328 amber_minimize.py:407] Minimizing protein, attempt 1 of 100.
I0128 12:48:02.778727 47964907163328 amber_minimize.py:68] Restraining 1945 / 3862 particles.
I0128 12:48:03.546111 47964907163328 amber_minimize.py:417] Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

[the last 3 lines above repeat until the 37th attempt; this seems to be the root cause of the error?]

I0128 12:48:55.170659 47964907163328 amber_minimize.py:407] Minimizing protein, attempt 37 of 100.
I0128 12:48:56.013604 47964907163328 amber_minimize.py:68] Restraining 1945 / 3862 particles.
I0128 12:48:57.663788 47964907163328 amber_minimize.py:417] No compatible CUDA device is available

[the last 3 lines above repeat until the 100th attempt]

Traceback (most recent call last):
  File "/work/project/ladlad_008/sec_inst/alphafold-2.2.0/run_alphafold.py", line 422, in <module>
    app.run(main)
  File "/home/mheimhalt/.conda/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/mheimhalt/.conda/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/work/project/ladlad_008/sec_inst/alphafold-2.2.0/run_alphafold.py", line 398, in main
    predict_structure(
  File "/work/project/ladlad_008/sec_inst/alphafold-2.2.0/run_alphafold.py", line 242, in predict_structure
    relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
  File "/work/project/ladlad_008/sec_inst/alphafold-2.2.0/alphafold/relax/relax.py", line 61, in process
    out = amber_minimize.run_pipeline(
  File "/work/project/ladlad_008/sec_inst/alphafold-2.2.0/alphafold/relax/amber_minimize.py", line 475, in run_pipeline
    ret = _run_one_iteration(
  File "/work/project/ladlad_008/sec_inst/alphafold-2.2.0/alphafold/relax/amber_minimize.py", line 419, in _run_one_iteration
    raise ValueError(f"Minimization failed after {max_attempts} attempts.")
ValueError: Minimization failed after 100 attempts.
Augustin-Zidek commented 1 year ago

Could you try running with --enable_gpu_relax=false? This forces the relaxation step (which is failing for you) to run on CPU instead of GPU.

Alternatively, you could also try turning the relaxation step off completely using --models_to_relax=none.
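
For reference, a minimal sketch of what the adjusted invocation could look like. It assumes the flags are accepted exactly as written above by whichever entry point you launch AlphaFold with (run_docker.py, run_alphafold.py, or the non-Docker wrapper script); <existing flags> is a placeholder for your current command line.

```bash
# Sketch only: append the workaround flag to your current AlphaFold command.
# The flag names are the ones quoted above; whether your particular entry
# point accepts them in exactly this form is an assumption.
python run_alphafold.py <existing flags> --enable_gpu_relax=false

# Or disable the relaxation step entirely:
python run_alphafold.py <existing flags> --models_to_relax=none
```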

MarkusRainerSchmidt commented 1 year ago

Thanks for the advice!!

The prediction finished successfully with --enable_gpu_relax=false. However, this seems like an unsatisfactory workaround...

Do you think the error is a CUDA issue, or is it rather caused by the RTX A5000 GPU?

Augustin-Zidek commented 1 year ago

It looks like an OpenMM issue (we use OpenMM for the final relaxation step), since the model itself runs fine and uses your GPU. Could you try raising this issue with the OpenMM developers? Another possible workaround might be updating your OpenMM installation to 7.7.0; maybe that will help.
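
One way to confirm that the failure is at the OpenMM level, independent of AlphaFold, is OpenMM's own installation test, which tries each compute platform including CUDA. A minimal sketch, assuming OpenMM 7.6+ (on the 7.5.x releases pinned by older AlphaFold environments the module is simtk.testInstallation instead) and assuming a conda-based install for the suggested upgrade:

```bash
# Enumerate OpenMM's compute platforms and run a small test on each; if the
# CUDA platform fails here with the same "nvrtc: ... --gpu-architecture"
# error, the problem lies in the OpenMM/CUDA toolchain rather than in AlphaFold.
python -m openmm.testInstallation

# The upgrade suggested above, sketched for a conda-forge install
# (channel and exact version pin are assumptions):
conda install -c conda-forge openmm=7.7.0
```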

MarkusRainerSchmidt commented 1 year ago

We got help from the OpenMM devs (https://github.com/openmm/openmm/issues/3950). Now everything is running on the GPU.

Augustin-Zidek commented 1 year ago

Great to hear you solved it! Kudos to the OpenMM devs.