google-deepmind / alphafold

Open source code for AlphaFold.
Apache License 2.0

A problem occurred while I was running the Alphafold #646

Closed: jhyeonv closed this issue 1 year ago

jhyeonv commented 1 year ago

Hi. A problem occurred while I was running AlphaFold. Could you help me figure out how to solve it? The command and its full output are below. Please let me know if you need any other information.

jhyeon@jhyeon-Ubuntu:~/Desktop/data/SW/alphafold$ sudo python3 docker/run_docker.py --fasta_paths=T.fa --max_template_date=2021-12-31 --data_dir=/mnt/8THDD/data/db/AFDB/ --model_preset=monomer --output_dir=/home/jhyeon/Desktop/data/alphatest
[sudo] password for jhyeon:
I1129 20:06:44.587526 140678175391744 run_docker.py:113] Mounting /home/jhyeon/Desktop/data/SW/alphafold -> /mnt/fasta_path_0
I1129 20:06:44.998342 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB/uniref90 -> /mnt/uniref90_database_path
I1129 20:06:45.015172 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB/mgnify -> /mnt/mgnify_database_path
I1129 20:06:45.015434 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB -> /mnt/data_dir
I1129 20:06:45.039951 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB/pdb_mmcif/mmcif_files -> /mnt/template_mmcif_dir
I1129 20:06:45.040283 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB/pdb_mmcif -> /mnt/obsolete_pdbs_path
I1129 20:06:45.058208 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB/pdb70 -> /mnt/pdb70_database_path
I1129 20:06:45.099748 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB/uniclust30/uniclust30_2018_08 -> /mnt/uniclust30_database_path
I1129 20:06:45.100375 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB/bfd -> /mnt/bfd_database_path
I1129 20:06:47.216925 140678175391744 run_docker.py:255] /opt/conda/lib/python3.7/site-packages/haiku/_src/data_structures.py:37: FutureWarning: jax.tree_structure is deprecated, and will be removed in a future release. Use jax.tree_util.tree_structure instead.
I1129 20:06:47.217015 140678175391744 run_docker.py:255] PyTreeDef = type(jax.tree_structure(None))
I1129 20:06:47.668879 140678175391744 run_docker.py:255] I1129 11:06:47.668288 140554574919488 templates.py:857] Using precomputed obsolete pdbs /mnt/obsolete_pdbs_path/obsolete.dat.
I1129 20:06:50.314509 140678175391744 run_docker.py:255] I1129 11:06:50.314045 140554574919488 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I1129 20:06:50.395502 140678175391744 run_docker.py:255] I1129 11:06:50.394992 140554574919488 xla_bridge.py:353] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Host CUDA Interpreter
I1129 20:06:50.395585 140678175391744 run_docker.py:255] I1129 11:06:50.395253 140554574919488 xla_bridge.py:353] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I1129 20:06:50.395616 140678175391744 run_docker.py:255] I1129 11:06:50.395327 140554574919488 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
I1129 20:07:01.872005 140678175391744 run_docker.py:255] I1129 11:07:01.871499 140554574919488 run_alphafold.py:377] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I1129 20:07:01.872115 140678175391744 run_docker.py:255] I1129 11:07:01.871589 140554574919488 run_alphafold.py:393] Using random seed 776916752095554131 for the data pipeline
I1129 20:07:01.872142 140678175391744 run_docker.py:255] I1129 11:07:01.871697 140554574919488 run_alphafold.py:161] Predicting T
I1129 20:07:01.872171 140678175391744 run_docker.py:255] I1129 11:07:01.871899 140554574919488 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpyb0_qtih/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/T.fa /mnt/uniref90_database_path/uniref90.fasta"
I1129 20:07:01.908833 140678175391744 run_docker.py:255] I1129 11:07:01.908337 140554574919488 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I1129 20:17:08.123620 140678175391744 run_docker.py:255] I1129 11:17:08.122851 140554574919488 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 606.214 seconds
I1129 20:17:08.124002 140678175391744 run_docker.py:255] I1129 11:17:08.123401 140554574919488 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmp3h2swg2s/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/T.fa /mnt/mgnify_database_path/mgy_clusters_2018_12.fa"
I1129 20:17:08.153515 140678175391744 run_docker.py:255] I1129 11:17:08.152948 140554574919488 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
I1129 20:25:33.106717 140678175391744 run_docker.py:255] I1129 11:25:33.105955 140554574919488 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 504.953 seconds
I1129 20:25:33.107178 140678175391744 run_docker.py:255] I1129 11:25:33.106705 140554574919488 hhsearch.py:85] Launching subprocess "/usr/bin/hhsearch -i /tmp/tmpi641el8l/query.a3m -o /tmp/tmpi641el8l/output.hhr -maxseq 1000000 -d /mnt/pdb70_database_path/pdb70"
I1129 20:25:33.138576 140678175391744 run_docker.py:255] I1129 11:25:33.138024 140554574919488 utils.py:36] Started HHsearch query
I1129 20:26:54.573565 140678175391744 run_docker.py:255] I1129 11:26:54.573027 140554574919488 utils.py:40] Finished HHsearch query in 81.435 seconds
I1129 20:26:54.582471 140678175391744 run_docker.py:255] I1129 11:26:54.581876 140554574919488 hhblits.py:128] Launching subprocess "/usr/bin/hhblits -i /mnt/fasta_path_0/T.fa -cpu 4 -oa3m /tmp/tmp0adsap7y/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /mnt/bfd_database_path/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /mnt/uniclust30_database_path/uniclust30_2018_08"
I1129 20:26:54.614019 140678175391744 run_docker.py:255] I1129 11:26:54.613526 140554574919488 utils.py:36] Started HHblits query
I1129 20:39:47.502100 140678175391744 run_docker.py:255] I1129 11:39:47.501456 140554574919488 utils.py:40] Finished HHblits query in 772.888 seconds
I1129 20:39:47.511255 140678175391744 run_docker.py:255] I1129 11:39:47.510757 140554574919488 templates.py:878] Searching for template for: MAAAAAAAAAAAAAAAAAAAAAAAAA
I1129 20:39:49.477572 140678175391744 run_docker.py:255] I1129 11:39:49.476987 140554574919488 templates.py:268] Found an exact template match 4d10_F.
I1129 20:39:49.481890 140678175391744 run_docker.py:255] I1129 11:39:49.481489 140554574919488 templates.py:913] Skipped invalid hit 4D10_F COP9 SIGNALOSOME COMPLEX SUBUNIT 1; SIGNALING PROTEIN; 3.8A {HOMO SAPIENS}, error: None, warning: 4d10_F (sum_probs: 0.0, rank: 1): feature extracting errors: Template all atom mask was all zeros: 4d10_F. Residue range: 4-13, mmCIF parsing errors: {}
I1129 20:39:54.642663 140678175391744 run_docker.py:255] I1129 11:39:54.642101 140554574919488 templates.py:268] Found an exact template match 4v7h_BQ.
I1129 20:39:56.860277 140678175391744 run_docker.py:255] I1129 11:39:56.859714 140554574919488 templates.py:268] Found an exact template match 6j3y_W.
I1129 20:39:56.861339 140678175391744 run_docker.py:255] I1129 11:39:56.860958 140554574919488 templates.py:913] Skipped invalid hit 6J3Y_W Photosystem II reaction center protein; Photosystem, ELECTRON TRANSPORT; HET: LMG, HEM, DGD, SQD, LMU, BCR, CLA, OEX, PHO, PL9, LHG, A86; 3.3A {Chaetoceros gracilis}, error: None, warning: 6j3y_W (sum_probs: 0.0, rank: 3): feature extracting errors: Template all atom mask was all zeros: 6j3y_W. Residue range: 0-18, mmCIF parsing errors: {}
I1129 20:39:59.248605 140678175391744 run_docker.py:255] I1129 11:39:59.247956 140554574919488 templates.py:268] Found an exact template match 6j3z_w.
I1129 20:39:59.249619 140678175391744 run_docker.py:255] I1129 11:39:59.249154 140554574919488 templates.py:913] Skipped invalid hit 6J3Z_w Photosystem II reaction center protein; Photosystem, ELECTRON TRANSPORT; HET: LMG, HEM, DGD, SQD, LMU, BCR, CLA, OEX, PHO, PL9, LHG, A86; 3.6A {Chaetoceros gracilis}, error: None, warning: 6j3z_w (sum_probs: 0.0, rank: 4): feature extracting errors: Template all atom mask was all zeros: 6j3z_w. Residue range: 0-18, mmCIF parsing errors: {}
I1129 20:39:59.369462 140678175391744 run_docker.py:255] I1129 11:39:59.369046 140554574919488 templates.py:268] Found an exact template match 1m0u_B.
I1129 20:39:59.372590 140678175391744 run_docker.py:255] I1129 11:39:59.372143 140554574919488 templates.py:913] Skipped invalid hit 1M0U_B GST2 gene product (E.C.2.5.1.18); GST, Flight Muscle Protein, Sigma; HET: SO4, GSH; 1.75A {Drosophila melanogaster} SCOP: a.45.1.1, c.47.1.5, error: None, warning: 1m0u_B (sum_probs: 0.0, rank: 5): feature extracting errors: Template all atom mask was all zeros: 1m0u_B. Residue range: 0-23, mmCIF parsing errors: {}
I1129 20:39:59.973728 140678175391744 run_docker.py:255] I1129 11:39:59.973092 140554574919488 templates.py:268] Found an exact template match 1kn7_A.
I1129 20:40:02.077867 140678175391744 run_docker.py:255] I1129 11:40:02.077314 140554574919488 templates.py:268] Found an exact template match 6rfq_8.
I1129 20:40:04.344768 140678175391744 run_docker.py:255] I1129 11:40:04.336282 140554574919488 templates.py:268] Found an exact template match 6rfr_8.
I1129 20:40:08.787640 140678175391744 run_docker.py:255] I1129 11:40:08.786539 140554574919488 templates.py:268] Found an exact template match 6t59_s3.
I1129 20:40:08.790601 140678175391744 run_docker.py:255] I1129 11:40:08.790089 140554574919488 templates.py:913] Skipped invalid hit 6T59_s3 Ribosomal protein L8, uL3, uL4; TUBULIN, nascent chain-associated complex, ribosome-nascent; HET: MG; 3.11A {Oryctolagus cuniculus}, error: None, warning: 6t59_s3 (sum_probs: 0.0, rank: 9): feature extracting errors: Template all atom mask was all zeros: 6t59_s3. Residue range: 273-297, mmCIF parsing errors: {}
I1129 20:40:09.027597 140678175391744 run_docker.py:255] I1129 11:40:09.027103 140554574919488 templates.py:268] Found an exact template match 3lpj_B.
I1129 20:40:09.033292 140678175391744 run_docker.py:255] I1129 11:40:09.032789 140554574919488 templates.py:913] Skipped invalid hit 3LPJ_B Structure of BACE Bound to; Alzheimer's, Aspartyl protease, Hydrolase; HET: TLA, Z75; 1.79A {Homo sapiens}, error: None, warning: 3lpj_B (sum_probs: 0.0, rank: 10): feature extracting errors: Template all atom mask was all zeros: 3lpj_B. Residue range: 0-24, mmCIF parsing errors: {}
I1129 20:40:09.033350 140678175391744 run_docker.py:255] I1129 11:40:09.033004 140554574919488 pipeline.py:234] Uniref90 MSA size: 1 sequences.
I1129 20:40:09.033376 140678175391744 run_docker.py:255] I1129 11:40:09.033061 140554574919488 pipeline.py:235] BFD MSA size: 1 sequences.
I1129 20:40:09.033398 140678175391744 run_docker.py:255] I1129 11:40:09.033080 140554574919488 pipeline.py:236] MGnify MSA size: 1 sequences.
I1129 20:40:09.033423 140678175391744 run_docker.py:255] I1129 11:40:09.033095 140554574919488 pipeline.py:238] Final (deduplicated) MSA size: 1 sequences.
I1129 20:40:09.033465 140678175391744 run_docker.py:255] I1129 11:40:09.033219 140554574919488 pipeline.py:241] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 4.
I1129 20:40:09.041668 140678175391744 run_docker.py:255] I1129 11:40:09.041206 140554574919488 run_alphafold.py:190] Running model model_1_pred_0 on T
I1129 20:40:10.478550 140678175391744 run_docker.py:255] I1129 11:40:10.478168 140554574919488 model.py:166] Running predict with shape(feat) = {'aatype': (4, 26), 'residue_index': (4, 26), 'seq_length': (4,), 'template_aatype': (4, 4, 26), 'template_all_atom_masks': (4, 4, 26, 37), 'template_all_atom_positions': (4, 4, 26, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 26), 'msa_mask': (4, 508, 26), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 26, 3), 'template_pseudo_beta_mask': (4, 4, 26), 'atom14_atom_exists': (4, 26, 14), 'residx_atom14_to_atom37': (4, 26, 14), 'residx_atom37_to_atom14': (4, 26, 37), 'atom37_atom_exists': (4, 26, 37), 'extra_msa': (4, 5120, 26), 'extra_msa_mask': (4, 5120, 26), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 26), 'true_msa': (4, 508, 26), 'extra_has_deletion': (4, 5120, 26), 'extra_deletion_value': (4, 5120, 26), 'msa_feat': (4, 508, 26, 49), 'target_feat': (4, 26, 22)}
I1129 20:40:10.573517 140678175391744 run_docker.py:255] 2022-11-29 11:40:10.572368: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.9
I1129 20:40:10.573782 140678175391744 run_docker.py:255] 2022-11-29 11:40:10.572425: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas
I1129 20:40:10.586611 140678175391744 run_docker.py:255] 2022-11-29 11:40:10.585689: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:628] failed to get PTX kernel "shift_right_logical" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
I1129 20:40:10.586843 140678175391744 run_docker.py:255] 2022-11-29 11:40:10.585779: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: INTERNAL: Could not find the corresponding function
I1129 20:40:10.594201 140678175391744 run_docker.py:255] Traceback (most recent call last):
I1129 20:40:10.594406 140678175391744 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 422, in <module>
I1129 20:40:10.594515 140678175391744 run_docker.py:255] app.run(main)
I1129 20:40:10.594610 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
I1129 20:40:10.594703 140678175391744 run_docker.py:255] _run_main(main, args)
I1129 20:40:10.594786 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
I1129 20:40:10.594868 140678175391744 run_docker.py:255] sys.exit(main(argv))
I1129 20:40:10.594947 140678175391744 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 406, in main
I1129 20:40:10.595050 140678175391744 run_docker.py:255] random_seed=random_seed)
I1129 20:40:10.595142 140678175391744 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 199, in predict_structure
I1129 20:40:10.595224 140678175391744 run_docker.py:255] random_seed=model_random_seed)
I1129 20:40:10.595304 140678175391744 run_docker.py:255] File "/app/alphafold/alphafold/model/model.py", line 167, in predict
I1129 20:40:10.595379 140678175391744 run_docker.py:255] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
I1129 20:40:10.595466 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/random.py", line 132, in PRNGKey
I1129 20:40:10.595538 140678175391744 run_docker.py:255] key = prng.seed_with_impl(impl, seed)
I1129 20:40:10.595611 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 267, in seed_with_impl
I1129 20:40:10.595727 140678175391744 run_docker.py:255] return random_seed(seed, impl=impl)
I1129 20:40:10.595801 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 580, in random_seed
I1129 20:40:10.595872 140678175391744 run_docker.py:255] return random_seed_p.bind(seeds_arr, impl=impl)
I1129 20:40:10.595945 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 329, in bind
I1129 20:40:10.596016 140678175391744 run_docker.py:255] return self.bind_with_trace(find_top_trace(args), args, params)
I1129 20:40:10.596088 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 332, in bind_with_trace
I1129 20:40:10.596163 140678175391744 run_docker.py:255] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I1129 20:40:10.596238 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 712, in process_primitive
I1129 20:40:10.596311 140678175391744 run_docker.py:255] return primitive.impl(*tracers, **params)
I1129 20:40:10.596384 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 592, in random_seed_impl
I1129 20:40:10.596455 140678175391744 run_docker.py:255] base_arr = random_seed_impl_base(seeds, impl=impl)
I1129 20:40:10.596517 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 597, in random_seed_impl_base
I1129 20:40:10.596581 140678175391744 run_docker.py:255] return seed(seeds)
I1129 20:40:10.596646 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 832, in threefry_seed
I1129 20:40:10.596710 140678175391744 run_docker.py:255] lax.shift_right_logical(seed, lax_internal._const(seed, 32)))
I1129 20:40:10.596774 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/lax/lax.py", line 515, in shift_right_logical
I1129 20:40:10.596839 140678175391744 run_docker.py:255] return shift_right_logical_p.bind(x, y)
I1129 20:40:10.596904 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 329, in bind
I1129 20:40:10.596969 140678175391744 run_docker.py:255] return self.bind_with_trace(find_top_trace(args), args, params)
I1129 20:40:10.597032 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 332, in bind_with_trace
I1129 20:40:10.597097 140678175391744 run_docker.py:255] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I1129 20:40:10.597160 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 712, in process_primitive
I1129 20:40:10.597218 140678175391744 run_docker.py:255] return primitive.impl(*tracers, **params)
I1129 20:40:10.597279 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/dispatch.py", line 115, in apply_primitive
I1129 20:40:10.597339 140678175391744 run_docker.py:255] return compiled_fun(*args)
I1129 20:40:10.597402 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/dispatch.py", line 200, in <lambda>
I1129 20:40:10.597465 140678175391744 run_docker.py:255] return lambda *args, **kw: compiled(*args, **kw)[0]
I1129 20:40:10.597525 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/dispatch.py", line 895, in _execute_compiled
I1129 20:40:10.597588 140678175391744 run_docker.py:255] out_flat = compiled.execute(in_flat)
I1129 20:40:10.597648 140678175391744 run_docker.py:255] jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Could not find the corresponding function
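The traceback is long, but the decisive lines come just before it. XLA warns that ptxas does not support compute capability 8.9 and falls back to the CUDA driver. The driver then fails to find the PTX kernel "shift_right_logical", which is the lax operation that jax.random.PRNGKey reaches through threefry_seed. So the run fails as soon as predict creates its PRNG key, before any model layers execute.

The helper below pulls those lines out of a saved run_docker.py log, so the same failure signature can be checked quickly in other reports. It is a hypothetical sketch and not part of AlphaFold. The function name `diagnose` is made up, and the regular expressions are copied from the messages in this log, so they may need adjusting for other JAX/XLA versions.

```python
import re
from typing import Dict, Optional, Tuple

# Patterns copied from the messages in the log above; other JAX/XLA
# versions may word them differently (assumption).
_PTXAS_CC = re.compile(r"ptxas does not support CC (\d+)\.(\d+)")
_PTX_KERNEL = re.compile(r'failed to get PTX kernel "([^"]+)"')
_XLA_ERROR = re.compile(r"XlaRuntimeError: (.+)$")


def diagnose(log_text: str) -> Dict[str, object]:
    """Extract the GPU failure signature from saved run_docker.py output."""
    cc: Optional[Tuple[int, int]] = None
    kernel: Optional[str] = None
    error: Optional[str] = None
    for line in log_text.splitlines():
        m = _PTXAS_CC.search(line)
        if m and cc is None:
            # Compute capability that the bundled ptxas could not target.
            cc = (int(m.group(1)), int(m.group(2)))
        m = _PTX_KERNEL.search(line)
        if m and kernel is None:
            # First kernel that the driver fallback failed to load.
            kernel = m.group(1)
        m = _XLA_ERROR.search(line)
        if m:
            # Keep the last runtime error, i.e. the one that ended the run.
            error = m.group(1).strip()
    return {"unsupported_cc": cc, "ptx_kernel": kernel, "xla_error": error}
```

Applied to the log above, `diagnose` would report compute capability (8, 9), kernel "shift_right_logical", and the error "INTERNAL: Could not find the corresponding function".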

peterdfields commented 1 year ago

I have been getting the same error since CUDA moved to version 12. I think this will need either an update from the development team or a downgrade of the system drivers.

joshabramson commented 1 year ago

Does this error persist when using AlphaFold v2.3.0?

peterdfields commented 1 year ago

@joshabramson I think so. I did a fresh build with the newest Dockerfile. Here is the output from the point where the error starts:

I1223 20:04:51.789108 139688861775680 run_docker.py:255] I1224 01:04:51.788743 139711970813760 run_alphafold.py:191] Running model model_1_pred_0 on ECU03_1140
I1223 20:04:54.797782 139688861775680 run_docker.py:255] I1224 01:04:54.797097 139711970813760 model.py:165] Running predict with shape(feat) = {'aatype': (4, 117), 'residue_index': (4, 117), 'seq_length': (4,), 'template_aatype': (4, 4, 117), 'template_all_atom_masks': (4, 4, 117, 37), 'template_all_atom_positions': (4, 4, 117, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 117), 'msa_mask': (4, 508, 117), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 117, 3), 'template_pseudo_beta_mask': (4, 4, 117), 'atom14_atom_exists': (4, 117, 14), 'residx_atom14_to_atom37': (4, 117, 14), 'residx_atom37_to_atom14': (4, 117, 37), 'atom37_atom_exists': (4, 117, 37), 'extra_msa': (4, 5120, 117), 'extra_msa_mask': (4, 5120, 117), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 117), 'true_msa': (4, 508, 117), 'extra_has_deletion': (4, 5120, 117), 'extra_deletion_value': (4, 5120, 117), 'msa_feat': (4, 508, 117, 49), 'target_feat': (4, 117, 22)}
I1223 20:04:56.327672 139688861775680 run_docker.py:255] 2022-12-24 01:04:56.327116: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.9
I1223 20:04:56.327813 139688861775680 run_docker.py:255] 2022-12-24 01:04:56.327156: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas
I1223 20:04:56.355589 139688861775680 run_docker.py:255] 2022-12-24 01:04:56.355236: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:628] failed to get PTX kernel "shift_right_logical" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
I1223 20:04:56.355756 139688861775680 run_docker.py:255] 2022-12-24 01:04:56.355273: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: INTERNAL: Could not find the corresponding function
I1223 20:04:56.480186 139688861775680 run_docker.py:255] Traceback (most recent call last):
I1223 20:04:56.480370 139688861775680 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 432, in <module>
I1223 20:04:56.480440 139688861775680 run_docker.py:255] app.run(main)
I1223 20:04:56.480502 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 312, in run
I1223 20:04:56.480558 139688861775680 run_docker.py:255] _run_main(main, args)
I1223 20:04:56.480613 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
I1223 20:04:56.480718 139688861775680 run_docker.py:255] sys.exit(main(argv))
I1223 20:04:56.480788 139688861775680 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 408, in main
I1223 20:04:56.480844 139688861775680 run_docker.py:255] predict_structure(
I1223 20:04:56.480898 139688861775680 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 199, in predict_structure
I1223 20:04:56.480951 139688861775680 run_docker.py:255] prediction_result = model_runner.predict(processed_feature_dict,
I1223 20:04:56.481002 139688861775680 run_docker.py:255] File "/app/alphafold/alphafold/model/model.py", line 167, in predict
I1223 20:04:56.481052 139688861775680 run_docker.py:255] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
I1223 20:04:56.481102 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/random.py", line 132, in PRNGKey
I1223 20:04:56.481152 139688861775680 run_docker.py:255] key = prng.seed_with_impl(impl, seed)
I1223 20:04:56.481202 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 267, in seed_with_impl
I1223 20:04:56.481253 139688861775680 run_docker.py:255] return random_seed(seed, impl=impl)
I1223 20:04:56.481304 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 580, in random_seed
I1223 20:04:56.481354 139688861775680 run_docker.py:255] return random_seed_p.bind(seeds_arr, impl=impl)
I1223 20:04:56.481404 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 329, in bind
I1223 20:04:56.481456 139688861775680 run_docker.py:255] return self.bind_with_trace(find_top_trace(args), args, params)
I1223 20:04:56.481508 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 332, in bind_with_trace
I1223 20:04:56.481559 139688861775680 run_docker.py:255] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I1223 20:04:56.481609 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 712, in process_primitive
I1223 20:04:56.481659 139688861775680 run_docker.py:255] return primitive.impl(*tracers, **params)
I1223 20:04:56.481709 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 592, in random_seed_impl
I1223 20:04:56.481759 139688861775680 run_docker.py:255] base_arr = random_seed_impl_base(seeds, impl=impl)
I1223 20:04:56.481808 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 597, in random_seed_impl_base
I1223 20:04:56.481858 139688861775680 run_docker.py:255] return seed(seeds)
I1223 20:04:56.481911 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 832, in threefry_seed
I1223 20:04:56.481950 139688861775680 run_docker.py:255] lax.shift_right_logical(seed, lax_internal._const(seed, 32)))
I1223 20:04:56.481987 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 515, in shift_right_logical
I1223 20:04:56.482024 139688861775680 run_docker.py:255] return shift_right_logical_p.bind(x, y)
I1223 20:04:56.482062 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 329, in bind
I1223 20:04:56.482100 139688861775680 run_docker.py:255] return self.bind_with_trace(find_top_trace(args), args, params)
I1223 20:04:56.482138 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 332, in bind_with_trace
I1223 20:04:56.482177 139688861775680 run_docker.py:255] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I1223 20:04:56.482217 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 712, in process_primitive
I1223 20:04:56.482254 139688861775680 run_docker.py:255] return primitive.impl(*tracers, **params)
I1223 20:04:56.482291 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 115, in apply_primitive
I1223 20:04:56.482333 139688861775680 run_docker.py:255] return compiled_fun(*args)
I1223 20:04:56.482372 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 200, in <lambda>
I1223 20:04:56.482409 139688861775680 run_docker.py:255] return lambda *args, **kw: compiled(*args, **kw)[0]
I1223 20:04:56.482447 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 895, in _execute_compiled
I1223 20:04:56.482484 139688861775680 run_docker.py:255] out_flat = compiled.execute(in_flat)
I1223 20:04:56.482520 139688861775680 run_docker.py:255] jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Could not find the corresponding function

peterdfields commented 1 year ago

@joshabramson I was able to get AlphaFold to run by using nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 as the base image in the Dockerfile.
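One plausible explanation for why this workaround helps involves the warning "ptxas does not support CC 8.9", which comes from the ptxas bundled with the CUDA toolkit inside the image. The original Dockerfile pins ARG CUDA=11.1.1. The RTX 4090 has compute capability 8.9. According to NVIDIA's CUDA release notes, ptxas first gained support for that architecture in CUDA 11.8. That fact is an assumption from outside this thread. If it holds, an 11.1.1 toolkit cannot compile for the card, while the 11.8.0 image can.

The sketch below encodes that check. The table covers only a few architectures and is taken from NVIDIA's release notes. The names `MIN_CUDA_FOR_CC` and `ptxas_supports` are hypothetical.

```python
from typing import Dict, Tuple

Version = Tuple[int, int]

# Assumption (from NVIDIA CUDA release notes, not from this thread): the
# first CUDA toolkit release whose ptxas can target each compute capability.
MIN_CUDA_FOR_CC: Dict[Version, Version] = {
    (8, 0): (11, 0),  # Ampere GA100, e.g. A100
    (8, 6): (11, 1),  # Ampere GA10x, e.g. RTX 30xx
    (8, 9): (11, 8),  # Ada Lovelace, e.g. RTX 4090
}


def parse_version(text: str) -> Version:
    """Turn '11.1.1' or '8.9' into a (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def ptxas_supports(cuda_version: str, compute_capability: str) -> bool:
    """Return True if this toolkit's ptxas should be able to target the GPU."""
    required = MIN_CUDA_FOR_CC.get(parse_version(compute_capability))
    if required is None:
        raise KeyError(f"no entry for compute capability {compute_capability}")
    return parse_version(cuda_version) >= required
```

With the values from this thread, `ptxas_supports("11.1.1", "8.9")` returns False and `ptxas_supports("11.8.0", "8.9")` returns True. This matches both the warning in the logs and the fact that the 11.8.0 base image works.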

joshabramson commented 1 year ago

Closing this for now, as it sounds like there is a workaround and the problem doesn't seem to affect all users.

HanLiii commented 1 year ago

For a machine with an RTX 4090, you need to change the following lines in the Dockerfile:

ARG CUDA=11.1.1  ->  ARG CUDA=11.8.0
FROM nvidia/cuda:${CUDA}-cudnn8-runtime-ubuntu18.04  ->  FROM nvidia/cuda:${CUDA}-cudnn8-devel-ubuntu20.04

Then rebuild the Docker image.
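The two Dockerfile edits can also be applied with a small script. This is a hypothetical sketch. It assumes the Dockerfile lives at docker/Dockerfile in the AlphaFold checkout and still contains both original lines verbatim. If either line is missing, it raises an error and writes nothing, so it never produces a half-patched file.

```python
from pathlib import Path

# The two substitutions from the comment above, as (old, new) pairs.
# "${CUDA}" is literal Dockerfile syntax here, not a Python placeholder.
SUBSTITUTIONS = [
    ("ARG CUDA=11.1.1", "ARG CUDA=11.8.0"),
    ("FROM nvidia/cuda:${CUDA}-cudnn8-runtime-ubuntu18.04",
     "FROM nvidia/cuda:${CUDA}-cudnn8-devel-ubuntu20.04"),
]


def patch_dockerfile_text(text: str) -> str:
    """Return the Dockerfile text with both substitutions applied."""
    # Check every expected line first so nothing is half-applied.
    for old, _ in SUBSTITUTIONS:
        if old not in text:
            raise ValueError(f"expected line not found: {old!r}")
    for old, new in SUBSTITUTIONS:
        text = text.replace(old, new)
    return text


def patch_dockerfile(path: str = "docker/Dockerfile") -> None:
    """Patch the Dockerfile in place (path is an assumed default)."""
    p = Path(path)
    p.write_text(patch_dockerfile_text(p.read_text()))
```

Run `patch_dockerfile()` from the repository root. Then rebuild the image as described in the AlphaFold README (for example `docker build -f docker/Dockerfile -t alphafold .`) and rerun docker/run_docker.py.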