Zuricho / ParallelFold

A modified version of AlphaFold that separates the CPU part (MSA and template searching) from the GPU part. This can accelerate AlphaFold when predicting multiple structures.
https://parafold.sjtu.edu.cn

How to perform parallel search when generating MSA in the CPU section? #48

Open Kangfengyuuuu opened 4 months ago

Kangfengyuuuu commented 4 months ago

Hello everyone, I have a question. I followed the author's method to configure the Parafold environment and successfully ran it. But I found that when running the CPU part, it seems to be done step by step. For example:

I0716 06:57:38.355164 140494641588032 templates.py:857] Using precomputed obsolete pdbs /home/kangfengyu/Alphafold/dataset/pdolete.dat.
I0716 06:57:40.537640 140494641588032 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to findregistry given worker:
I0716 06:57:40.719006 140494641588032 xla_bridge.py:353] Unable to initialize backend 'rocm': NOT_FOUND: Could not find regisorm with name: "rocm". Available platform names are: CUDA Interpreter Host
I0716 06:57:40.719784 140494641588032 xla_bridge.py:353] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' haute 'get_tpu_client'
I0716 06:57:40.719940 140494641588032 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributt_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults t enable this.
I0716 06:57:43.401469 140494641588032 run_alphafold.py:445] Have 1 models: ['model_1_pred_0']
I0716 06:57:43.401771 140494641588032 run_alphafold.py:459] Using random seed 8276676206500880148 for the data pipeline
I0716 06:57:43.402130 140494641588032 run_alphafold.py:189] Predicting pi9
I0716 06:57:43.403663 140494641588032 jackhmmer.py:133] Launching subprocess "/home/kangfengyu/miniconda3/envs/alphafold/bin/o /dev/null -A /tmp/tmpqrn69bnd/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ../t /home/kangfengyu/Alphafold/dataset/uniref90/uniref90.fasta"
I0716 06:57:43.520144 140494641588032 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0716 07:03:05.880450 140494641588032 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 322.360 seconds
I0716 07:03:05.884525 140494641588032 jackhmmer.py:133] Launching subprocess "/home/kangfengyu/miniconda3/envs/alphafold/bin/o /dev/null -A /tmp/tmpwny_yhhy/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ../t /home/kangfengyu/Alphafold/dataset/mgnify/mgy_clusters_2022_05.fa"
I0716 07:03:06.005441 140494641588032 utils.py:36] Started Jackhmmer (mgy_clusters_2022_05.fa) query
I0716 07:15:21.214914 140494641588032 utils.py:40] Finished Jackhmmer (mgy_clusters_2022_05.fa) query in 735.209 seconds
I0716 07:15:21.220489 140494641588032 hhsearch.py:85] Launching subprocess "/home/kangfengyu/miniconda3/envs/alphafold/bin/hhtmp/tmp5b1wlqei/query.a3m -o /tmp/tmp5b1wlqei/output.hhr -maxseq 1000000 -d /home/kangfengyu/Alphafold/dataset/pdb70/pdb70"
I0716 07:15:21.377889 140494641588032 utils.py:36] Started HHsearch query
I0716 07:18:02.685744 140494641588032 utils.py:40] Finished HHsearch query in 161.307 seconds
I0716 07:18:02.695558 140494641588032 jackhmmer.py:133] Launching subprocess "/home/kangfengyu/miniconda3/envs/alphafold/bin/o /dev/null -A /tmp/tmpci411vvm/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ../t /home/kangfengyu/Alphafold/dataset/small_bfd/bfd-first_non_consensus_sequences.fasta"
I0716 07:18:02.850154 140494641588032 utils.py:36] Started Jackhmmer (bfd-first_non_consensus_sequences.fasta) query
I0716 07:19:59.872375 140494641588032 utils.py:40] Finished Jackhmmer (bfd-first_non_consensus_sequences.fasta) query in 117.
I0716 07:19:59.874616 140494641588032 templates.py:878] Searching for template for: MQFSQILTVLFLGVSVSALPAGGLPGSPGSAVQRCHCPPRGEAPEAEGDAKISARYTCPNCHKTGKGCDDGWCQVEKTHW
I0716 07:20:00.522553 140494641588032 templates.py:267] Found an exact template match 3f2b_A.
I0716 07:20:00.865586 140494641588032 templates.py:267] Found an exact template match 3f2d_A.
I0716 07:20:01.862293 140494641588032 templates.py:267] Found an exact template match 1s24_A.
I0716 07:20:02.080361 140494641588032 templates.py:267] Found an exact template match 5xpd_A.
I0716 07:20:02.791265 140494641588032 templates.py:267] Found an exact template match 1xjh_A.
I0716 07:20:03.416912 140494641588032 templates.py:267] Found an exact template match 2m6p_A.
I0716 07:20:03.761641 140494641588032 templates.py:267] Found an exact template match 4kyw_A.
I0716 07:20:04.277867 140494641588032 templates.py:267] Found an exact template match 4esj_B.

Is this considered parallel search? Looking forward to receiving an answer.
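
One way to check this from the log itself is to compare the "Started ... query" and "Finished ... query" timestamps: if no two search intervals overlap, the searches ran one after another. Below is a minimal Python sketch of that check, assuming the line format shown in the log above and that all entries fall on the same day. The helper names `search_intervals` and `overlapping_pairs` are hypothetical and are not part of ParallelFold or AlphaFold.

```python
import re
from datetime import datetime

# Matches lines such as:
# "I0716 06:57:43.520144 140494641588032 utils.py:36] Started Jackhmmer (uniref90.fasta) query"
# Assumption: only the time of day is parsed, so all entries must share one date.
EVENT = re.compile(
    r"I\d{4} (\d{2}:\d{2}:\d{2}\.\d{6}) \d+ utils\.py:\d+\] "
    r"(Started|Finished) (.+?) query"
)


def search_intervals(log_text):
    """Return [(tool, start, end)] for every Started/Finished pair in the log."""
    started, intervals = {}, []
    for stamp, kind, tool in EVENT.findall(log_text):
        t = datetime.strptime(stamp, "%H:%M:%S.%f")
        if kind == "Started":
            started[tool] = t
        elif tool in started:
            intervals.append((tool, started.pop(tool), t))
    return intervals


def overlapping_pairs(intervals):
    """Return pairs of tool names whose run intervals overlap in time."""
    ordered = sorted(intervals, key=lambda iv: iv[1])
    return [
        (a[0], b[0])
        for i, a in enumerate(ordered)
        for b in ordered[i + 1:]
        if b[1] < a[2]  # b started before a finished
    ]


# Three searches copied from the log above.
LOG = """\
I0716 06:57:43.520144 140494641588032 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0716 07:03:05.880450 140494641588032 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 322.360 seconds
I0716 07:03:06.005441 140494641588032 utils.py:36] Started Jackhmmer (mgy_clusters_2022_05.fa) query
I0716 07:15:21.214914 140494641588032 utils.py:40] Finished Jackhmmer (mgy_clusters_2022_05.fa) query in 735.209 seconds
I0716 07:15:21.377889 140494641588032 utils.py:36] Started HHsearch query
I0716 07:18:02.685744 140494641588032 utils.py:40] Finished HHsearch query in 161.307 seconds
"""

intervals = search_intervals(LOG)
for tool, start, end in intervals:
    print(f"{tool}: {(end - start).total_seconds():.3f} s")
print("overlapping searches:", overlapping_pairs(intervals))
```

For the entries above, every search starts only after the previous one has finished, so the list of overlapping searches is empty. The `--cpu 8` option in the Jackhmmer command lines suggests that each individual search is multi-threaded, but the log gives no sign that different searches run at the same time.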