Closed chenshixinnb closed 2 years ago
The second step uses the GPU, and an error occurred. Command:
./run_alphafold.sh -d $DATA_DIR -o $OUTPUT_DIR -p multimer -m model_1,model_2,model_3,model_4,model_5 -i $INPUT_DIR/test.fasta -t 2021-11-01
2022-01-14 15:50:56.884239: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0114 15:51:34.441559 47327477676160 templates.py:857] Using precomputed obsolete pdbs /public/software/.local/easybuild/software/alphafold/data2/pdb_mmcif/obsolete.dat.
I0114 15:51:35.627506 47327477676160 xla_bridge.py:243] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I0114 15:51:35.627812 47327477676160 xla_bridge.py:243] Unable to initialize backend 'gpu': NOT_FOUND: Could not find registered platform with name: "cuda". Available platform names are: Interpreter Host
I0114 15:51:35.628199 47327477676160 xla_bridge.py:243] Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.
W0114 15:51:35.628336 47327477676160 xla_bridge.py:248] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I0114 15:51:36.549557 47327477676160 run_alphafold.py:407] Have 1 models: ['model_1']
I0114 15:51:36.549774 47327477676160 run_alphafold.py:423] Using random seed 6020071004121300369 for the data pipeline
I0114 15:51:36.549989 47327477676160 run_alphafold.py:156] Predicting fuheti
I0114 15:51:36.700862 47327477676160 run_alphafold.py:202] Running model model_1 on fuheti
Traceback (most recent call last):
File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 455, in
We are working on this problem. I will send another reply when we have any updates.
Could you send me your input FASTA file so I can check it? My email is zbztzhz@gmail.com
It has been sent, thank you
OK, I understand. You may try using -m model_1_multimer, not -m model_1, in the GPU part.
I used the command:
$PROGRAM_DIR/run_alphafold.sh -d $DATA_DIR -o $OUTPUT_DIR -p multimer -m model_1_multimer -i $INPUT_DIR/test2.fasta -t 2021-11-01
Now the run stays stuck here:
2022-01-15 13:35:27.686400: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0115 13:36:03.102030 47367004878976 templates.py:857] Using precomputed obsolete pdbs /public/software/.local/easybuild/software/alphafold/data2/pdb_mmcif/obsolete.dat.
I0115 13:36:04.031663 47367004878976 xla_bridge.py:243] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I0115 13:36:04.031914 47367004878976 xla_bridge.py:243] Unable to initialize backend 'gpu': NOT_FOUND: Could not find registered platform with name: "cuda". Available platform names are: Interpreter Host
I0115 13:36:04.032272 47367004878976 xla_bridge.py:243] Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.
W0115 13:36:04.032407 47367004878976 xla_bridge.py:248] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I0115 13:36:04.889354 47367004878976 run_alphafold.py:407] Have 1 models: ['model_1_multimer']
I0115 13:36:04.889528 47367004878976 run_alphafold.py:423] Using random seed 9133243819396004162 for the data pipeline
I0115 13:36:04.889718 47367004878976 run_alphafold.py:156] Predicting test2
I0115 13:36:05.007085 47367004878976 run_alphafold.py:202] Running model model_1_multimer on test2
I0115 13:36:05.007616 47367004878976 model.py:165] Running predict with shape(feat) = {'aatype': (646,), 'residue_index': (646,), 'seq_length': (), 'msa': (3101, 646), 'num_alignments': (), 'template_aatype': (4, 646), 'template_all_atom_mask': (4, 646, 37), 'template_all_atom_positions': (4, 646, 37, 3), 'asym_id': (646,), 'sym_id': (646,), 'entity_id': (646,), 'deletion_matrix': (3101, 646), 'deletion_mean': (646,), 'all_atom_mask': (646, 37), 'all_atom_positions': (646, 37, 3), 'assembly_num_chains': (), 'entity_mask': (646,), 'num_templates': (), 'cluster_bias_mask': (3101,), 'bert_mask': (3101, 646), 'seq_mask': (646,), 'msa_mask': (3101, 646)}
2022-01-15 13:39:56.909318: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:55]
Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results. Compiling module jit_apply_fn.96373
It's strange that your JAX is using the CPU rather than the GPU. Did you prepare the environment properly? For example, did you install the CUDA toolkit and load your local CUDA environment?
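The CPU fallback shows up in the logs above as the xla_bridge warning, preceded by one "Unable to initialize backend" line per backend that failed. As a rough sketch, a saved log could be scanned for those lines like this. The function name summarize_backend_log is made up for illustration and is not part of AlphaFold or ParallelFold; the patterns are taken from the log text quoted in this thread and may differ in other JAX versions.

```python
import re

# Warning text as it appears in the logs quoted in this thread.
FALLBACK = re.compile(r"No GPU/TPU found, falling back to CPU")
# One line per backend that JAX tried and failed to initialize.
BACKEND_FAIL = re.compile(r"Unable to initialize backend '([^']+)'")


def summarize_backend_log(text):
    """Report whether the log shows a CPU fallback and which backends failed.

    Hypothetical helper: it only pattern-matches log text and does not
    query JAX itself.
    """
    return {
        "cpu_fallback": bool(FALLBACK.search(text)),
        "failed_backends": BACKEND_FAIL.findall(text),
    }


if __name__ == "__main__":
    sample = (
        "I0115 13:36:04.031914 47367004878976 xla_bridge.py:243] "
        "Unable to initialize backend 'gpu': NOT_FOUND: Could not find "
        'registered platform with name: "cuda".\n'
        "W0115 13:36:04.032407 47367004878976 xla_bridge.py:248] "
        "No GPU/TPU found, falling back to CPU.\n"
    )
    print(summarize_backend_log(sample))
```

If 'gpu' appears among the failed backends while TensorFlow still loads libcudart, the problem is more likely in the JAX installation than in the CUDA driver, although that is only a heuristic.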
Thanks. How do I make sure that JAX is a GPU version? I can confirm that the run loaded the environment, and previous predictions of monomer structures were successful.
Yeah, it's really strange that your monomer runs work and the multimer models do not. Usually, if your GPU cannot be detected, neither monomer nor multimer models use the GPU.
To check whether your program detects your GPU, you can run this:
python
>>> import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))
>>> import jax; print(jax.devices())
>>> import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>> import jax; print(jax.devices())
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
[<jaxlib.xla_extension.Device object at 0x2ba093904930>]
So you indeed need to check your JAX version to ensure that your model can use the GPU. Maybe reinstall the environment?
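One possible way to do that check from Python is sketched below. It reads the installed jax and jaxlib versions from package metadata and lists the platforms that jax.devices() reports. The function names jax_report and uses_gpu are made up for this example, and the set of platform names treated as a GPU ("gpu", "cuda", "rocm") is an assumption that may not cover every JAX release.

```python
import importlib
from importlib import metadata


def jax_report():
    """Collect installed jax/jaxlib versions and the platforms JAX can see.

    Hypothetical helper for illustration; it never raises if JAX is missing.
    """
    report = {"jax": None, "jaxlib": None, "platforms": []}
    for pkg in ("jax", "jaxlib"):
        try:
            report[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            pass  # package not installed in this environment
    if report["jax"] is None:
        return report
    try:
        jax = importlib.import_module("jax")
        # Each device exposes a platform string such as "cpu" or "gpu".
        report["platforms"] = sorted({d.platform for d in jax.devices()})
    except Exception as exc:  # broken install, backend init failure, etc.
        report["error"] = repr(exc)
    return report


def uses_gpu(report):
    """Assumption: any of these platform names means a GPU backend is active."""
    return any(p in ("gpu", "cuda", "rocm") for p in report["platforms"])


if __name__ == "__main__":
    info = jax_report()
    print(info)
    print("GPU visible to JAX:", uses_gpu(info))
```

If only "cpu" is listed while TensorFlow sees the GPU, as in the output above, the installed jaxlib may be a CPU-only build. For the jaxlib releases of that period, a CUDA build typically carried a suffix such as +cuda111 in its version string, but treat that as a hint rather than a guarantee.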
OK,thanks
Hi Zuricho,
Unfortunately, the issue still persists, as the proper model name for the GPU part is now model_1_multimer_v3. I figured I'd leave this comment for posterity's sake.
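Since the failures in this thread came from passing a monomer model name (model_1) together with -p multimer, a quick sanity check before launching the GPU step might look like the sketch below. The helper check_model_names is hypothetical and not part of ParallelFold. It only assumes the naming pattern seen in this thread: multimer model names contain "_multimer" (model_1_multimer, model_1_multimer_v3) and monomer names do not. It does not verify that a given name exists in the installed AlphaFold version.

```python
def check_model_names(preset, names):
    """Return the model names that look wrong for the given preset.

    Assumption (from the names in this thread): multimer models contain
    '_multimer' in their name, monomer models do not.
    """
    want_multimer = preset == "multimer"
    return [n for n in names if ("_multimer" in n) != want_multimer]


if __name__ == "__main__":
    # The combination from the original report: multimer preset, monomer names.
    bad = check_model_names("multimer", ["model_1", "model_2"])
    print("Names that do not match the preset:", bad)
```

An empty list means the names are at least consistent with the preset. Whether the plain or the _v3 form is accepted depends on the installed version.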
After the first CPU run, command:
./run_alphafold.sh -d $DATA_DIR -o $OUTPUT_DIR -p multimer -i $INPUT_DIR/test.fasta -t 2021-11-01 -m model_1 -f