Zuricho / ParallelFold

A modified version of AlphaFold that separates the CPU part (MSA and template searching) from the GPU part. This can accelerate AlphaFold when predicting multiple structures.
https://parafold.sjtu.edu.cn

IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed #15

Closed chenshixinnb closed 2 years ago

chenshixinnb commented 2 years ago

I first ran the CPU part with this command: ./run_alphafold.sh -d $DATA_DIR -o $OUTPUT_DIR -p multimer -i $INPUT_DIR/test.fasta -t 2021-11-01 -m model_1 -f

chenshixinnb commented 2 years ago

The second step uses the GPU, and this is where the error occurred. Command: ./run_alphafold.sh -d $DATA_DIR -o $OUTPUT_DIR -p multimer -m model_1,model_2,model_3,model_4,model_5 -i $INPUT_DIR/test.fasta -t 2021-11-01

chenshixinnb commented 2 years ago

2022-01-14 15:50:56.884239: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0114 15:51:34.441559 47327477676160 templates.py:857] Using precomputed obsolete pdbs /public/software/.local/easybuild/software/alphafold/data2/pdb_mmcif/obsolete.dat.
I0114 15:51:35.627506 47327477676160 xla_bridge.py:243] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I0114 15:51:35.627812 47327477676160 xla_bridge.py:243] Unable to initialize backend 'gpu': NOT_FOUND: Could not find registered platform with name: "cuda". Available platform names are: Interpreter Host
I0114 15:51:35.628199 47327477676160 xla_bridge.py:243] Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.
W0114 15:51:35.628336 47327477676160 xla_bridge.py:248] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I0114 15:51:36.549557 47327477676160 run_alphafold.py:407] Have 1 models: ['model_1']
I0114 15:51:36.549774 47327477676160 run_alphafold.py:423] Using random seed 6020071004121300369 for the data pipeline
I0114 15:51:36.549989 47327477676160 run_alphafold.py:156] Predicting fuheti
I0114 15:51:36.700862 47327477676160 run_alphafold.py:202] Running model model_1 on fuheti
Traceback (most recent call last):
  File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 455, in <module>
    app.run(main)
  File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 429, in main
    predict_structure(
  File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 205, in predict_structure
    processed_feature_dict = model_runner.process_features(
  File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/alphafold/model/model.py", line 131, in process_features
    return features.np_example_to_features(
  File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/alphafold/model/features.py", line 83, in np_example_to_features
    num_res = int(np_example['seq_length'][0])
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed

Zuricho commented 2 years ago

We are working on this problem. I will send another reply when we have any updates.

Zuricho commented 2 years ago

Could you send me your input FASTA file so I can check it? My email is zbztzhz@gmail.com

chenshixinnb commented 2 years ago

It has been sent, thank you

Zuricho commented 2 years ago

OK, I understand. In the GPU part, try using -m model_1_multimer instead of -m model_1.
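The traceback above can be reproduced in isolation. The following is a minimal sketch, not ParallelFold code: it assumes that multimer features store `seq_length` as a 0-dimensional array (the `'seq_length': ()` entry in the shape log later in this thread suggests this) and that the monomer code path expects a 1-D array. The helper `read_num_res` is hypothetical and only mimics the failing line in `features.py`.

```python
import numpy as np

# Multimer-style feature: a 0-dimensional array, shape ().
multimer_seq_length = np.array(646)

# Monomer-style feature: assumed here to be a 1-D array.
monomer_seq_length = np.array([646])


def read_num_res(seq_length):
    """Mimic the failing line: int(np_example['seq_length'][0])."""
    return int(seq_length[0])


# Indexing the 1-D array works.
print(read_num_res(monomer_seq_length))

# Indexing the 0-dimensional array raises IndexError, as in the traceback.
error_message = None
try:
    read_num_res(multimer_seq_length)
except IndexError as err:
    error_message = str(err)
    print("IndexError:", error_message)
```

If this reading is right, running a monomer model name such as model_1 on features produced with -p multimer hits the monomer code path, which is why switching to a multimer model name avoids the error.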

chenshixinnb commented 2 years ago

I used this command: $PROGRAM_DIR/run_alphafold.sh -d $DATA_DIR -o $OUTPUT_DIR -p multimer -m model_1_multimer -i $INPUT_DIR/test2.fasta -t 2021-11-01. Now the run stays stuck here:

chenshixinnb commented 2 years ago

2022-01-15 13:35:27.686400: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I0115 13:36:03.102030 47367004878976 templates.py:857] Using precomputed obsolete pdbs /public/software/.local/easybuild/software/alphafold/data2/pdb_mmcif/obsolete.dat.
I0115 13:36:04.031663 47367004878976 xla_bridge.py:243] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I0115 13:36:04.031914 47367004878976 xla_bridge.py:243] Unable to initialize backend 'gpu': NOT_FOUND: Could not find registered platform with name: "cuda". Available platform names are: Interpreter Host
I0115 13:36:04.032272 47367004878976 xla_bridge.py:243] Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.
W0115 13:36:04.032407 47367004878976 xla_bridge.py:248] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I0115 13:36:04.889354 47367004878976 run_alphafold.py:407] Have 1 models: ['model_1_multimer']
I0115 13:36:04.889528 47367004878976 run_alphafold.py:423] Using random seed 9133243819396004162 for the data pipeline
I0115 13:36:04.889718 47367004878976 run_alphafold.py:156] Predicting test2
I0115 13:36:05.007085 47367004878976 run_alphafold.py:202] Running model model_1_multimer on test2
I0115 13:36:05.007616 47367004878976 model.py:165] Running predict with shape(feat) = {'aatype': (646,), 'residue_index': (646,), 'seq_length': (), 'msa': (3101, 646), 'num_alignments': (), 'template_aatype': (4, 646), 'template_all_atom_mask': (4, 646, 37), 'template_all_atom_positions': (4, 646, 37, 3), 'asym_id': (646,), 'sym_id': (646,), 'entity_id': (646,), 'deletion_matrix': (3101, 646), 'deletion_mean': (646,), 'all_atom_mask': (646, 37), 'all_atom_positions': (646, 37, 3), 'assembly_num_chains': (), 'entity_mask': (646,), 'num_templates': (), 'cluster_bias_mask': (3101,), 'bert_mask': (3101, 646), 'seq_mask': (646,), 'msa_mask': (3101, 646)}
2022-01-15 13:39:56.909318: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:55]
Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
Compiling module jit_apply_fn.96373


Zuricho commented 2 years ago

It's strange that your JAX is using the CPU rather than the GPU. Did you set up the environment properly? For example, did you install the CUDA toolkit and load your local CUDA environment?

chenshixinnb commented 2 years ago

Thanks. How do I make sure my JAX is a GPU version? I confirmed that the run loaded the environment, and previous predictions of monomer structures were successful.

Zuricho commented 2 years ago

Yeah, it's really strange that your monomer runs work and the multimer models do not. Usually, if your GPU cannot be detected, neither monomer nor multimer runs use the GPU.

Zuricho commented 2 years ago

To see whether your program detects your GPU, you can use this:

python
>>> import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))
>>> import jax; print(jax.devices())

chenshixinnb commented 2 years ago

>>> import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>> import jax; print(jax.devices())
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
[<jaxlib.xla_extension.Device object at 0x2ba093904930>]

Zuricho commented 2 years ago

So you do indeed need to check your JAX version to ensure that your model can use the GPU. Maybe reinstall the environment?
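The same check can be wrapped in a small script that runs before the GPU step. This is only a sketch: the helper name `jax_backend_report` is made up for illustration, and it relies only on `jax.devices()` and the `platform` attribute of each device. If it reports only `cpu`, JAX has fallen back to the CPU as in the logs above, which usually points to a CPU-only jaxlib build or a CUDA environment that was not loaded.

```python
def jax_backend_report():
    """Describe which platforms JAX can see, or note that JAX is missing."""
    try:
        import jax
    except ImportError:
        return "jax is not installed in this environment"
    # Each device exposes a platform string; collect the distinct values.
    platforms = sorted({device.platform for device in jax.devices()})
    return "jax platforms: " + ", ".join(platforms)


report = jax_backend_report()
print(report)
```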

chenshixinnb commented 2 years ago

OK, thanks

WishIWasBornInTheCreteaceousEra commented 2 months ago

Hi Zuricho,

Unfortunately, the issue still persists, as the proper model name for the GPU part is now model_1_multimer_v3. I figured I'd leave this comment for posterity's sake.
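Because the valid names depend on which parameter files are installed, one way to find the right value for -m is to list them. The sketch below assumes the AlphaFold layout in which weights are stored as `params/params_<model_name>.npz` under the data directory. The helper `available_model_names` is hypothetical and not part of ParallelFold, and the demo uses a temporary directory with empty placeholder files rather than real weights.

```python
import tempfile
from pathlib import Path


def available_model_names(data_dir):
    """List model names found as params_<name>.npz under <data_dir>/params."""
    params_dir = Path(data_dir) / "params"
    prefix = "params_"
    return sorted(
        path.stem[len(prefix):] for path in params_dir.glob(prefix + "*.npz")
    )


# Demo with placeholder files; point the function at $DATA_DIR for real use.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_params = Path(demo_dir) / "params"
    demo_params.mkdir()
    for name in ("model_1", "model_1_multimer_v3"):
        (demo_params / f"params_{name}.npz").touch()
    demo_names = available_model_names(demo_dir)
    print(demo_names)
```

Any name passed to -m in the GPU step would then be expected to appear in this list.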