aqlaboratory / openfold

Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Apache License 2.0

--trace_model error when running inferencing #204

Open shangz-ai opened 2 years ago

shangz-ai commented 2 years ago

Hi @gahdritz,

Here are a few more details about the error I get when adding the --trace_model option during inference.

Here is the command I ran:

python3 run_pretrained_openfold.py \
    /mnt/sampledata/fasta_T1044 \
    /database/data/pdb_mmcif/mmcif_files/ \
    --uniref90_database_path /database/data/uniref90/uniref90.fasta \
    --mgnify_database_path /database/data/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path /database/data/pdb70/pdb70 \
    --uniclust30_database_path /database/data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --output_dir /mnt/sampledata/output_inference_T1044 \
    --bfd_database_path /database/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --model_device cuda:0 \
    --jackhmmer_binary_path /opt/conda/bin/jackhmmer \
    --hhblits_binary_path /opt/conda/bin/hhblits \
    --hhsearch_binary_path /opt/conda/bin/hhsearch \
    --kalign_binary_path /opt/conda/bin/kalign \
    --config_preset "model_1_ptm" \
    --jax_param_path /database/data/params/params_model_1_ptm.npz \
    --use_precomputed_alignments /database/sampledata/output_inference_T1044/alignments \
    --skip_relaxation

This completes inference successfully and reports the inference time, etc.

However, when I add --trace_model to the previous command, I get the following error:

INFO:run_pretrained_openfold.py:Successfully loaded JAX parameters at /database/data/params/params_model_1_ptm.npz...
INFO:run_pretrained_openfold.py:Using precomputed alignments for T1044 at /database/sampledata/output_inference_T1044/alignments...
Traceback (most recent call last):
  File "run_pretrained_openfold.py", line 617, in <module>
    main(args)
  File "run_pretrained_openfold.py", line 425, in main
    feature_dict, mode='predict',
  File "/mnt/openfold/openfold/data/feature_pipeline.py", line 115, in process_features
    mode=mode,
  File "/mnt/openfold/openfold/data/feature_pipeline.py", line 94, in np_example_to_features
    cfg[mode],
  File "/mnt/openfold/openfold/data/input_pipeline.py", line 179, in process_tensors_from_config
    tensors = compose(nonensembled)(tensors)
  File "/mnt/openfold/openfold/data/data_transforms.py", line 76, in <lambda>
    return lambda x: f(x, *args, **kwargs)
  File "/mnt/openfold/openfold/data/input_pipeline.py", line 196, in compose
    x = f(x)
  File "/mnt/openfold/openfold/data/data_transforms.py", line 99, in fix_templates_aatype
    new_order, 1, index=protein["template_aatype"]
RuntimeError: Index tensor must have the same number of dimensions as input tensor

It seems to me that the error may come from the pad_feature_dict_seq operation on feature_dict (https://github.com/aqlaboratory/openfold/blob/main/run_pretrained_openfold.py#L415). Do you have any insight into why I'm encountering this error when --trace_model is enabled?
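For reference, the RuntimeError in the traceback is the one torch.gather raises whenever the index tensor's rank differs from the input tensor's rank. Below is a minimal sketch of that behaviour in isolation. The sizes and the try_gather helper are made up for illustration and are not taken from T1044 or from the OpenFold code, and the sketch does not show that pad_feature_dict_seq is the actual cause. It only shows what shape of protein["template_aatype"] would trigger this message.

```python
import torch

# Illustrative sizes only (assumptions, not values from T1044).
NUM_TEMPLATES, NUM_CLASSES, NUM_RES = 4, 22, 7

# A 2-D lookup table expanded per template, similar in shape to the
# `new_order` argument that appears in the traceback.
new_order = torch.arange(NUM_CLASSES, dtype=torch.int64).expand(NUM_TEMPLATES, -1)


def try_gather(index):
    """Hypothetical helper: return the gathered shape, or the error text
    if torch.gather rejects the index tensor."""
    try:
        return tuple(torch.gather(new_order, 1, index=index).shape)
    except RuntimeError as err:
        return str(err)


# Same rank as the input (2-D): gather succeeds.
good_index = torch.zeros(NUM_TEMPLATES, NUM_RES, dtype=torch.int64)
# Different rank (1-D): gather raises the RuntimeError.
bad_index = torch.zeros(NUM_RES, dtype=torch.int64)

good_result = try_gather(good_index)
bad_result = try_gather(bad_index)
print(good_result)
print(bad_result)
```

If this is what is happening, printing protein["template_aatype"].shape right before the gather call in fix_templates_aatype, with and without --trace_model, should show whether the number of dimensions changes.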

Thanks, Shang

gahdritz commented 2 years ago

I can't reproduce this.

(openfold_venv) [d@rustyamd1 openfold]$ python run_pretrained_openfold.py \
    ~/gustaf_stuff/OpenFold/casp/casp_fastas/subtasks/t1159/ \
    ~/gustaf_stuff/OpenFold/pdb_mmcif/mmcif_files/ \
    --use_precomputed_alignments ~/gustaf_stuff/OpenFold/casp/alignments/ \
    --output_dir ./lma_test_outputs \
    --model_device "cuda:0" \
    --config_preset "model_1" \
    --obsolete_pdbs_path ~/gustaf_stuff/OpenFold/pdb_mmcif/obsolete.dat \
    --output_postfix "epoch_94_recycle_10_templates_4_dimer" \
    --skip_relaxation \
    --jax_param_path openfold/resources/params/params_model_1.npz \
    --trace_model
INFO:run_pretrained_openfold.py:Successfully loaded JAX parameters at openfold/resources/params/params_model_1.npz...
INFO:run_pretrained_openfold.py:Using precomputed alignments for T1159 at /mnt/home/dberenberg/gustaf_stuff/OpenFold/casp/alignments/...
INFO:run_pretrained_openfold.py:Tracing model at 200 residues...
INFO:run_pretrained_openfold.py:Tracing time: 73.32858817999659
INFO:run_pretrained_openfold.py:Running inference for T1159...
INFO:run_pretrained_openfold.py:Inference time: 10.851225507998606
INFO:run_pretrained_openfold.py:Output written to ./lma_test_outputs/predictions/T1159_model_1_epoch_94_recycle_10_templates_4_dimer_unrelaxed.pdb...

Is it possible that you've made local edits to the repo? If not, could you send your precomputed alignment dir + FASTA file?

shangz-ai commented 2 years ago

I was using T1044; the files are attached here: https://drive.google.com/drive/folders/14Y0FGM9CKZdzItu3ahhsFzc9_663VK_F?usp=sharing

Thanks!

I just got T1031, which has a rather short sequence (few residues), to work in trace mode. So I guess the problem is related to the alignments and FASTA for T1044.