aqlaboratory / openfold

Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Apache License 2.0

a big gap between the pdb results from openfold and alphafold2 (colab) #383

Open Melo-1017 opened 11 months ago

Melo-1017 commented 11 months ago

Hi! After running OpenFold with the default configuration, I found a large difference between the output PDB file from OpenFold and the one from AlphaFold (Colab version). Comparing both against the experimental structure from our laboratory, the AlphaFold result is the accurate one. As shown below, I essentially used the default configuration. What could cause this? I would be very grateful for your answers.

My running instructions:

python3 run_pretrained_openfold.py \
    run_fasta \
    data/pdb_mmcif/mmcif_files/ \
    --uniref90_database_path data/uniref90/uniref90.fasta \
    --mgnify_database_path data/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path data/pdb70/pdb70 \
    --uniclust30_database_path data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --output_dir ./ \
    --bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --model_device "cuda:1" \
    --jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
    --hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
    --hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
    --kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign \
    --config_preset "model_1_ptm" \
    --openfold_checkpoint_path openfold/resources/openfold_params/finetuning_ptm_2.pt

My output:

INFO:/data/xx/openfold/openfold/utils/script_utils.py:Loaded OpenFold parameters at openfold/resources/openfold_params/finetuning_ptm_2.pt...
INFO:/data/xx/openfold/run_pretrained_openfold.py:Using precomputed alignments for at1 at ./alignments...
INFO:/data/xx/openfold/openfold/utils/script_utils.py:Running inference for at1...
INFO:/data/xx/openfold/openfold/utils/script_utils.py:Inference time: 20.264450896997005
INFO:/data/xx/openfold/run_pretrained_openfold.py:Output written to ./predictions/at1_model_1_ptm_unrelaxed.pdb...
INFO:/data/xx/openfold/run_pretrained_openfold.py:Running relaxation on ./predictions/at1_model_1_ptm_unrelaxed.pdb...
WARNING:root:Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
INFO:/data/xx/openfold/openfold/utils/script_utils.py:Relaxation time: 14.143211493967101
INFO:/data/xx/openfold/openfold/utils/script_utils.py:Relaxed output written to ./predictions/at1_model_1_ptm_relaxed.pdb...
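
One way to make the reported gap concrete is to compute a Cα RMSD of each prediction against the experimental structure. The sketch below is only an illustration (not part of the original report): it uses Biopython, assumes single-chain structures with comparable residue numbering, and the reference/Colab file names are placeholders.

    # Illustrative only: quantify how far each prediction is from the lab structure.
    # Assumes Biopython is installed; file names below are placeholders.
    from Bio.PDB import PDBParser, Superimposer

    def ca_atoms(path):
        """Collect the C-alpha atoms of the first model in a PDB file."""
        structure = PDBParser(QUIET=True).get_structure("s", path)
        return [res["CA"] for res in structure[0].get_residues() if "CA" in res]

    def ca_rmsd(reference_path, predicted_path):
        """Superimpose predicted C-alphas onto the reference and return the RMSD (in Angstroms)."""
        ref, pred = ca_atoms(reference_path), ca_atoms(predicted_path)
        n = min(len(ref), len(pred))  # naive truncation; real use should align residues first
        sup = Superimposer()
        sup.set_atoms(ref[:n], pred[:n])
        return sup.rms

    print("OpenFold vs experiment :", ca_rmsd("at1_experimental.pdb", "predictions/at1_model_1_ptm_relaxed.pdb"))
    print("AlphaFold vs experiment:", ca_rmsd("at1_experimental.pdb", "at1_alphafold_colab.pdb"))
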
vetmax7 commented 7 months ago

@Melo-1017 Hello! Have you found a solution to your problem?

How did you generate the alignments for "Using precomputed alignments for at1 at ./alignments..."? With run_pretrained_openfold.py or not?
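
Shallow or empty MSAs are a common cause of degraded predictions, so inspecting what is actually inside the precomputed alignment directory can help. A minimal sketch (editorial, not from the thread), assuming the alignments/<tag>/ layout that run_pretrained_openfold.py writes:

    # Illustrative sanity check: list the alignment files for one target and
    # count how many sequences each A3M contains.
    from pathlib import Path

    alignment_dir = Path("alignments/at1")  # per-target directory, tag "at1" assumed
    for f in sorted(alignment_dir.iterdir()):
        n_seqs = None
        if f.suffix == ".a3m":
            # In A3M format every sequence starts with a '>' header line.
            n_seqs = sum(1 for line in f.open() if line.startswith(">"))
        size_kb = f.stat().st_size / 1024
        print(f"{f.name}: {size_kb:.1f} KiB" + (f", {n_seqs} sequences" if n_seqs else ""))
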

wtq18 commented 7 months ago

I encountered the same issue when running multimer inference for the given example, 2q2k, using openfold-multimer. It didn’t perform as well as the alphafold-multimer (colab version). I used the alignments provided here: https://github.com/aqlaboratory/openfold/tree/main/tests/test_data/alignments.

My running scripts:

python run_pretrained_openfold.py \
    tests/test_data/2q2k/ \
    data/pdb_mmcif/mmcif_files/ \
    --uniref90_database_path data/uniref90/uniref90.fasta \
    --mgnify_database_path data/mgnify/mgy_clusters_2022_05.fa \
    --pdb_seqres_database_path data/pdb_seqres/pdb_seqres.txt \
    --uniref30_database_path data/uniref30/UniRef30_2021_03 \
    --uniprot_database_path data/uniprot/uniprot.fasta \
    --bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --jackhmmer_binary_path $CONDA_PREFIX/bin/jackhmmer \
    --hhblits_binary_path $CONDA_PREFIX/bin/hhblits \
    --hmmsearch_binary_path $CONDA_PREFIX/bin/hmmsearch \
    --hmmbuild_binary_path $CONDA_PREFIX/bin/hmmbuild \
    --kalign_binary_path $CONDA_PREFIX/bin/kalign \
    --config_preset "model_1_multimer_v3" \
    --model_device "cuda:0" \
    --use_precomputed_alignments tests/test_data/2q2k/out/alignments/ \
    --output_dir tests/test_data/2q2k/out

dingquanyu commented 7 months ago

Hi @wtq18

In your command I couldn't find any path pointing to either AlphaFold's neural network weights or your own pre-trained OpenFold-Multimer checkpoint file. I'm afraid you have essentially run inference with a randomly initialised network, which naturally won't yield good results.

Yours Dingquan

wtq18 commented 7 months ago

According to the script, I'm using the weights at openfold/resources/params/params_model_1_multimer_v3.npz. Or can you provide the prediction result in the example folder?

    if args.jax_param_path is None and args.openfold_checkpoint_path is None:
        args.jax_param_path = os.path.join(
            "openfold", "resources", "params",
            "params_" + args.config_preset + ".npz",
        )
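
This fallback only helps if that file actually exists and is readable. A quick way to confirm it resolves to real weights is to load the archive directly; a minimal sketch (not from the thread), assuming the default resource layout:

    # Illustrative only: confirm the multimer parameter file is present and loadable.
    import os
    import numpy as np

    path = os.path.join(
        "openfold", "resources", "params", "params_model_1_multimer_v3.npz"
    )
    print("exists:", os.path.isfile(path))
    params = np.load(path)  # loaded as a standard .npz archive of parameter arrays
    print("parameter arrays in checkpoint:", len(params.files))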