PaddlePaddle / PaddleHelix

Bio-Computing Platform Featuring Large-Scale Representation Learning and Multi-Task Deep Learning ("螺旋桨" bio-computing toolkit)

Error in HelixFold3: TemplateAtomMaskAllZerosError: Template all atom mask was all zeros: 3jxv_A. Residue range: 11-115 #351

Status: Closed (ggokturkk closed this issue 2 weeks ago)

ggokturkk commented 1 month ago

Hello, I have encountered this error while trying the HelixFold3 app. You can see my JSON file below:

{
    "entities": [
        {
            "type": "protein",
            "sequence": "GVQVETISPGDGRTFPKRGQTCVVHYTGMLEDGKKFDSSRDRNKPFKFMLGKQEVIRGWEEGVAQMSVGQRAKLTISPDYAYGATGHPGIIPPHATLVFDVELLKLE",
            "count": 1
        }
    ]
}
2024-09-26 13:13:30 INFO Found an exact template match 3jxv_A.
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/app/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/templates.py", line 798, in _process_single_hit
    features, realign_warning = _extract_template_features(
  File "/app/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/templates.py", line 629, in _extract_template_features
    raise TemplateAtomMaskAllZerosError(
helixfold.data.templates.TemplateAtomMaskAllZerosError: Template all atom mask was all zeros: 3jxv_A. Residue range: 11-115

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/app/PaddleHelix/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py", line 387, in process_chain_msa
    raw_features = data_pipeline._process_single_chain(
  File "/app/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/pipeline_multimer_parallel.py", line 213, in _process_single_chain
    chain_features = self._monomer_data_pipeline.process(
  File "/app/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/pipeline_parallel.py", line 271, in process
    templates_result = self.template_featurizer.get_templates(
  File "/app/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/templates.py", line 957, in get_templates
    result = _process_single_hit(
  File "/app/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/templates.py", line 820, in _process_single_hit
    warning = ('%s_%s (sum_probs: %.2f, rank: %d): feature extracting errors: '
TypeError: must be real number, not NoneType
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/PaddleHelix/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py", line 483, in process_input_json
    _, raw_features, type_chain_id, seqs = future.result()
  File "/root/miniconda3/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/root/miniconda3/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
TypeError: must be real number, not NoneType
2024-09-26 13:13:30 ERROR Task generated an exception : must be real number, not NoneType
2024-09-26 13:23:57 INFO Finished Jackhmmer (uniprot.fasta) query in 634.199 seconds
[MSA/Template] protein_B; seq length: 104; use: 1252.6531014442444
2024-09-26 13:23:57 INFO [Multiprocess] All msa/template use: 1253.4603350162506
Traceback (most recent call last):
  File "/app/PaddleHelix/apps/protein_folding/helixfold3/inference.py", line 637, in <module>
    main(args)
  File "/app/PaddleHelix/apps/protein_folding/helixfold3/inference.py", line 496, in main
    feature_dict = feature_processing_aa.process_input_json(
  File "/app/PaddleHelix/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py", line 499, in process_input_json
    all_feats = add_assembly_features(all_chain_features, ccd_preprocessed_dict, no_msa_templ_feats)
  File "/app/PaddleHelix/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py", line 303, in add_assembly_features
    hf2_msa_feats = pipeline_multimer.process_with_all_chain_features(chain_group_feats)
  File "/app/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/pipeline_multimer.py", line 121, in process_with_all_chain_features
    input_seqs.add(str(chain_features["sequence"]))
KeyError: 'sequence'

Could someone help me resolve this issue?

Thanks in advance!

All the best, Gokhan

jscgh commented 1 month ago

I have encountered the same error with another protein.

{
    "entities": [
        {
            "type": "protein",
            "sequence": "MKFQHTFIALLSLLTYANAYDYFTTTLANQNPVCASVDVIQNVCTEVCGRFVRYIPDATNTNQFTFAEYTTNQCTVQVTPAVTNTFTCADQTSSHALGSDWSGVCKITATPAPTVTPTVTPTVTPTVTPTPTNTPNPTPSQTSTTTGSASTVVASLSLIIFSMILSLC",
            "count": 1
        }
    ]
}
2024-10-01 16:59:22 DEBUG Reading PDB entry from /mnt/af2/pdb_mmcif/mmcif_files/4l3a.cif. Query: MKFQHTFIALLSLLTYANAYDYFTTTLANQNPVCASVDVIQNVCTEVCGRFVRYIPDATNTNQFTFAEYTTNQCTVQVTPAVTNTFTCADQTSSHALGSDWSGVCKITATPAPTVTPTVTPTVTPTVTPTPTNTPNPTPSQTSTTTGSASTVVASLSLIIFSMILSLC, template: DLSKPGKYVVTLNAENDLQKALPVQVMVIVEKETPIPDPTPTPTPDPTPTPDPSPTPNPVINPN
2024-10-01 16:59:22 INFO Found an exact template match 4l3a_A.
2024-10-01 16:59:22 WARNING Template structure not in release dates dict: 4l3a
2024-10-01 16:59:22 DEBUG Reading PDB entry from /mnt/af2/pdb_mmcif/mmcif_files/4l3a.cif. Query: MKFQHTFIALLSLLTYANAYDYFTTTLANQNPVCASVDVIQNVCTEVCGRFVRYIPDATNTNQFTFAEYTTNQCTVQVTPAVTNTFTCADQTSSHALGSDWSGVCKITATPAPTVTPTVTPTVTPTVTPTPTNTPNPTPSQTSTTTGSASTVVASLSLIIFSMILSLC, template: DLSKPGKYVVTLNAENDLQKALPVQVMVIVEKETPIPDPTPTPTPDPTPTPDPSPTPNPVINPN
2024-10-01 16:59:22 INFO Found an exact template match 4l3a_B.
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/apptainers/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/templates.py", line 798, in _process_single_hit
    features, realign_warning = _extract_template_features(
  File "/apptainers/PaddleHelix/apps/protein_folding/helixfold3/helixfold/data/templates.py", line 629, in _extract_template_features
    raise TemplateAtomMaskAllZerosError(
helixfold.data.templates.TemplateAtomMaskAllZerosError: Template all atom mask was all zeros: 4l3a_B. Residue range: 482-545

leaves520 commented 2 weeks ago

@ggokturkk @jscgh Hi all, this error is caused by missing mmCIF structure coordinates and a syntax issue during template feature extraction. We have fixed it. Please check: https://github.com/PaddlePaddle/PaddleHelix/pull/357
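
A note on the second traceback in the original report: the `TypeError: must be real number, not NoneType` is raised while the warning for the failed template hit is being formatted, which hides the underlying `TemplateAtomMaskAllZerosError`. The sketch below reproduces that failure mode in isolation and shows one defensive way to format such a message. It is not the change made in the linked pull request. The function names, the argument list, and the trailing `%s` in the format string are hypothetical, since the traceback shows only the first line of the format expression.

```python
def naive_format(pdb_id, chain_id, sum_probs, rank, error):
    """Hypothetical stand-in for a '%'-style warning with a float field."""
    # '%.2f' requires a real number, so sum_probs=None raises TypeError here.
    return ('%s_%s (sum_probs: %.2f, rank: %d): feature extracting errors: %s'
            % (pdb_id, chain_id, sum_probs, rank, error))


def safe_format(pdb_id, chain_id, sum_probs, rank, error):
    """Same message, but tolerant of a missing sum_probs value."""
    # Render the score separately so that None becomes a readable placeholder.
    probs_text = "n/a" if sum_probs is None else f"{sum_probs:.2f}"
    return (f"{pdb_id}_{chain_id} (sum_probs: {probs_text}, rank: {rank}): "
            f"feature extracting errors: {error}")


if __name__ == "__main__":
    original_error = "Template all atom mask was all zeros: 3jxv_A."
    try:
        naive_format("3jxv", "A", None, 1, original_error)
    except TypeError as exc:
        # The formatting error replaces the template error in the report.
        print("naive formatting failed:", exc)
    print(safe_format("3jxv", "A", None, 1, original_error))
```

With the defensive version, the template error text still reaches the log when the score is missing, which makes the root cause easier to spot.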