Dear all,
I had trouble running a prediction with an updated pdb_seqres.txt file, because some entries (PDB codes 7ooo, 7oos and 7ozz) contain unusual DNA residue names. These nucleic acids contain modified residues whose codes do not follow the DNA alphabet, so the parser fails with an error on the character "0" (zero).
The traceback and details are below:
Traceback (most recent call last):
  File "/app/alphafold/run_alphafold.py", line 422, in <module>
    app.run(main)
  File "/opt/alphafoldenv/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/alphafoldenv/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/app/alphafold/run_alphafold.py", line 398, in main
    predict_structure(
  File "/app/alphafold/run_alphafold.py", line 172, in predict_structure
    feature_dict = data_pipeline.process(
  File "/app/alphafold/alphafold/data/pipeline_multimer.py", line 264, in process
    chain_features = self._process_single_chain(
  File "/app/alphafold/alphafold/data/pipeline_multimer.py", line 212, in _process_single_chain
    chain_features = self._monomer_data_pipeline.process(
  File "/app/alphafold/alphafold/data/pipeline.py", line 185, in process
    pdb_templates_result = self.template_searcher.query(msa_for_templates)
  File "/app/alphafold/alphafold/data/tools/hmmsearch.py", line 79, in query
    return self.query_with_hmm(hmm)
  File "/app/alphafold/alphafold/data/tools/hmmsearch.py", line 112, in query_with_hmm
    raise RuntimeError(
RuntimeError: hmmsearch failed:
stdout:
hmmsearch :: search profile(s) against a sequence database
HMMER 3.3.2 (Nov 2020); http://hmmer.org/
Copyright (C) 2020 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
query HMM file: /tmp/tmp2i0w1r3m/query.hmm
target sequence database: /scratch/shared/dataset/alphafold_data/pdb_seqres/pdb_seqres.txt
MSA of all hits saved to file: /tmp/tmp2i0w1r3m/output.sto
show alignments in output: no
sequence reporting threshold: E-value <= 100
domain reporting threshold: E-value <= 100
sequence inclusion threshold: E-value <= 100
domain inclusion threshold: E-value <= 100
MSV filter P threshold: <= 0.1
Vit filter P threshold: <= 0.1
Fwd filter P threshold: <= 0.1
number of worker threads: 8
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: query [M=242]
stderr:
Parse failed (sequence file /scratch/shared/dataset/alphafold_data/pdb_seqres/pdb_seqres.txt): Line 1364756: illegal character 0
After manually editing the file to remove the offending characters (such as "05H", the modified DNA nucleotide), the error is gone. Here is the full diff:
diff -Naup pdb_seqres/pdb_seqres.txt-orig pdb_seqres/pdb_seqres.txt
--- pdb_seqres/pdb_seqres.txt-orig 2022-09-13 00:19:53.000000000 +0200
+++ pdb_seqres/pdb_seqres.txt 2022-09-13 00:36:37.000000000 +0200
@@ -1360655,9 +1360655,9 @@ CAAAGAAAAG
7ooo_D mol:na length:10 RNA (5'-R(CPAPAPAPGPAPAPAPAPG)-3')
CAAAGAAAAG
7ooo_B mol:na length:11 DNA (5'-D(CPTP(RWQ)PTPCPTPTPTPG)-3')
-CT05ATCTTTG
+CTATCTTTG
7ooo_E mol:na length:11 DNA (5'-D(CPTP(RWQ)PTPCPTPTPTPG)-3')
-CT05ATCTTTG
+CTATCTTTG
7oop_A mol:protein length:1970 DNA-directed RNA polymerase II subunit RPB1
MHGGGPPSGDSACPLRTIKRVQFGVLSPDELKRMSVTEGGIKYPETTEGGRPKLGGLMDPRQGVIERTGRCQTCAGNMTECPGHFGHIELAKPVFHVGFLVKTMKVLRCVCFFCSKLLVDSNNPKIKDILAKSKGQPKKRLTHVYDLCKGKNICEGGEEMDNKFGVEQPEGDEDLTKEKGHGGCGRYQPRIRRSGLELYAEWKHVNEDSQEKKILLSPERVHEIFKRISDEECFVLGMEPRYARPEWMIVTVLPVPPLSVRPAVVMQGSARNQDDLTHKLADIVKINNQLRRNEQNGAAAHVIAEDVKLLQFHVATMVDNELPGLPRAMQKSGRPLKSLKQRLKGKEGRVRGNLMGKRVDFSARTVITPDPNLSIDQVGVPRSIAANMTFAEIVTPFNIDRLQELVRRGNSQYPGAKYIIRDNGDRIDLRFHPKPSDLHLQTGYKVERHMCDGDIVIFNRQPTLHKMSMMGHRVRILPWSTFRLNLSVTTPYNADFDGDEMNLHLPQSLETRAEIQELAMVPRMIVTPQSNRPVMGIVQDTLTAVRKFTKRDVFLERGEVMNLLMFLSTWDGKVPQPAILKPRPLWTGKQIFSLIIPGHINCIRTHSTHPDDEDSGPYKHISPGDTKVVVENGELIMGILCKKSLGTSAGSLVHISYLEMGHDITRLFYSNIQTVINNWLLIEGHTIGIGDSIADSKTYQDIQNTIKKAKQDVIEVIEKAHNNELEPTPGNTLRQTFENQVNRILNDARDKTGSSAQKSLSEYNNFKSMVVSGAKGSKINISQVIAVVGQQNVEGKRIPFGFKHRTLPHFIKDDYGPESRGFVENSYLAGLTPTEFFFHAMGGREGLIDTAVKTAETGYIQRRLIKSMESVMVKYDATVRNSINQVVQLRYGEDGLAGESVEFQNLATLKPSNKAFEKKFRFDYTNERALRRTLQEDLVKDVLSNAHIQNELEREFERMREDREVLRVIFPTGDSKVVLPCNLLRMIWNAQKIFHINPRLPSDLHPIKVVEGVKELSKKLVIVNGDDPLSRQAQENATLLFNIHLRSTLCSRRMAEEFRLSGEAFDWLLGEIESKFNQAIAHPGEMVGALAAQSLGEPATQMTLNTFHYAGVSAKNVTLGVPRLKELINISKKPKTPSLTVFLLGQSARDAERAKDILCRLEHTTLRKVTANTAIYYDPNPQSTVVAEDQEWVNVYYEMPDFDVARISPWLLRVELDRKHMTDRKLTMEQIAEKINAGFGDDLNCIFNDDNAEKLVLRIRIMNSDENKMQEEEEVVDKMDDDVFLRCIESNMLTDMTLQGIEQISKVYMHLPQTDNKKKIIITEDGEFKALQEWILETDGVSLMRVLSEKDVDPVRTTSNDIVEIFTVLGIEAVRKALERELYHVISFDGSYVNYRHLALLCDTMTCRGHLMAITRHGVNRQDTGPLMKCSFEETVDVLMEAAAHGESDPMKGVSENIMLGQLAPAGTGCFDLLLDAEKCKYGMEIPTNIPGLGAAGPTGMFFGSAPSPMGGISPAMTPWNQGATPAYGAWSPSVGSGMTPGAAGFSPSAASDASGFSPGYSPAWSPTPGSPGSPGPSSPYIPSPGGAMSPSYSPTSPAYEPRSPGGYTPQSPSYSPTSPSYSPTSPSYSPTSPNYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPSYSPTSPNYSPTSPNYTPTSPSYSPTSPSYSPTSPNYTPTSPNYSPTSPSYSPTSPSYSPTSPSYSPSSPRYTPQSPTYTPSSPSYSPSSPSYSPTSPKYTPTSPSYSPSSPEYTPTSPKYSPTSPKYSPTSPKYSPTSPTYSPTTPKYSPTSPTYSPTSPVYTPTSPKYSPTSPTYSPTSPKYSPTSPTYSPTSPKGSTYSPTSPGYSPTSPTYSLTSPAISPDDSDEEN
7oop_J mol:protein length:67 DNA-directed RNA polymerases I, II, and III subunit RPABC5
@@ -1360717,7 +1360717,7 @@ MWKDKEFQVLFVLTILTLISGTIFYSTVEGLRPIDALYFS
7oos_A mol:na length:10 RNA (5'-R(CPAPAPAPGPAPAPAPAPG)-3')
CAAAGAAAAG
7oos_B mol:na length:11 DNA (5'-D(CPTP(RWT)PTPCPTPTPTPG)-3')
-CT05KTCTTTG
+CTTCTTTG
7oot_A mol:protein length:141 Interferon regulatory factor 4
MGSHHHHHHSAALEVLFQGPGGNGKLRQWLIDQIDSGKYPGLVWENEEKSIFRIPWKHAGKQDYNREEDAALFKAWALFKGKFREGIDKPDPPTWKTRLRCALNKSNDFEELVERSQLDISDPYKVYRIVPEGAKKGAKQL
7oot_B mol:protein length:141 Interferon regulatory factor 4
@@ -1364753,7 +1364753,7 @@ GSHMEYELPEDPKWEFPRDKLTLGKPLGEGCFGQVVMAEA
7ozz_A mol:na length:10 RNA (5'-R(CPAPAPAPGPAPAPAPAPG)-3')
CAAAGAAAAG
7ozz_B mol:na length:11 DNA (5'-D(CPTP(RWR)PTPCPTPTPTPG)-3')
-CT05HTCTTTG
+CTTCTTTG
7p00_H mol:protein length:298 Antibody fragment scFv16
MKFLVNVALVFMVVYISYIYADYKDDDDKHHHHHHHHHHLEVLFQGPDVQLVESGGGLVQPGGSRKLSCSASGFAFSSFGMHWVRQAPEKGLEWVAYISSGSGTIYYADTVKGRFTISRDDPKNTLFLQMTSLRSEDTAMYYCVRSIYYYGSSPFDFWGQGTTLTVSSGGGGSGGGGSGGGGSDIVMTQATSSVPVTPGESVSISCRSSKSLLHSNGNTYLYWFLQRPGQSPQLLIYRMSNLASGVPDRFSGSGSGTAFTLTISRLEAEDVGVYYCMQHLEYPLTFGAGTKLELKAAA
7p00_B mol:protein length:354 Guanine nucleotide-binding protein G(I)/G(S)/G(T) subunit beta-1
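The manual edit could probably be scripted. The following is only a sketch: the function names are hypothetical, it assumes each record is a header line starting with ">" followed by its sequence line(s), and it treats any character that is not a letter, "*" or "-" as illegal, which only approximates what the hmmsearch parser accepts. Unlike the manual diff, it drops the whole offending record instead of editing the residues.

```python
import re

# Assumption: anything other than an ASCII letter, '*' or '-' counts as illegal.
# This approximates, and is not taken from, the parser used by hmmsearch.
ILLEGAL = re.compile(r"[^A-Za-z*\-]")


def find_illegal_lines(lines):
    """Return (line_number, header, first_illegal_char) per bad sequence line."""
    hits = []
    header = ""
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if text.startswith(">"):
            header = text  # remember which record we are in
            continue
        match = ILLEGAL.search(text)
        if match:
            hits.append((number, header, match.group()))
    return hits


def drop_bad_records(lines):
    """Return the lines without any record whose sequence has an illegal character."""
    kept, record, bad = [], [], False
    for line in lines:
        if line.startswith(">"):
            # A new header closes the previous record; keep it only if clean.
            if record and not bad:
                kept.extend(record)
            record, bad = [line], False
        else:
            record.append(line)
            if ILLEGAL.search(line.strip()):
                bad = True
    if record and not bad:
        kept.extend(record)
    return kept


# Small in-memory sample modelled on the entries above (headers shortened).
sample = [
    ">7ozz_A mol:na length:10  RNA\n",
    "CAAAGAAAAG\n",
    ">7ozz_B mol:na length:11  DNA\n",
    "CT05HTCTTTG\n",
]
for number, header, char in find_illegal_lines(sample):
    print(f"line {number}: illegal character {char!r} in {header}")
cleaned = drop_bad_records(sample)
```

Both functions accept any iterable of lines, so an open file handle on pdb_seqres.txt can be passed directly, and the result of drop_bad_records can be written to a new file with writelines.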
I think this error (the "Parse failed" message) belongs to AlphaFold rather than to hmmsearch. Maybe an exception should be raised, but without halting the whole process?
Thanks a lot for your time. I'll report this to hmmsearch too (linking this issue).
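To illustrate what "raise but do not halt" could look like, here is a generic sketch. The helper run_optional_step is hypothetical and is not part of AlphaFold; whether the pipeline can usefully continue without template hits is also an assumption.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_optional_step(step: Callable[[], T], name: str) -> Optional[T]:
    """Run `step`; on RuntimeError, log a warning and return None instead of aborting."""
    try:
        return step()
    except RuntimeError as err:
        # The error is still reported, but the caller decides how to continue.
        logging.warning("%s failed, continuing without it: %s", name, err)
        return None


def failing_search():
    # Stand-in for a template search that hits the parse error shown above.
    raise RuntimeError("hmmsearch failed: illegal character 0")


result = run_optional_step(failing_search, "template search")
if result is None:
    print("no templates available, continuing without them")
```

The caller then has to handle the None case explicitly, for example by proceeding with an empty template set.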