google-deepmind / alphafold

Open source code for AlphaFold 2.
Apache License 2.0

HHSearch fails on large predicted protein #522

Open GeoMicroSoares opened 2 years ago

GeoMicroSoares commented 2 years ago

Hi there,

Having set up AlphaFold on my machine (NVIDIA GeForce RTX 2080 GPU), which I've tested successfully, I'm now getting the error below when running it on a 5586 AA predicted protein (from a metagenome). I'm not sure what is going wrong, as this looks like an internal HHSearch error. I've tried both the full and reduced AlphaFold databases with default parameters (only adding the output directory). Thanks in advance for any help.

I0629 09:36:46.062932 139841260160832 run_docker.py:255] I0629 07:36:46.061630 139910991673152 utils.py:36] Started HHsearch query
I0629 09:36:46.688335 139841260160832 run_docker.py:255] I0629 07:36:46.687293 139910991673152 utils.py:40] Finished HHsearch query in 0.625 seconds
I0629 09:36:46.707670 139841260160832 run_docker.py:255] Traceback (most recent call last):
I0629 09:36:46.707909 139841260160832 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 422, in <module>
I0629 09:36:46.708083 139841260160832 run_docker.py:255] app.run(main)
I0629 09:36:46.708291 139841260160832 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
I0629 09:36:46.708447 139841260160832 run_docker.py:255] _run_main(main, args)
I0629 09:36:46.708595 139841260160832 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
I0629 09:36:46.708742 139841260160832 run_docker.py:255] sys.exit(main(argv))
I0629 09:36:46.708886 139841260160832 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 406, in main
I0629 09:36:46.709030 139841260160832 run_docker.py:255] random_seed=random_seed)
I0629 09:36:46.709173 139841260160832 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 174, in predict_structure
I0629 09:36:46.709334 139841260160832 run_docker.py:255] msa_output_dir=msa_output_dir)
I0629 09:36:46.709480 139841260160832 run_docker.py:255] File "/app/alphafold/alphafold/data/pipeline.py", line 188, in process
I0629 09:36:46.709624 139841260160832 run_docker.py:255] pdb_templates_result = self.template_searcher.query(uniref90_msa_as_a3m)
I0629 09:36:46.709766 139841260160832 run_docker.py:255] File "/app/alphafold/alphafold/data/tools/hhsearch.py", line 96, in query
I0629 09:36:46.709911 139841260160832 run_docker.py:255] stdout.decode('utf-8'), stderr[:100_000].decode('utf-8')))
I0629 09:36:46.710054 139841260160832 run_docker.py:255] RuntimeError: HHSearch failed:
I0629 09:36:46.710198 139841260160832 run_docker.py:255] stdout:
I0629 09:36:46.710341 139841260160832 run_docker.py:255]
I0629 09:36:46.710483 139841260160832 run_docker.py:255]
I0629 09:36:46.710624 139841260160832 run_docker.py:255] stderr:
I0629 09:36:46.710766 139841260160832 run_docker.py:255] - 07:36:46.268 INFO: /tmp/tmpxkcxixk3/query.a3m is in A2M, A3M or FASTA format
I0629 09:36:46.710907 139841260160832 run_docker.py:255]
I0629 09:36:46.711049 139841260160832 run_docker.py:255] - 07:36:46.269 WARNING: Ignoring invalid symbol '*' at pos. 5586 in line 2 of /tmp/tmpxkcxixk3/query.a3m
I0629 09:36:46.711211 139841260160832 run_docker.py:255]
I0629 09:36:46.711354 139841260160832 run_docker.py:255] - 07:36:46.677 ERROR: [subseq from] Uncharacterized protein n=4 Tax=Candidatus Altiarchaeum TaxID=1803512 RepID=A0A1J5JGS9_9ARCH
I0629 09:36:46.711499 139841260160832 run_docker.py:255] - 07:36:46.677 ERROR: Error in /tmp/hh-suite/src/hhalignment.cpp:1244: Compress:
I0629 09:36:46.711654 139841260160832 run_docker.py:255]
I0629 09:36:46.711797 139841260160832 run_docker.py:255] - 07:36:46.677 ERROR:  sequences in /tmp/tmpxkcxixk3/query.a3m do not all have the same number of columns,
I0629 09:36:46.711939 139841260160832 run_docker.py:255]
I0629 09:36:46.712081 139841260160832 run_docker.py:255] - 07:36:46.677 ERROR:
I0629 09:36:46.712221 139841260160832 run_docker.py:255] e.g. first sequence and sequence UniRef90_A0A1J5JGS9/19-5578.
I0629 09:36:46.712363 139841260160832 run_docker.py:255]
I0629 09:36:46.712503 139841260160832 run_docker.py:255] - 07:36:46.677 ERROR: Check input format for '-M a2m' option and consider using '-M first' or '-M 50'

GeoMicroSoares commented 2 years ago

I've figured out my problem, and maybe this will help other people: my FASTA header was too long and contained a number of symbols that may have interfered with the program. I've trimmed it to the essentials and AlphaFold is now running.
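For anyone who wants to automate that trimming, here is a minimal sketch. It assumes that keeping only the first whitespace-delimited token of the header and replacing unusual symbols with underscores is enough; `simplify_header` and the `max_len` cutoff are hypothetical names and values chosen for illustration, not anything AlphaFold or HHSearch defines.

```python
import re


def simplify_header(header: str, max_len: int = 50) -> str:
    """Reduce a FASTA header line to a short, plain identifier.

    Hypothetical helper: the character whitelist and max_len are
    assumptions, not limits documented by AlphaFold or HH-suite.
    """
    # Drop the leading '>' and surrounding whitespace.
    stripped = header.lstrip(">").strip()
    # Keep only the first whitespace-delimited token; fall back to "query".
    token = stripped.split()[0] if stripped else "query"
    # Replace anything other than letters, digits, '_', '.' or '-' with '_'.
    token = re.sub(r"[^A-Za-z0-9_.-]", "_", token)
    return ">" + token[:max_len]


if __name__ == "__main__":
    print(simplify_header(">k141_1234|flag=1 len=5586 # 3 # 16763 # 1"))
```

The original long header can be kept in a separate file if the annotation it carries is still needed later.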

GeoMicroSoares commented 2 years ago

This is actually still going on. The run got further after that change, but I'm now getting an error similar to #499: stdout.decode('utf-8'), stderr[:500_000].decode('utf-8')))

From the stderr:

...
I0701 18:53:09.116339 139859049342784 run_docker.py:255] - 15:23:21.164 INFO: 126000 alignments done
I0701 18:53:09.116392 139859049342784 run_docker.py:255]
I0701 18:53:09.116446 139859049342784 run_docker.py:255] - 15:23:36.637 INFO: 128000 alignments done
I0701 18:53:09.116499 139859049342784 run_docker.py:255]
I0701 18:53:09.116553 139859049342784 run_docker.py:255] - 15:23:36.646 INFO: Stop after DB-HHM: 128000 because early stop  0.219067 < filter cutoff 20
I0701 18:53:09.116606 139859049342784 run_docker.py:255]
I0701 18:53:09.116659 139859049342784 run_docker.py:255] - 15:23:36.646 INFO: Alternative alignment: 1
I0701 18:53:09.116713 139859049342784 run_docker.py:255]
I0701 18:53:09.116766 139859049342784 run_docker.py:255] - 15:31:13.965 INFO: 75936 alignments done
I0701 18:53:09.116820 139859049342784 run_docker.py:255]
I0701 18:53:09.116873 139859049342784 run_docker.py:255] - 15:31:14.181 INFO: Alternative alignment: 2
I0701 18:53:09.116926 139859049342784 run_docker.py:255]
I0701 18:53:09.116980 139859049342784 run_docker.py:255] - 15:36:36.646 INFO: 52008 alignments done
I0701 18:53:09.117033 139859049342784 run_docker.py:255]
I0701 18:53:09.117087 139859049342784 run_docker.py:255] - 15:36:36.735 INFO: Alternative alignment: 3
I0701 18:53:09.117140 139859049342784 run_docker.py:255]
I0701 18:53:09.117193 139859049342784 run_docker.py:255] - 15:40:59.329 INFO: 42575 alignments done
I0701 18:53:09.117251 139859049342784 run_docker.py:255]
I0701 18:53:09.117305 139859049342784 run_docker.py:255] - 15:41:02.288 INFO: Realigning 38704 HMM-HMM alignments using Maximum Accuracy algorithm
I0701 18:53:09.117360 139859049342784 run_docker.py:255]
I0701 18:53:09.117412 139859049342784 run_docker.py:255]
I0701 18:53:09.117465 139859049342784 run_docker.py:255]

Details on the input above. I've redownloaded the databases and have 64GB RAM. Any help?

aputron commented 2 years ago

I'm facing this issue even with a 300 aa protein 😅

shahryary commented 2 years ago

It seems I solved this problem (see #499) by increasing the machine's RAM from 64 GB to 256 GB. You could try increasing yours too and see whether the predictions are generated.

amirh-hajianpour commented 8 months ago

I mentioned my solution in #592. Just remove the ending asterisk from your sequence and give it a try!
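The HHSearch warning in the first log (`Ignoring invalid symbol '*' at pos. 5586`) points at the same trailing asterisk. A minimal sketch of removing it before running AlphaFold could look like the following; `strip_stop_symbols` is a hypothetical helper name, and the sketch assumes a plain FASTA file where header lines start with `>`.

```python
def strip_stop_symbols(fasta_text: str) -> str:
    """Remove '*' characters from sequence lines of a FASTA string.

    Hypothetical helper for illustration; header lines (starting
    with '>') are left untouched.
    """
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header line: keep as-is.
            cleaned.append(line)
        else:
            # Sequence line: drop stop-codon markers and stray whitespace.
            cleaned.append(line.replace("*", "").strip())
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    print(strip_stop_symbols(">query\nMKVLA*\n"), end="")
```

To apply it, read the input FASTA file, pass its contents through the function, and write the result to a new file that is then given to AlphaFold.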