google-deepmind / alphafold

Open source code for AlphaFold 2.
Apache License 2.0

disable HHsearch/HHblits when using precomputed msas #856

Open dstern opened 1 year ago

dstern commented 1 year ago

I use precomputed MSAs (--use_precomputed_msas) for my work, and AF2 currently spends most of its time running HHsearch and HHblits, even though I know they won't find anything. Is there a flag to turn off these searches? They end up wasting a lot of resources for my large searches.

tcoates5 commented 1 year ago

Do you have precomputed MSA files at the target location for AlphaFold? If there are no files there, AF ignores the --use_precomputed_msas flag.
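The behaviour described here can be pictured as a per-file check: a search is skipped only when the flag is set and the expected output file already exists. The following is a simplified sketch of that logic under those assumptions, not AlphaFold's actual code; `load_or_run_msa` and `run_search` are hypothetical names used for illustration.

```python
import os
from typing import Callable


def load_or_run_msa(msa_path: str, use_precomputed_msas: bool,
                    run_search: Callable[[], str]) -> str:
    """Return MSA text, reusing msa_path only if the flag is set and the file exists."""
    if use_precomputed_msas and os.path.exists(msa_path):
        # Precomputed file found: read it and skip the search entirely.
        with open(msa_path) as handle:
            return handle.read()
    # Otherwise fall back to the (expensive) search and store its output.
    result = run_search()
    with open(msa_path, "w") as handle:
        handle.write(result)
    return result
```

Under this sketch, a single missing or misnamed file is enough to trigger the corresponding search again, even with the flag set.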

dstern commented 1 year ago

Yes, all the required MSAs are in the correct folder. AF2 finishes and produces the structures. It is simply wasting most of its compute time searching the HH databases for sequences that I know don't exist there. I would love to turn off this search and save some $$.

FJ0M commented 7 months ago

I've been struggling with the use_precomputed_msas flag too. Even though I create the directory before running AlphaFold ({$Fasta_name}/msas/{<.a3m file goes here>}), it seems to just bypass it and go on to run its own MSA. This is the first time I'm trying to run with precomputed MSAs, so I'm probably doing it wrong.

dstern commented 7 months ago

You need to provide multiple alignment files in the msas folder, with the correct names:
bfd_uniclust_hits.a3m
bfd_uniclust_hits.sto
mgnify_hits.sto
uniref90_hits.sto

I first make the a3m file and then reformat this file into the other three files.
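Before launching a run, it can help to confirm that every expected file is actually in place. Below is a minimal sketch of such a check, assuming the four file names given in this comment; other AlphaFold versions or database presets may expect different names, and `missing_msa_files` is a hypothetical helper, not part of AlphaFold.

```python
from pathlib import Path

# File names as listed in this comment; other AlphaFold versions or
# database presets may expect different names (assumption: adjust as needed).
EXPECTED_MSA_FILES = (
    "bfd_uniclust_hits.a3m",
    "bfd_uniclust_hits.sto",
    "mgnify_hits.sto",
    "uniref90_hits.sto",
)


def missing_msa_files(msas_dir, expected=EXPECTED_MSA_FILES):
    """Return the expected file names that are absent or empty in msas_dir."""
    root = Path(msas_dir)
    return [
        name for name in expected
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]
```

Calling `missing_msa_files("my_protein/msas")` (a hypothetical path) returns an empty list when all four files are present and non-empty.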

I have attached a shell script that takes several inputs and produces a folder with all the required files in the correct places.

fa_in_af2ready_out.sh.zip

The inputs are:
1 - the protein name (precisely as found in item 3)
2 - a gff file output from signalp, to allow removal of N-terminal signal peptides
3 - a fasta file that contains the protein sequence you want to model
4 - a fasta file that you want to use for phmmer search, to generate the msa

The script requires phmmer, seqkit, and mafft, and also uses the following scripts:

secreteda3m.py.zip

reformat.pl.zip

fix_reformat_sto.py.zip

I hope this helps you.

FJ0M commented 7 months ago

Thank you so much, I'll update with how I get on.

I had assumed AlphaFold would take any .a3m input it found.