google-deepmind / alphafold

Open source code for AlphaFold.
Apache License 2.0

disable HHsearch/HHblits when using precomputed msas #856

Open dstern opened 10 months ago

dstern commented 10 months ago

I use precomputed MSAs (--use_precomputed_msas) for my work, and af2 currently spends most of its time running HHsearch and HHblits, even though I know they won't find anything. Is there a flag to turn off these searches? They end up wasting a lot of resources for my large searches.

tcoates5 commented 10 months ago

Do you have precomputed MSA files at the target location for AlphaFold? If there are no files there, AF ignores the --use_precomputed_msas flag.
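
The fallback described here can be pictured roughly as follows. This is a minimal sketch of the idea, not AlphaFold's actual code: `load_or_compute_msa` and `run_search` are hypothetical names, and the only behavior assumed is what this comment states, namely that a precomputed file is reused only when the flag is set and the file exists.

```python
import os
from typing import Callable


def load_or_compute_msa(
    msa_path: str,
    use_precomputed_msas: bool,
    run_search: Callable[[], str],
) -> str:
    """Return alignment text, reusing msa_path only if the flag is set AND the file exists.

    Hypothetical helper for illustration; run_search stands in for whatever
    database search would otherwise produce the alignment.
    """
    if use_precomputed_msas and os.path.exists(msa_path):
        # Precomputed file found: skip the search entirely.
        with open(msa_path) as handle:
            return handle.read()

    # Flag unset or file missing: fall back to the search and cache the result.
    result = run_search()
    with open(msa_path, "w") as handle:
        handle.write(result)
    return result
```

Under this logic a misnamed or misplaced file behaves exactly like a missing one, so the search runs even though the flag was passed.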

dstern commented 10 months ago

Yes, all the required MSAs are in the correct folder. af2 finishes and produces the structures. It is simply wasting most of its compute time searching the HH databases for sequences that I know don't exist there. I would love to turn off this search and save some $$.

FJ0M commented 5 months ago

I've been struggling with the use_precomputed_msas flag too, despite making the directory before running AlphaFold: {$Fasta_name}/msas/{<.a3m file goes here>}. It seems to just bypass it and go on to run its own MSA? This is the first time I'm trying to run with precomputed MSAs, so I'm probably doing it wrong.

dstern commented 5 months ago

You need to provide multiple alignment files in the msas folder, with the correct names:

bfd_uniclust_hits.a3m
bfd_uniclust_hits.sto
mgnify_hits.sto
uniref90_hits.sto

I first make the a3m file and then reformat this file into the other three files.
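
A quick way to confirm the folder is complete before launching a run is sketched below. The file names are taken from this comment and the `<fasta_name>/msas/` layout from the earlier one; neither has been checked against AlphaFold's source, and `missing_msa_files` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# File names as listed in this thread (an assumption, not verified against AlphaFold).
EXPECTED_MSA_FILES = (
    "bfd_uniclust_hits.a3m",
    "bfd_uniclust_hits.sto",
    "mgnify_hits.sto",
    "uniref90_hits.sto",
)


def missing_msa_files(output_dir: str, fasta_name: str) -> List[str]:
    """Return the expected alignment files absent from <output_dir>/<fasta_name>/msas."""
    msa_dir = Path(output_dir) / fasta_name / "msas"
    return [name for name in EXPECTED_MSA_FILES if not (msa_dir / name).is_file()]


# Example (paths are placeholders):
#   missing = missing_msa_files("/path/to/output", "my_protein")
#   if missing:
#       print("Missing precomputed MSAs:", ", ".join(missing))
```

An empty list means every file named above is in place; anything else points at the file that would trigger a fresh search.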

I have attached a shell script that takes several inputs and produces a folder with all the required files in the correct places.

fa_in_af2ready_out.sh.zip

The inputs are:

1 - the protein name (precisely as found in item 3)
2 - a gff file output from signalp, to allow removal of N-terminal signal peptides
3 - a fasta file that contains the protein sequence you want to model
4 - a fasta file that you want to use for phmmer search, to generate the msa

The script requires phmmer, seqkit, and mafft, and also uses the following scripts:

secreteda3m.py.zip

reformat.pl.zip

fix_reformat_sto.py.zip
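
Before running the shell script, it may help to confirm that the three command-line tools it needs are on the PATH. The sketch below uses only the tool names given above; `missing_tools` is a hypothetical helper, and it does not check for the attached helper scripts.

```python
import shutil
from typing import Iterable, List

# Executables the shell script is said to require.
REQUIRED_TOOLS = ("phmmer", "seqkit", "mafft")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools from `tools` that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Example:
#   not_found = missing_tools()
#   if not_found:
#       raise SystemExit("Install these first: " + ", ".join(not_found))
```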

I hope this helps you.

FJ0M commented 5 months ago

Thank you so much, I'll update with how I get on.

I had assumed AlphaFold would take any .a3m input it found.