google-deepmind / alphafold3

AlphaFold 3 inference pipeline.

Question: any precomputed HMMER MSA database? #56

Closed: sky1ove closed this issue 4 days ago

sky1ove commented 6 days ago

Generating the MSA (the data pipeline) takes an extremely long time (more than 15 to 20 minutes) for a single protein of about 190 amino acids on my 32-CPU machine (so n_cpu is 8). Do you know if this is common? Is there a precomputed HMMER MSA database? Can I use MMseqs2 to precompute the MSA?

smg3d commented 6 days ago

I have used the ColabFold pipeline to generate the MSA and used that MSA in my input.json for AF3. On a quick visual inspection, the predictions look similar to those obtained with the AF3 MSA pipeline, but I have yet to do an in-depth comparison of the two MSAs and their effect on the predictions.
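
For an in-depth comparison of the two MSAs, a reasonable first step is to compare their depth and how well each query position is covered. Below is a minimal pure-Python sketch. It assumes A3M input where the first record is the query, uppercase letters and '-' are aligned columns, and lowercase letters (or '.') are insertions. `parse_a3m`, `match_states` and `msa_stats` are hypothetical helper names, not part of AF3 or ColabFold.

```python
def parse_a3m(text: str) -> list[tuple[str, str]]:
    """Parse A3M/FASTA text into (header, sequence) records.

    Lines starting with '#' (e.g. the header line ColabFold writes at the top
    of some .a3m files) are skipped; multi-line sequences are joined.
    """
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def match_states(seq: str) -> str:
    """Drop A3M insertions (lowercase letters and '.') so rows align to the query."""
    return "".join(c for c in seq if not (c.islower() or c == "."))


def msa_stats(text: str) -> dict:
    """Depth and per-column coverage of an A3M MSA whose first row is the query."""
    records = parse_a3m(text)
    if not records:
        raise ValueError("MSA contains no sequences")
    query = match_states(records[0][1])
    coverage = [0] * len(query)
    for header, seq in records[1:]:
        row = match_states(seq)
        if len(row) != len(query):
            raise ValueError(f"row {header!r} does not align to the query")
        for i, residue in enumerate(row):
            if residue != "-":  # count aligned residues, not gaps
                coverage[i] += 1
    return {
        "depth": len(records) - 1,  # number of hits, excluding the query row
        "query_length": len(query),
        "min_coverage": min(coverage, default=0),
        "mean_coverage": sum(coverage) / len(query) if query else 0.0,
    }
```

You could run `msa_stats` on the ColabFold .a3m file and on the `unpairedMsa` string from an AF3-generated input JSON, then compare `depth` and `min_coverage`. Raw depth ignores redundancy between hits, so a weighted count of effective sequences would be a stricter comparison.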

James-lin9 commented 5 days ago

> I have used the ColabFold pipeline to generate the MSA and used that MSA in my input.json for AF3. On a quick visual inspection, the predictions look similar to those obtained with the AF3 MSA pipeline, but I have yet to do an in-depth comparison of the two MSAs and their effect on the predictions.

Hi! Did you specify templates in your input.json (with the ColabFold MSAs)?

smg3d commented 5 days ago

> Hi! Did you specify templates in your input.json (with the ColabFold MSAs)?

I have not used templates with an external MSA so far, so I just specified an empty template list (otherwise AF3 will search for templates):

    "unpairedMsa": ">seq1\nMYSEQ\n>seq2\nMSASEQ\n>seq3\nMSASEQ ...",
    "pairedMsa": "",
    "templates": []
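
To go from a ColabFold .a3m file to an input like this, something along the lines of the sketch below could work. It assumes the rest of the AF3 input layout (`name`, `modelSeeds`, `sequences`, `dialect`, `version`) as I understand it from the AF3 input documentation, so check it against your version. `build_af3_input` is a hypothetical helper, and dropping the `#` line that ColabFold writes at the top of some .a3m files is an assumption about what AF3's parser accepts.

```python
import json


def build_af3_input(name: str, query: str, a3m_text: str, seed: int = 1) -> dict:
    """Wrap an externally computed MSA (e.g. from ColabFold) in an AF3 input dict.

    Assumptions: '#' lines must be dropped, the first MSA record must be the
    query, and an empty template list stops AF3 from searching for templates.
    """
    lines = [ln.rstrip() for ln in a3m_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("MSA must start with a FASTA header line")
    first_seq = []
    for ln in lines[1:]:
        if ln.startswith(">"):
            break
        first_seq.append(ln)
    if "".join(first_seq) != query:
        raise ValueError("the first MSA record must be the query sequence")
    return {
        "name": name,
        "modelSeeds": [seed],
        "sequences": [{
            "protein": {
                "id": "A",
                "sequence": query,
                "unpairedMsa": "\n".join(lines) + "\n",
                "pairedMsa": "",  # single chain: nothing to pair
                "templates": [],  # empty list: skip the template search
            }
        }],
        "dialect": "alphafold3",
        "version": 1,
    }


if __name__ == "__main__":
    toy_a3m = "#3\t1\n>101\nMKV\n>hit1\nMK-\n"  # toy MSA for illustration only
    print(json.dumps(build_af3_input("my_protein", "MKV", toy_a3m), indent=2))
```

For a multi-chain job you would also need a `pairedMsa` and one entry per chain; this sketch only covers the single-chain case discussed here.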

James-lin9 commented 4 days ago

> > Hi! Did you specify templates in your input.json (with the ColabFold MSAs)?
>
> I have not used templates with an _external_ MSA so far, so I just specified an empty template list (otherwise AF3 will search for templates):
>
>     "unpairedMsa": ">seq1\nMYSEQ\n>seq2\nMSASEQ\n>seq3\nMSASEQ ...",
>     "pairedMsa": "",
>     "templates": []

Thanks for the reply! AlphaFold 3 doesn't seem to support template search when an external MSA is provided, and I'm not sure how that will affect the prediction results. I am running 100 predictions with each method (ColabFold and jackhmmer) for comparison and hope to get similar results.
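
For a comparison like this, one option is to collect a per-prediction confidence score from each run and test whether the two sets differ. The sketch assumes each prediction has a summary JSON file containing a `ranking_score` field (my understanding of AF3's output, worth verifying), and uses a two-sided permutation test on the difference of mean scores. All function names are hypothetical.

```python
import json
import random
import statistics
from pathlib import Path


def load_ranking_scores(paths):
    """Read the 'ranking_score' field from each summary JSON file (assumed layout)."""
    return [json.loads(Path(p).read_text())["ranking_score"] for p in paths]


def summarize(scores):
    """Basic descriptive statistics for one method's scores."""
    return {
        "n": len(scores),
        "mean": statistics.fmean(scores),
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "best": max(scores),
    }


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference of mean scores."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of scores between the methods
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float summation order
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

For example, `summarize(load_ranking_scores(sorted(Path("runs_colabfold").glob("**/*summary_confidences.json"))))`, where the directory name and glob pattern are hypothetical and need to match your output layout. Similar confidence scores do not guarantee similar structures, so comparing the models themselves (e.g. by RMSD) is still worthwhile.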

Augustin-Zidek commented 4 days ago

> Generating the MSA (the data pipeline) takes an extremely long time (more than 15 to 20 minutes) for a single protein of about 190 amino acids on my 32-CPU machine (so n_cpu is 8).

> Do you know if this is common?

Are your databases on a fast disk? An SSD helps, and a RAM disk is even better.
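
A rough way to check whether the disk is the bottleneck is to time a sequential read of one database file from its current location and from a RAM-backed location such as /dev/shm on Linux. This is a minimal sketch: `read_throughput_mb_s` is a hypothetical helper, and jackhmmer's real access pattern may differ from a single sequential read.

```python
import time


def read_throughput_mb_s(path, chunk_size=16 * 1024 * 1024):
    """Sequentially read `path` and return the read throughput in MB/s.

    A file already in the OS page cache is served from RAM, so test a file
    that has not been read recently to see the disk's real speed.
    """
    buf = bytearray(chunk_size)
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered: read straight into buf
        while True:
            n = f.readinto(buf)
            if not n:  # 0 bytes means end of file
                break
            total += n
    elapsed = time.perf_counter() - start
    return total / 1e6 / elapsed if elapsed > 0 else float("inf")
```

If the number for the database's current location is far below what the same file gives from /dev/shm, moving the databases to faster storage is likely to shorten the data pipeline.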

> Is there a precomputed HMMER MSA database?

Sorry, we don't provide precomputed HMMER MSA databases, as jackhmmer/nhmmer don't support them. You could instead set up an hmmpgmd server, which searches against a preformatted database held in memory.
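
Since there is no precomputed database, a simple workaround when the same proteins come up repeatedly is to keep your own cache of MSAs keyed by a hash of the sequence, so each search runs only once. This is a hypothetical sketch: `cached_msa` is not an AF3 function, `compute` stands for whatever you already use to build an MSA, and the one-file-per-sequence cache layout is an arbitrary choice.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_msa(sequence: str, cache_dir: str, compute: Callable[[str], str]) -> str:
    """Return the MSA for `sequence` from a local cache, computing it on a miss."""
    seq = "".join(sequence.split()).upper()  # normalize whitespace and case
    key = hashlib.sha256(seq.encode("ascii")).hexdigest()
    path = Path(cache_dir) / f"{key}.a3m"
    if path.exists():
        return path.read_text()
    msa = compute(seq)  # the expensive search runs only on a cache miss
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(msa)
    tmp.replace(path)  # atomic rename on POSIX, so readers never see a partial file
    return msa
```

Hashing the normalized sequence means the same protein hits the cache regardless of whitespace or case in the input. Cached MSAs will go stale if you update the sequence databases.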

> Can I use MMseqs2 to precompute the MSA?

Yes, you can, but we have not validated this setup, so we can't make any accuracy guarantees about it. I agree with the advice given in the other comments.

> AlphaFold 3 doesn't seem to support template search when an external MSA is provided, and I'm not sure how that will affect the prediction results. I am running 100 predictions with each method (ColabFold and jackhmmer) for comparison and hope to get similar results.

Yes, in that case you have to provide templates yourself. In most cases, especially with a deep MSA, it should not matter.
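
If you do want to supply templates yourself, each template entry needs a mapping between query residues and template residues. The sketch below derives that mapping from a pairwise alignment (two gapped strings of equal length, '-' for gaps) as 0-based indices. The field names `mmcif`, `queryIndices` and `templateIndices` reflect my understanding of AF3's template input format and should be checked against the AF3 input documentation. The helper names are hypothetical, the template indices are assumed to be positions in the template chain's sequence, and the alignment itself (e.g. from a structure search) is assumed to be computed elsewhere.

```python
def template_indices(aligned_query: str, aligned_template: str):
    """Map aligned residue pairs to 0-based (query, template) residue indices.

    Columns where either side is a gap are skipped.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned strings must have equal length")
    query_idx, template_idx = [], []
    qi = ti = 0  # running residue counters, ignoring gaps
    for q, t in zip(aligned_query, aligned_template):
        if q != "-" and t != "-":
            query_idx.append(qi)
            template_idx.append(ti)
        if q != "-":
            qi += 1
        if t != "-":
            ti += 1
    return query_idx, template_idx


def template_entry(mmcif_text: str, aligned_query: str, aligned_template: str) -> dict:
    """Build one entry for the protein's "templates" list (assumed field names)."""
    q_idx, t_idx = template_indices(aligned_query, aligned_template)
    return {"mmcif": mmcif_text, "queryIndices": q_idx, "templateIndices": t_idx}
```

For example, `template_indices("MK-VL", "M-AVL")` pairs query residues 0, 2 and 3 with template residues 0, 2 and 3, skipping the query's unmatched K and the template's unmatched A.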