MMseqs2 vs jackhmmer - performance difference

divnori commented 1 week ago

Thank you for the code release!

I have a question about performance when using precomputed MSAs (from MMseqs2) versus on-the-fly MSA computation with jackhmmer. I am comparing two setups: (i) passing in precomputed unpaired MSAs from ColabFold, both with and without template search (no paired MSA), and (ii) letting AF3 compute everything (unpaired MSA, paired MSA, templates).
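For concreteness, setup (i) without templates might look like the minimal sketch below. It assumes the field names from the AF3 input format documentation (`unpairedMsa`, `pairedMsa`, `templates`), with an empty string meaning "no paired MSA" and an empty list meaning "no templates". The helper `build_af3_input`, the job name, and the toy sequence and A3M are hypothetical and only for illustration.

```python
import json


def build_af3_input(name, sequence, unpaired_a3m, chain_id="A", seed=1):
    """Build an AF3-style input dict for one protein chain.

    Assumption: the keys follow the AF3 input JSON format as documented;
    check them against the version of AF3 you are running.
    """
    return {
        "name": name,
        "modelSeeds": [seed],
        "sequences": [
            {
                "protein": {
                    "id": chain_id,
                    "sequence": sequence,
                    # Precomputed unpaired MSA in A3M format (query first).
                    "unpairedMsa": unpaired_a3m,
                    # Empty string: run without a paired MSA.
                    "pairedMsa": "",
                    # Empty list: run without templates.
                    "templates": [],
                }
            }
        ],
        "dialect": "alphafold3",
        "version": 1,
    }


# Toy data, not a real MSA.
a3m = ">query\nMKTAYIAK\n>hit_1\nMKTAYLAK\n"
job = build_af3_input("mmseqs2_no_templates", "MKTAYIAK", a3m)
print(json.dumps(job, indent=2))
```

A real complex would have one such entry per chain in `sequences`.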

While the results look similar upon visual inspection, I am seeing much higher ipTM and ranking scores with jackhmmer. Is this expected? Have others observed a similar effect? I saw some discussion of this in a previous issue, but I am not sure whether anyone has further information now.

Appreciate your help - thanks.

Augustin-Zidek commented 1 week ago

AlphaFold 3 was trained, calibrated, and evaluated on MSAs built by Jackhmmer and Nhmmer.

As such, using MSAs built by other tools (like MMseqs2, HHblits, BLAST, ...) might lead to different results, with potentially different confidences and/or accuracy, and potentially less correlation between confidence and accuracy.

We recommend using Jackhmmer/Nhmmer for best accuracy, especially in cases where the MSA is shallow.

sky1ove commented 7 hours ago

Actually, I observed that using the ColabFold MSA pipeline (MMseqs2) without template search gives a much higher ipTM than the AlphaFold Server result.

I still need to (1) check whether this has something to do with template search, and (2) rerun with the default AF3 pipeline and look at the correlation.
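For the second check, a minimal sketch of how the comparison could be done, assuming each run writes a summary confidences JSON containing `iptm` and `ranking_score` keys (as described in the AF3 output documentation). The helpers `read_scores` and `pearson` and any file paths are hypothetical, and the numbers in the usage example are made up.

```python
import json
import math
from pathlib import Path


def read_scores(path):
    """Return (iptm, ranking_score) from one summary confidences JSON file."""
    data = json.loads(Path(path).read_text())
    return data["iptm"], data["ranking_score"]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up ipTM values for the same targets under the two MSA pipelines.
iptm_mmseqs2 = [0.82, 0.45, 0.71, 0.30]
iptm_jackhmmer = [0.78, 0.52, 0.69, 0.41]
print(f"Pearson r = {pearson(iptm_mmseqs2, iptm_jackhmmer):.3f}")
```

In practice the two lists would be filled by calling `read_scores` on the matching output files from each pipeline, one pair per target.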

google-deepmind / alphafold3

MMseqs2 vs jackhmmer - performance difference #125