aws-solutions-library-samples / aws-batch-arch-for-protein-folding

Apache License 2.0

OpenFold with MMseqs2 #4

Closed ssrb19 closed 1 year ago

ssrb19 commented 1 year ago

Although OpenFold supports MSA generation with MMseqs2, BatchFold appears to have no provision for using it. Could you please add support for it?

brianloyal commented 1 year ago

Hi @ssrb19, thanks for your question! We tested MMseqs2 last year and found that it isn't a good fit for batch execution. When you run an analysis with MMseqs2 through tools like ColabFold, it submits jobs to a web server hosted by an academic team. That server keeps the indexed reference database in memory, which lets it respond to requests very quickly. When running analyses in batch, however, the time it takes to load and index the reference data on each new instance eliminates much of the speed advantage of MMseqs2 over JackHMMER/HHblits. This is why we decided against including MMseqs2 in our architecture. I hope this helps!
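
To make the trade-off concrete, here is a rough back-of-the-envelope sketch of how a one-time setup cost is amortized over the queries an instance serves. Every number in it is a hypothetical placeholder rather than a measurement from our testing, the function names are invented for illustration, and it assumes the slower tool needs no load-and-index step.

```python
def per_query_seconds(setup_s: float, search_s: float, queries_per_instance: int) -> float:
    """Average wall-clock seconds per query when setup is paid once per instance."""
    if queries_per_instance < 1:
        raise ValueError("queries_per_instance must be at least 1")
    return setup_s / queries_per_instance + search_s


def break_even_queries(setup_s: float, fast_search_s: float, slow_search_s: float) -> float:
    """Queries per instance at which the fast tool's setup cost is paid back."""
    saving = slow_search_s - fast_search_s
    if saving <= 0:
        raise ValueError("the fast tool must search faster than the slow one")
    return setup_s / saving


# Hypothetical placeholder timings (NOT measured values):
SETUP_S = 3600.0        # assumed time to load and index the reference data
FAST_SEARCH_S = 60.0    # assumed per-query search time once the index is in memory
SLOW_SEARCH_S = 1800.0  # assumed per-query search time for a tool with no setup step

# One query per fresh instance, as in a typical batch job:
fast_batch = per_query_seconds(SETUP_S, FAST_SEARCH_S, 1)
slow_batch = per_query_seconds(0.0, SLOW_SEARCH_S, 1)

# A long-lived server answering many queries from the same in-memory index:
fast_server = per_query_seconds(SETUP_S, FAST_SEARCH_S, 1000)

print(f"fast tool, fresh instance per query: {fast_batch:.1f} s")
print(f"slow tool, fresh instance per query: {slow_batch:.1f} s")
print(f"fast tool, long-lived server:        {fast_server:.1f} s")
print(f"break-even queries per instance:     {break_even_queries(SETUP_S, FAST_SEARCH_S, SLOW_SEARCH_S):.2f}")
```

With placeholder values like these, the fast search only pays off once an instance handles more queries than the break-even count. A persistent server does that easily, while a batch job that starts a new instance per analysis does not.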