RosettaCommons / RoseTTAFold

This package contains deep learning models and related scripts for RoseTTAFold
MIT License

search_msa database sequencing #110

Closed shawncal closed 2 years ago

shawncal commented 2 years ago

This PR sequences the MSA search: it first searches UniRef (4 times), then expands to bfd (4 times). Since these database files are very large, this improves the chance that we can keep the UniRef db in cache rather than loading it (or streaming it, if pulling from a network resource) 4 separate times.

For sequences that are well represented in UniRef, we may never need to load or search bfd at all, which will speed things up significantly.

Other, minor changes:

BJWiley233 commented 1 year ago

Did you find the structures were as good without searching BFD? AlphaFold does what this script did before, running against both databases at the same time; maybe it could be adjusted to run similarly, UniRef first, then BFD. I guess they don't mind if they end up with way more than 2000 proteins at >=75% coverage?