How to accelerate protein prediction

dahaigui commented 2 years ago

We have built a mutant library for cucumber and have resequenced some of the individual plants, finding a number of proteins with non-synonymous mutations. We have translated the mutated sequences and hope to predict the protein structures using AlphaFold2. Since these proteins differ only minimally from the original protein sequences, is it possible to reduce the MSA time to speed up the structure prediction?

tcoates5 commented 2 years ago

This is addressed in the "Inferencing many proteins" section of the README. The short answer is "not exactly", but you can reduce the runtime for similar proteins by keeping the network inputs a fixed size, so the model does not have to be recompiled for each sequence. A bulk inference script could be built on the RunModel.predict method.
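To illustrate the fixed-size idea, here is a minimal sketch of only the bookkeeping step: grouping input sequences by a padded length so that each group could share one compiled network shape. It does not call AlphaFold at all. The function name `group_by_padded_length` and the `bucket_size` parameter are hypothetical, chosen for illustration, and the actual padding and prediction would still have to be done with the tools described in the README.

```python
from collections import defaultdict


def group_by_padded_length(sequences, bucket_size=64):
    """Group sequence names by length rounded up to a multiple of bucket_size.

    `sequences` maps a name to an amino-acid string. Sequences in the same
    group could be padded to the same size and share one compiled model shape.
    This helper is hypothetical and is not part of AlphaFold.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    groups = defaultdict(list)
    for name, seq in sequences.items():
        # Ceiling division, then scale back up to the padded length.
        padded_length = -(-len(seq) // bucket_size) * bucket_size
        groups[padded_length].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Toy inputs: point mutants have the same length as their reference,
    # so they always land in the same group.
    demo = {
        "reference": "M" * 100,
        "mutant_1": "M" * 100,
        "other_protein": "M" * 130,
    }
    for padded_length, names in sorted(group_by_padded_length(demo).items()):
        print(padded_length, names)
```

For a library of single-residue mutants of one protein, every sequence has the same length, so a single group (and in principle a single compilation) would cover all of them.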

smturzo commented 2 years ago

This is good to know. Could you please elaborate with an example, or point me to a resource where I can find more about this?

dahaigui commented 2 years ago

@smturzo

The reference protein sequence is:

>CsaV3_7G025510
MIGRLRMNHCVPDFEMADDFSLPTFSSLTRPRKSSLPDDDVMELLWQNGQVVTHSQNQRSFRKSPPSKFDVSIPQEQAATREIRPSTQLEEHHELFMQEDEMASWLNYPLVEDHNFCSDLLFPAITAPLCANPQPDIRPSATATLTLTPRPPIPPCRRPEVQTSVQFSRNKATVESEPSNSKVMVRESTVVDSCDTPSVGPESRASEMARRKLVEVVNGGGVRYEIARGSDGVRGASVGGDGIGEKEMMTCEMTVTSSPGGSSASAEPACPKLAVDDRKRKGRALDDTECQSEDVEYESADPKKQLRGSTSTKRSRAAEVHNLSERRRRDRINEKMKALQELIPRCNKTDKASMLDEAIEYLKTLQLQVQMMSMGCGMMPMMFPGVQQYLPPPMGMGMGMGMEMGMNRPMMQFHNLLAGSNLPMQAGATAAAHLGPRFPLPPFAMPPVPGNDPSRAQAMNNQPDPMANSVGTQNTTPPSVLGFPDSYQQFLSSTQMQFHMTQALQNQHPVQLNTSRPCTSRGPENRDNHQSG

One of the proteins in our mutant library differs from the reference genome by only one amino acid, with the following sequence:

>Csa_Mutant_125_7G025510
MIGRLRMNHCVPDFEMADDFSLPTFSSLTRPRKSSLPDDDVMELLWQNGQVVTHSQNQRSFRKSPPSKFDVSIPQEQAATREIRPSTQLEEHHELFMQEDEMASWLNYPLVEDHNFCSDLLFPAITAPLCANPQPDIRPSATATLTLTPRPPIPPCRRPEVQTSVQFSRNKATVESEPSNSKVMVRESTVVDSCDTPSVGPESRASEMARRKLVEVVNGGGVRYEIARGSDGVRGASVGGDGIGEKEMMTCEMTVTSSPGGSSASAEPACPKLAVDDRKRKGRALDDTECQSEDVEYESADPKKQLRGSTSTKRSRAAEVHNLSERRRRDRINEKMKALQELIPRCNKTDKASMLDEAIEYLKTLQLQVQMMSMGCGMMPMMFPGVQQYLPPPMGMGMGMGMEMGMNRPMMQFHNLLAGSNLKMQAGATAAAHLGPRFPLPPFAMPPVPGNDPSRAQAMNNQPDPMANSVGTQNTTPPSVLGFPDSYQQFLSSTQMQFHMTQALQNQHPVQLNTSRPCTSRGPENRDNHQSG
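For anyone who wants to check which residue differs between a reference and a mutant like the pair above, a small sketch in plain Python may help. It assumes both sequences have the same length (substitutions only, no insertions or deletions). The function name `find_substitutions` is hypothetical, and the short strings in the demo are toy inputs, not the cucumber sequences.

```python
def find_substitutions(reference, mutant):
    """Return a list of (position, reference_residue, mutant_residue) tuples.

    Positions are 1-based. Both sequences must have the same length, so this
    only handles substitutions, not insertions or deletions.
    """
    if len(reference) != len(mutant):
        raise ValueError("sequences must have the same length")
    return [
        (index + 1, ref_residue, mut_residue)
        for index, (ref_residue, mut_residue) in enumerate(zip(reference, mutant))
        if ref_residue != mut_residue
    ]


if __name__ == "__main__":
    # Toy sequences for illustration only.
    for position, ref_residue, mut_residue in find_substitutions("MIGRL", "MIGKL"):
        print(f"{ref_residue}{position}{mut_residue}")
```

Pasting the two full sequences in as the arguments would list every position where they disagree, which is a quick way to confirm that a mutant really differs by a single residue before deciding how much of the reference run can be reused.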

google-deepmind / alphafold

How to accelerate protein prediction #631