elaspic / elaspic2

Predicting the effect of mutations on protein stability and protein binding affinity using pretrained neural networks and a ranking objective function.
https://gitlab.com/elaspic/elaspic2
MIT License

Do ProteinSolver and Elaspic support multiple point mutations? #2

Closed Ribosome25 closed 3 years ago

Ribosome25 commented 3 years ago

Hi Dear Alexey,

I am looking for a method to embed mutated protein sequences as vectors. I read through the ELASPIC2 Affinity Demo notebook, especially the model.analyze_mutation(mut, protein_affinity_features) part, but couldn't find how to pass multiple mutations at once.

For example, I want to analyse (wt) TNLCPF -> TNARPF by passing two mutations (L3A and C4R) together.

Do you support multiple point mutations at this time? Thanks.

ostrokach commented 3 years ago

I have created a notebook that suggests a way to do this: https://colab.research.google.com/github/elaspic/elaspic2/blob/master/notebooks/10_multiresidue_demo.ipynb

The notebook includes a short summary, which I also provide below:


While ELASPIC2 was not explicitly trained to predict the effect of multiple mutations, we believe that reasonable accuracy can be achieved by using the following strategy. Say we wish to mutate protein ADEK to protein GEDR. We could instead evaluate the effect of the following four mutations, and then combine the results:

  1. ADEK -> GDEK
  2. GDEK -> GEEK
  3. GEEK -> GEDK
  4. GEDK -> GEDR
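
As a rough illustration, the stepwise decomposition listed above could be generated along these lines. This is a minimal sketch, not code from the notebook: the helper name stepwise_mutations is hypothetical, it assumes the wildtype and mutant sequences have the same length (substitutions only), and it applies the substitutions left to right. How the per-step predictions are then combined is not specified here, so that part is left out.

```python
def stepwise_mutations(wildtype: str, mutant: str) -> list[tuple[str, str, str]]:
    """Split a multi-residue substitution into consecutive single-residue steps.

    Hypothetical helper (not part of ELASPIC2). Returns a list of
    (mutation, sequence_before, sequence_after) tuples, where the mutation
    uses 1-based positions, e.g. "A1G".
    """
    if len(wildtype) != len(mutant):
        raise ValueError("Only substitutions are handled; lengths must match.")

    steps = []
    current = wildtype
    for i, (wt_aa, mut_aa) in enumerate(zip(wildtype, mutant)):
        if wt_aa == mut_aa:
            continue  # position unchanged, no step needed
        # Apply one substitution on top of all previous ones.
        following = current[:i] + mut_aa + current[i + 1:]
        steps.append((f"{wt_aa}{i + 1}{mut_aa}", current, following))
        current = following
    return steps


for mutation, before, after in stepwise_mutations("ADEK", "GEDR"):
    print(mutation, before, "->", after)
```

Each returned mutation string (for example A1G) is paired with the intermediate sequence it applies to, so every step could in principle be evaluated as an ordinary single-residue mutation against that intermediate sequence.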

Furthermore, both ProtBert and ProteinSolver can be natively used to obtain a probability score for the wildtype and the mutant sequence, and the difference between those probabilities should correlate with the stability of the protein [1].
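
To make the idea of a probability score difference concrete, here is a toy sketch. It assumes, purely for illustration, that a model has already produced one probability distribution over amino acids per position; position_probs below is made-up data, not ProtBert or ProteinSolver output, and the real models condition each position on its sequence or structural context, which this simplification ignores.

```python
import math


def sequence_log_prob(sequence, position_probs):
    """Sum the log-probability of each residue at its position.

    position_probs is a hypothetical stand-in for model output: a list with
    one {amino_acid: probability} dict per sequence position.
    """
    return sum(math.log(position_probs[i][aa]) for i, aa in enumerate(sequence))


def score_change(wildtype, mutant, position_probs):
    """Log-probability of the mutant minus that of the wildtype."""
    return (sequence_log_prob(mutant, position_probs)
            - sequence_log_prob(wildtype, position_probs))


# Made-up probabilities for a two-residue protein (illustration only).
position_probs = [
    {"A": 0.7, "G": 0.3},
    {"D": 0.4, "E": 0.6},
]
print(score_change("AD", "GE", position_probs))
```

Under this sketch a negative value means the mutant sequence is scored as less probable than the wildtype. Because the score is computed over whole sequences, it applies to any number of substituted positions.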

This notebook demonstrates how to use both of the strategies described above to calculate the change in stability between a wildtype and a mutant sequence that differ by more than one amino acid.


Please let me know if the suggested protocol does not work or if you notice some bugs or other issues.

ostrokach commented 3 years ago

FYI, I have updated the 10_multiresidue_demo.ipynb notebook. I think the generation of global ProtBert and ProteinSolver score changes should work better now.