dauparas / ProteinMPNN

Code for the ProteinMPNN paper
MIT License

Creates hydrophobic surface patches with many Ala side chains #95

Open gha2012 opened 9 months ago

gha2012 commented 9 months ago

Hi, thank you very much for making this available! I am using this together with RFdiffusion to create small protein complexes, and the interfaces look very good in many cases. However, I have found that ProteinMPNN often creates very hydrophobic poly-Ala surface patches. I am a bit worried that this will lead to solubility issues. Is there a parameter to control this? Thanks for any suggestions!

drewschaub commented 8 months ago

I'm not one of the developers.

Have you looked at https://github.com/nrbennet/dl_binder_design? The binder design protocol might be what you're looking for. It couples ProteinMPNN with AF2 and predicts solubility and binding affinity using AF2 scores.

When running ProteinMPNN by itself, I run a few jobs using different models and different values of the sampling temperature T to generate several sequences. I then filter out sequences with high alanine counts. I also calculate pI values, as the models tend to generate a lot of charged residues (e.g. glutamic acid).
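The filtering step described above could be sketched roughly as follows. This is only an illustration, not part of ProteinMPNN: the function names, the alanine thresholds (`max_ala_fraction`, `max_ala_run`), and the pKa values are all assumptions chosen for the example, and the pI is a simple Henderson-Hasselbalch estimate that ignores structure.

```python
from typing import List

# Approximate pKa values (assumed for illustration; published scales differ).
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_N_TERM = 9.0
PKA_C_TERM = 2.0


def alanine_fraction(seq: str) -> float:
    """Fraction of residues in seq that are alanine."""
    return seq.count("A") / len(seq) if seq else 0.0


def longest_run(seq: str, residue: str = "A") -> int:
    """Length of the longest consecutive stretch of `residue` in seq."""
    best = current = 0
    for aa in seq:
        current = current + 1 if aa == residue else 0
        best = max(best, current)
    return best


def net_charge(seq: str, ph: float) -> float:
    """Estimated net charge at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))   # N-terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))  # C-terminus
    for aa, pka in PKA_POSITIVE.items():
        charge += seq.count(aa) / (1.0 + 10 ** (ph - pka))
    for aa, pka in PKA_NEGATIVE.items():
        charge -= seq.count(aa) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 0.01) -> float:
    """Estimate pI by bisection: net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def filter_sequences(
    seqs: List[str],
    max_ala_fraction: float = 0.15,  # hypothetical threshold
    max_ala_run: int = 3,            # hypothetical threshold
) -> List[str]:
    """Keep sequences that are neither alanine-rich nor contain long Ala runs."""
    return [
        s for s in seqs
        if alanine_fraction(s) <= max_ala_fraction
        and longest_run(s, "A") <= max_ala_run
    ]


if __name__ == "__main__":
    designs = ["MKEELLKKAEELLKR", "MAAAAAAKAAAELLK"]  # made-up sequences
    for s in filter_sequences(designs):
        print(s, round(isoelectric_point(s), 2))
```

The thresholds would need tuning per project, and an established implementation (for example Biopython's `ProteinAnalysis`) is probably preferable to the hand-rolled pI estimate for real work.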

The issue of repeats isn't unique to ProteinMPNN; I also notice it when running ESM.

It's also not unique to this domain. If I use OpenAI's Whisper to transcribe audio, it commonly generates repeats there as well.

gha2012 commented 8 months ago

Thank you for your comment! Yes, I am using the binder design protocol. I guess I should have posted the question there, but I thought this was related to ProteinMPNN.