General Soluble Model Question

SkwisgaarSkwigelf commented 1 year ago

I have a small, monomeric beta-barrel transmembrane protein that I would like to redesign for soluble expression. I ran ProteinMPNN twice, once with the standard model and once with the soluble model (--use_soluble_model flag), generating 20 sequences per run at T=0.1.
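For reference, the two runs described above can be sketched as a pair of command lines built in Python. This is only an illustration: the script name and flags are the ones I understand `protein_mpnn_run.py` in the ProteinMPNN repository to accept, while the input file `barrel.pdb` and the output folder names are hypothetical placeholders.

```python
def build_mpnn_command(pdb_path, out_folder, soluble, num_seqs=20, temp=0.1):
    """Build an argv list for one ProteinMPNN run (not executed here)."""
    cmd = [
        "python", "protein_mpnn_run.py",
        "--pdb_path", pdb_path,                 # hypothetical input structure
        "--out_folder", out_folder,             # hypothetical output folder
        "--num_seq_per_target", str(num_seqs),  # 20 sequences per run
        "--sampling_temp", str(temp),           # T=0.1 as in the post
    ]
    if soluble:
        # Switch from the standard weights to the soluble-model weights.
        cmd.append("--use_soluble_model")
    return cmd


standard_cmd = build_mpnn_command("barrel.pdb", "out_standard", soluble=False)
soluble_cmd = build_mpnn_command("barrel.pdb", "out_soluble", soluble=True)

print(" ".join(standard_cmd))
print(" ".join(soluble_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from inside a ProteinMPNN checkout. Keeping every other argument identical makes the model weights the only difference between the two sets of designs.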

The parental sequence has ~53% hydrophobic amino acids. With the standard model, the output sequences range from 53-57% hydrophobic. With the soluble model, they range from 54-59% hydrophobic, so they are slightly more hydrophobic than the parental sequence. AlphaFold agrees that the top-scoring sequences will fold into the correct structure. However, I can still see large hydrophobic patches on the surface of the transmembrane region in the structures predicted for the soluble-model designs.
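A minimal sketch of how such a hydrophobic percentage could be computed is shown below. The residue set used here (A, V, I, L, M, F, W, Y) is an assumption, since the post does not say which amino acids were counted as hydrophobic, and the example sequences are made up for illustration.

```python
# Assumed definition of "hydrophobic"; the original post does not specify one.
HYDROPHOBIC = set("AVILMFWY")


def hydrophobic_fraction(seq):
    """Return the fraction of residues in seq that are in HYDROPHOBIC."""
    # Keep letters only, so chain separators or gaps do not affect the count.
    residues = [aa for aa in seq.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    return sum(aa in HYDROPHOBIC for aa in residues) / len(residues)


# Made-up example sequences, not real designs.
designs = {"parent": "AVILKDEG", "design_1": "AKDEGSTN"}
for name, seq in designs.items():
    print(f"{name}: {100 * hydrophobic_fraction(seq):.1f}% hydrophobic")
```

A whole-sequence percentage does not distinguish buried from exposed positions, so a barrel with a hydrophobic core can score high even when its surface is polar. Inspecting surface patches on the predicted structure, as done above, is the more direct check.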

I haven't tried expressing this protein yet, but it definitely does not look soluble to me. I will try biasing against hydrophobic residues next. However, I'm surprised that the soluble model didn't automatically redesign the transmembrane region to be less hydrophobic. Has anybody had similar experiences, or does anybody know of other tricks for redesigning membrane proteins for soluble expression? Has anybody previously optimized amino-acid bias values for soluble expression? Otherwise I'll just guess bias values.
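As a rough illustration of the biasing idea, the sketch below writes a global amino-acid bias file of the kind I understand ProteinMPNN to read through its `--bias_AA_jsonl` option: a single JSON object mapping one-letter codes to bias values, where negative values make a residue less likely to be sampled. The residue set, the value of -1.0, and the file name are guesses for illustration, not optimized values.

```python
import json
import os
import tempfile

# Assumed residue set and bias strength; neither is an optimized value.
PENALIZED_RESIDUES = "AVILMFW"
BIAS_VALUE = -1.0


def make_bias_dict(residues, bias):
    """Map each one-letter amino-acid code to the same bias value."""
    return {aa: bias for aa in residues}


def write_bias_file(path, bias_dict):
    """Write the bias dictionary as a single JSON line."""
    with open(path, "w") as handle:
        handle.write(json.dumps(bias_dict) + "\n")


bias = make_bias_dict(PENALIZED_RESIDUES, BIAS_VALUE)
# Hypothetical file name, written to the system temp directory for the demo.
bias_path = os.path.join(tempfile.gettempdir(), "bias_AA.jsonl")
write_bias_file(bias_path, bias)
print(bias_path)
print(json.dumps(bias))
```

A global bias like this applies to every designed position, including the barrel interior, so a strong penalty may also strip hydrophobics that the fold needs. Starting with a small magnitude and checking the designs with a structure predictor seems a safer way to tune it.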

LiorZ commented 1 year ago

AFAIK, from reading the soluble analogues of integral membrane proteins paper, the authors used AF2seq (https://github.com/bene837/af2seq) to generate the topologies before running the soluble version of ProteinMPNN. Have you tried that before running MPNN?

amin-sagar commented 1 year ago

I have had a similar experience. It would be great to know what sampling temperature was used in the original paper.

dauparas commented 1 year ago

We have updated the solubleMPNN model weights. Try the latest version and let us know if there are still hydrophobic AAs on the surface.

SkwisgaarSkwigelf commented 1 year ago

Thank you, I tried with the new weights (along with using AF2seq to change the starting sequence of my input) and my outputs now appear much more soluble (fewer surface hydrophobics and better scores than manually biasing against hydrophobics).

dauparas / ProteinMPNN