LPDI-EPFL / masif

MaSIF: Molecular surface interaction fingerprints. Geometric deep learning to decipher patterns in molecular surfaces.
Apache License 2.0

Ever considered changing the hydrophobicity scale? #27

Closed jomimc closed 3 years ago

jomimc commented 3 years ago

I was wondering whether you have ever considered a more detailed treatment of hydrophobicity. If only a tiny fraction of a leucine residue is solvent-accessible, that surface is still labelled as extremely hydrophobic, which seems incorrect: the Kyte-Doolittle scale rates leucine this way because of its size and composition, so if the (effective / solvent-accessible) size is reduced, the hydrophobicity should be reduced as well.

For example, this scale assigns +1 or -1 depending on both the residue and the atom type: "A Simple Atomic-Level Hydrophobicity Scale Reveals Protein Interfacial Structure", Kapcha & Rossky, JMB 2014, https://www.sciencedirect.com/science/article/pii/S0022283613006232?via%3Dihub
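The leucine case above can be made concrete with a very simple linear down-weighting. This is only a sketch of the argument, not the Kapcha & Rossky scheme and not anything MaSIF does: the function name and the choice of a linear weight are assumptions, and the exposed and maximum areas are hypothetical inputs that would have to come from a separate solvent-accessibility calculation.

```python
def exposure_weighted_hydrophobicity(kd_value, exposed_area, max_area):
    """Scale a per-residue hydropathy value by the exposed fraction of the residue.

    kd_value     -- residue-level value, e.g. 3.8 for leucine on the Kyte-Doolittle scale
    exposed_area -- solvent-accessible area of the residue in the structure (hypothetical input)
    max_area     -- reference area of the fully exposed residue (hypothetical input)
    """
    if max_area <= 0:
        raise ValueError("max_area must be positive")
    # Clamp the exposed fraction to [0, 1] so noisy area estimates cannot
    # push the weighted value beyond the original scale.
    fraction = min(max(exposed_area / max_area, 0.0), 1.0)
    return kd_value * fraction
```

With a weight like this, a leucine exposing 5% of its reference area would contribute 5% of its Kyte-Doolittle value rather than the full 3.8.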

Also, I notice that the module "triangulation.computeHydrophobicity" has no means of handling non-canonical or modified amino acids. I haven't tested it, but it looks like the code will break if it encounters one. At a minimum, you could use the .get() method to access the kd_scale dictionary. Ideally there would be some way of mapping non-canonical / modified amino acids onto a hydrophobicity scale, though I'm not sure the naming is used consistently; at least, I have never found a look-up table from 3-letter codes to modified amino acids. Whichever way you go, the behaviour should be documented clearly somewhere.
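A minimal sketch of the .get() suggestion, assuming a dictionary keyed by three-letter residue codes. The names KD_SCALE, PARENT_RESIDUE and residue_hydrophobicity are hypothetical and are not taken from the MaSIF code; the parent-residue table is a tiny hand-written illustration, not a complete or authoritative mapping, and the neutral default of 0.0 is an arbitrary choice.

```python
# Kyte-Doolittle hydropathy values keyed by three-letter residue code.
KD_SCALE = {
    "ILE": 4.5, "VAL": 4.2, "LEU": 3.8, "PHE": 2.8, "CYS": 2.5,
    "MET": 1.9, "ALA": 1.8, "GLY": -0.4, "THR": -0.7, "SER": -0.8,
    "TRP": -0.9, "TYR": -1.3, "PRO": -1.6, "HIS": -3.2, "GLU": -3.5,
    "GLN": -3.5, "ASP": -3.5, "ASN": -3.5, "LYS": -3.9, "ARG": -4.5,
}

# Hypothetical, incomplete mapping from modified residues to a parent residue.
# Reusing the parent's value is a crude approximation (a phosphorylated
# serine is not as hydrophobic as serine), so treat this as a placeholder.
PARENT_RESIDUE = {"MSE": "MET", "SEP": "SER", "TPO": "THR", "PTR": "TYR"}


def residue_hydrophobicity(resname, default=0.0):
    """Return a hydropathy value without raising KeyError on unknown residues."""
    name = resname.strip().upper()
    # Fall back to the parent residue when one is known, else keep the name.
    name = PARENT_RESIDUE.get(name, name)
    # dict.get returns `default` instead of raising for unrecognised codes.
    return KD_SCALE.get(name, default)
```

Returning a silent default keeps the pipeline running, but it also hides the fact that a residue was not recognised, so logging or counting the fallbacks would probably be worthwhile.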

pablogainza commented 3 years ago

This is a very good point. Indeed, the hydrophobicity scale prevents one from using MaSIF on small molecules or, as you point out, on non-canonical amino acids.

In reality this hydrophobicity scale is too empirical, and I feel it should be learned instead. We initially added it for MaSIF-site just to see whether we were missing something. It only adds 0.5 to the ROC AUC of MaSIF-site, and for other applications it doesn't really add much.

My suggestion for handling non-canonical amino acids is to remove the hydrophobicity scale altogether and retrain MaSIF. You could also remove the hydrogen bond term, though that one seems to add a bit to PPI prediction. You would then only have to make sure that APBS supports the non-canonical amino acids.
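As a rough sketch of what removing a feature could look like before retraining, one option is to zero out the corresponding input column with a mask. The feature names, their order, and the helper functions below are assumptions made for illustration only and should be checked against the actual MaSIF configuration and data layout.

```python
# Assumed per-point feature order (hypothetical; verify against the real pipeline).
FEATURE_NAMES = ["shape_index", "curvature", "hbond", "charge", "hydrophobicity"]


def build_feature_mask(drop):
    """Return a 0/1 mask with zeros at the positions of the dropped features."""
    unknown = set(drop) - set(FEATURE_NAMES)
    if unknown:
        raise ValueError(f"unknown feature names: {sorted(unknown)}")
    return [0.0 if name in drop else 1.0 for name in FEATURE_NAMES]


def apply_feature_mask(rows, mask):
    """Multiply every feature row element-wise by the mask."""
    return [[value * m for value, m in zip(row, mask)] for row in rows]


# Example: drop hydrophobicity but keep the hydrogen bond term.
mask = build_feature_mask({"hydrophobicity"})
masked = apply_feature_mask([[0.3, 0.1, 1.0, -0.2, 3.8]], mask)
```

Masking only silences the input; the network still has to be retrained with the masked features for the suggestion above to apply.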