MobleyLab / Lomap

Alchemical mutation scoring map
MIT License

Issues with current perturbation map generation #46

Closed ppxasjsm closed 2 years ago

ppxasjsm commented 5 years ago

[Dummy issue; I will expand on it tomorrow, but I needed it to reference.]

ppxasjsm commented 5 years ago

I guess I never quite got round to addressing this. I wrote up some thoughts on LOMAP that discuss the mapping issue, and I'll attach them here since this seems the most relevant place. All of this is only relevant to SOMD/Sire calculations. It may not be a problem for other free energy tools, but this has not been tested explicitly; BioSimSpace could be used to assess this in the future. The D3R2018 PDF shows a LOMAP-generated perturbation network for the Cathepsin part of D3R Grand Challenge 4 (https://drugdesigndata.org/about/grand-challenge-4). The red arrows mark the perturbations I expected were most likely to fail. We attempted to run some of them and saw a failure rate of 70% or more. In the end, we went with a manually designed network, which had a much smaller failure rate of around 10%.

Attachments: lomap_notes.pdf, D3R2018.pdf

davidlmobley commented 5 years ago

I generally agree with your points here (tagging @nividic so he'll see it as well). To a large extent, this depends on someone doing the science to establish which types of connections are efficient/"good", and it's fairly safe to say this is likely NOT driven just by the size of the MCS, even though that's the "conventional wisdom". My personal opinion is that it likely has more to do with other physical/geometric properties of the molecules than with the MCS (happy to explain which properties I'd want to look at in a call).
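To make the MCS-size "conventional wisdom" concrete, here is a minimal sketch of a score that depends only on how many atoms fall outside the MCS. It assumes the atom counts of both ligands and of their MCS have already been computed. The function name `mcs_size_score`, the exponential form, and `beta` are illustrative assumptions in the spirit of MCS-based scoring, not LOMAP's exact implementation.

```python
import math


def mcs_size_score(n_a, n_b, n_mcs, beta=0.1):
    """Hypothetical similarity in (0, 1] that decays with atoms outside the MCS.

    n_a, n_b: atom counts of ligands A and B (assumed precomputed)
    n_mcs:    atom count of their maximum common substructure
    beta:     illustrative decay constant (an assumption, not a LOMAP default)
    """
    if not 0 <= n_mcs <= min(n_a, n_b):
        raise ValueError("MCS cannot be larger than either molecule")
    # Atoms that must appear or disappear during the alchemical transformation.
    n_perturbed = (n_a - n_mcs) + (n_b - n_mcs)
    return math.exp(-beta * n_perturbed)
```

A score like this only sees atom counts. Two perturbations that change the same number of atoms get identical scores regardless of which atoms change or how the geometry shifts, which is the limitation described above.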

At one point we started to look at this by computing relative HYDRATION free energies for a large map of planned calculations spanning FreeSolv, but never quite finished because of student continuity issues (the student working on it didn't end up having time to finish). I still think that's the right way to do it, as what works for protein-ligand binding will likely be too system-dependent (e.g. dependent on a protein conformational change induced by a specific transition) to generalize easily.

You can also imagine cases where you would want to score similarity based not on the MCS or anything similar, but on "similarity of binding mode". For example, if you already have a good idea of the likely binding modes, you might want to score pairs by shape/structure similarity WITH the candidate 3D structures in mind.
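One way to score "similarity of binding mode" is a shape-overlap Tanimoto computed on candidate 3D poses. The sketch below is a simplified Gaussian-overlap variant, not ROCS or any specific tool. It assumes both ligands are already placed in their candidate binding poses (no alignment is performed), that all atoms share one Gaussian width `alpha`, and the function names are hypothetical.

```python
import math


def gaussian_overlap(a, b, alpha=0.8):
    """Sum of pairwise Gaussian overlaps between two lists of (x, y, z) atom centres.

    alpha is an illustrative width parameter shared by all atoms (an assumption).
    """
    total = 0.0
    for xa in a:
        for xb in b:
            d2 = sum((p - q) ** 2 for p, q in zip(xa, xb))
            total += math.exp(-alpha * d2)
    return total


def shape_tanimoto(a, b, alpha=0.8):
    """Shape Tanimoto V_AB / (V_AA + V_BB - V_AB); 1.0 for identical poses.

    Assumes a and b are already in the same frame (e.g. docked into the same site).
    """
    v_ab = gaussian_overlap(a, b, alpha)
    v_aa = gaussian_overlap(a, a, alpha)
    v_bb = gaussian_overlap(b, b, alpha)
    return v_ab / (v_aa + v_bb - v_ab)
```

Because the Gaussian kernel is positive definite, the score stays between 0 and 1. Unlike an MCS-based score, it changes when the same chemical change places atoms in a different region of the binding site.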

I think there's basically a lot of room for: a) alternative methods of similarity scoring, and b) alternative methods of laying out the graph (e.g. some people prefer a hub-and-spoke model designed for a particular lead series).
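To illustrate how the choice of graph layout matters independently of the scoring, the sketch below compares a hub-and-spoke layout with a maximum spanning tree built by Kruskal's algorithm over pairwise similarity scores. Neither is LOMAP's own graph-construction algorithm. The ligand names, scores, and function names are hypothetical, and any symmetric score function (for example, one of the scores sketched above) could be plugged in.

```python
from itertools import combinations


def hub_and_spoke(ligands, hub):
    """Connect every ligand directly to one chosen hub (e.g. a lead compound)."""
    return [(hub, lig) for lig in ligands if lig != hub]


def max_spanning_tree(ligands, score):
    """Kruskal's algorithm: keep the highest-scoring edges that do not form a cycle."""
    parent = {lig: lig for lig in ligands}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(combinations(ligands, 2), key=lambda e: score(*e), reverse=True)
    tree = []
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # edge joins two separate components
            parent[ra] = rb
            tree.append((a, b))
    return tree


# Hypothetical similarity scores for four ligands.
S = {
    frozenset("AB"): 0.9, frozenset("AC"): 0.2, frozenset("AD"): 0.3,
    frozenset("BC"): 0.8, frozenset("BD"): 0.1, frozenset("CD"): 0.7,
}


def score(a, b):
    return S[frozenset((a, b))]
```

With hub `"A"`, the hub-and-spoke layout is forced to include the weak A-C (0.2) and A-D (0.3) edges, whereas the spanning tree keeps only A-B, B-C and C-D. The trade-off is that the tree routes C and D to the hub through intermediate ligands, and a single failed edge disconnects part of the network.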

ppxasjsm commented 5 years ago

Thank you for the feedback; I also agree with what you say. I'd be happy to hear more about the properties you might want to look at, @davidlmobley. Maybe we can arrange a call in the new year.

davidlmobley commented 2 years ago

Closing this; maintenance moved to github.com/OpenFreeEnergy/Lomap.