Allow use of coordinates to select MCSS matching when desired

davidlmobley commented 7 years ago

Currently, matching for scoring (and for setting up transformations) is done entirely based on MCSS search. In some cases this will be undesirable, such as when a ligand binding mode is expected to be well known for all ligands in a series and is indicated in the input coordinates. In such cases, it might be preferred to use the input coordinates to bias the MCSS matching process rather than going with a "pure MCSS" approach. This should be implemented and made available as an option.

halx commented 7 years ago

I think we would need to discuss how much variety we can expect in binding modes and how we can handle this with single topology AFE. The Kim paper (https://doi.org/10.1007/s10822-007-9106-2) is probably a good starting point.

E.g. how would you deal with a fragment or possibly whole ligand that binds in a different binding pocket/is translocated in space. If this is a distinct binding mode I would expect there to be a barrier in place. With a single topology approach this seems to me to be a sampling problem. Doing this with dual topology may be the more efficient way.

Another question is if the spatial overlap is relatively small (while the 2D MCSS may suggest a match of most atoms). I would have to look the particular problem case up (I think it would match the urea group of the comound): the molecule is essential a six ring with an aliphatic tail, elongate the tail and the whole thing decides to turn around by 180 degrees. Obvious question: is this just and edge case to be safely ignored for the time being or is it something to consider early on?

davidlmobley commented 7 years ago

@halx

I think we would need to discuss how much variety we can expect in binding modes and how we can handle this with single topology AFE. The Kim paper (https://doi.org/10.1007/s10822-007-9106-2) is probably a good starting point.

This looks like an interesting paper! I was not aware of it. Thanks for the link. Also potentially relevant is my older "what you see is not always what you get" review (http://dx.doi.org/10.1016/j.str.2009.02.010). My guess, naively (but based on a bit of data here and there in the literature) is that this happens in maybe 10% of cases, though more often early in optimization relative to later (e.g. I've seen huge series of hundreds of late stage optimization compounds where they basically all bind exactly the same way and all they're doing is exploring tiny variations of functionality when already locked into a binding mode).

My own thinking is that we're not going to handle dramatic changes in binding modes with relative calculations at all -- a better choice there would be absolute or other work we are doing on obtaining populations of multiple potential binding modes. So I'm not particularly worried about handling these types of cases in relative calculations. More below.

E.g. how would you deal with a fragment or possibly whole ligand that binds in a different binding pocket/is translocated in space. If this is a distinct binding mode I would expect there to be a barrier in place. With a single topology approach this seems to me to be a sampling problem. Doing this with dual topology may be the more efficient way.

We're actually focused on that with our BLUES package we're starting to develop; if you're interested I'd be happy to talk to you about how it works. Basically we're after efficiently sampling across a wide variety of potential binding modes which are kinetically distinct and you would not want to sample across in a single simulation. (A simple example is even the four binding modes of toluene in T4 lysozyme, coming in two symmetry-equivalent pairs. The slower timescale for interconversion there (involving a ring flip) is around 100 ns, so you're not going to want to sample across these in relative free energy calculations. We have a variety of "real world" examples which have similar behaviors and possibly even slower timescales.) So that's NOT what I'm thinking about in this GitHub issue.

In this issue, I'm focused more on the case where most of the molecule binds in the same way, but part of it might have some ambiguity with multiple potential mappings. I think Jorgensen had some cases like this -- maybe your molecule has a chlorinated phenyl hanging off of it in one place and you're changing the chloro to an iodo and you think the chloro goes left but the iodo goes right (preserving the location of the REST of the ring atoms by rotating a single bond), whereas a naive mapping would put the two halogens on top of each other, so you want to put in input coordinates which are going to be used to bias this choice.

Or, maybe you have a ligand which is pseudosymmetric and lead series A binds head-first and lead series B binds tail-first and you want to hop from A to B, which STILL involves preserving many of the core atoms (because they're equivalent if you turn the molecule around) but you need the mapper to generate a different mapping because of the KNOWN binding mode shift than it would by default.

In other words, I'm basically thinking about cases, here, where you think you KNOW the binding mode of two different ligands or ligand series and you know that, upon a particular change, the binding mode will shift in a way which means you either (a) need a different mapping than the one dictated by MCSS, or (b) need to sample a lot longer to allow interconversion. Usually we'll prefer option (a).

I can dig up real-world examples if needed.

Another question is if the spatial overlap is relatively small (while the 2D MCSS may suggest a match of most atoms). I would have to look the particular problem case up (I think it would match the urea group of the comound): the molecule is essential a six ring with an aliphatic tail, elongate the tail and the whole thing decides to turn around by 180 degrees. Obvious question: is this just and edge case to be safely ignored for the time being or is it something to consider early on?

I think these are cases where we're going to want to use a totally different method. Specifically, my vision is that we'd ultimately want LOMAP to not just plan relative calculations but plan what type of calculations to do for particular changes. In changes where a mutation dramatically alters the binding mode of a ligand so much that it doesn't actually bind similarly at all (maybe binding in a different site or in a totally different orientation) all the benefits of relative calculations will probably disappear, and they'll be very hard to do well without a lot of hand tuning and attention, whereas absolute calculations will work "as normal". So in such cases we're going to want to have LOMAP pick places to strategically do absolute calculations.

halx commented 7 years ago

Thanks for the paper. Very useful.

I would be very interested to see those examples that you would want to handle with LOMAP. I agree that LOMAP or a future tool should handle the various setup scenarios (absolute, single/dual whatever is necessary for a given use case) more or less automatically. If we have "sufficient" (depending on some scoring criterion, I guess) overlap we attempt a relative setup. How this mapping is arrive at is secondary, but for now I see a) we just assume that topological similarity is the way to go, or b) we instruct the code that the coordinates in state B are to be interpreted as a specific binding mode that overwrites the original MCS and may lead to a smaller MCS. b) obviously requires that the coordinates of both states have been appropriately prepared by the user. If the MCS is "too small" an alternative approach should be chosen.

maxjump commented 5 years ago

Hi all,

This seems to be an really important topic. I have recently been using Lomap to create perturbation networks for some benchmark datasets in which most the changes are substitutions at a phenyl ring. As outlined in David's comment above, I found that Lomap cannot differentiate between different substitution patterns: For example 2,5-dibromophenyl is anticipated to be highly similar to 3,6-dimethoxyphenyl, though the substituents are pointing to different directions in the complexes. This leads to some very poor connections in the generated network. Has there been some progress regarding this topic?

davidlmobley commented 5 years ago

@maxkuhn sorry for the delay. No, we've not done anything on this yet, though I think @ppxasjsm and @nividic may be thinking about it some. Let us know if you're interested in getting involved.

ppxasjsm commented 5 years ago

I am aware of this issue...but not sure if I'll have any capacity to look at it more.

maxjump commented 5 years ago

Thanks @davidlmobley, I will be in touch with @ppxasjsm .

davidlmobley commented 2 years ago

This is something that @richardjgowers and team might eventually want to look at, but closing over here.

MobleyLab / Lomap

Allow use of coordinates to select MCSS matching when desired #14