OpenFreeEnergy / gufe

grand unified free energy by OpenFE
https://gufe.readthedocs.io
MIT License
28 stars 8 forks source link

Protocol Atom Mapper Assumptions #211

Closed mikemhenry closed 6 months ago

mikemhenry commented 1 year ago

Can protocol authors assume that the mapped atoms are on top of each other, or does the protocol need to handle that?

@jchodera @ijpulidos let me know if I captured this correctly

IAlibay commented 1 year ago

What do you mean by on top of each other?

ijpulidos commented 1 year ago

The question is if the protocol needs to assume that we have the mapped atoms (i.e. core atoms) with the same coordinates (maybe up to some small tolerance, perses uses 0.2 A by default) or if the protocol would take care of that when given the ligand mappings. I hope that makes it more understandable.

IAlibay commented 1 year ago

Ah ok. Mappings don't make assumptions about how far the mapped atoms are from each other. That's in part a design decision so that they can be flexible enough to be used in multi topology approaches.

My take is that deviation tolerances are a protocol specific detail so that's really where they should be checked (indeed that's what OpenFE does).

ijpulidos commented 1 year ago

Ok, if I'm understanding this correctly, then this means that it's in the protocols to make sure the ligands are in the "correct" position, whatever that means for the protocol. Correct? If so, doesn't that defeat the purpose of protocols being agnostic to the mapping or the mapper that was used to produce it? Assuming that's a desired thing, that is, having protocols that can handle different mappings/mapper transparently.

Of course this would probably mean that gufe would then have to handle this, somehow. Or at least give the protocols enough information so that they can quickly know what to do.

IAlibay commented 1 year ago

I'm not sure I fully understand what you're trying to get at @ijpulidos.

From my limited understanding of your question: different protocols / methods are going to need different types of mappings. In order to avoid a chicken and egg problem, what type of mapping you need is a choice you really have to leave with your choice of compatible mappers. Ideally you should be choosing a mapper that will create a mapping that makes sense for your method. That mapper may include things like atom overlap criteria (for example Lomap does).

However if I take a mapper that works for RE-EDS restraints I shouldn't expect it to create a cross ligand mapping that works for a single topology approach?

jchodera commented 1 year ago

@IAlibay : Let me repeat this back just to make sure I understand. In order to run a relative alchemical free energy calculation that uses a "hybrid" or single-topology approach, it needs to have the shared atoms between old and new ligands at common points in space. If a Protocol implementing this method is provided with a pair of ligands and an atom mapping, it is the responsibility of the Protocol to either (1) throw an Exception if it can't handle this, or (2) use some approach to shift these atoms on top of each other to be able to carry out the calculation.

You are saying that either behavior (1) or (2) is fine. Is that right?

Presumably, we will want to write protocols that do (2) so they can be robust enough to tolerate different atom mapping strategies.

IAlibay commented 1 year ago

Thanks @jchodera for that clarification, I think I've been misunderstanding the original question.

I know this is something that's been discussed in the past, I'm not fully sure where everyone landed on this. I suspect both @richardjgowers and @dwhswenson will have different views to mine.

My initial personal view is that Protocols should be throwing exceptions and not attempting to "fix" inputs (unless it's a key part of the methodology and it's somehow done transparently / reproducibly).

I think my view might still be that the mapper, being invoked before the protocol but being chosen to satisfy the needs of a protocol, might be better placed here to "amend inputs". (I'm not sure I'm explaining my thoughts here in the best way).

However there is a bigger issue that I believe Components are designed to be immutable (@dwhswenson can correct me here). So what I'd be proposing is a mapper that ingests multiple Components and outputs Components + a mapping?

mikemhenry commented 1 year ago

I think we need to enumerate what sort of assumptions a protocol would need to make about a mapping and then have those assumptions documented. Ideal we could handle this programmatically either on the mapping or protocol side so we could check if it is supported

richardjgowers commented 1 year ago

If we're being pedantic, technically we already do a little bit of 2) since we're allowing tolerances in overlap (iirc we average this out?). I think @jchodera 's description above is what we're doing.

IAlibay commented 1 year ago

If we're being pedantic, technically we already do a little bit of 2) since we're allowing tolerances in overlap (iirc we average this out?).

I'm not fully sure that's correct (unless I'm misunderstanding you)? The protocol averages out the coordinates, that's just the only way to interpolate between the end states to create a hybrid system. We don't "fix" either end states to minimise the interpolation.

i.e. The hybrid positions are directly derived from the input system positions, not re-alligned input positions.

richardjgowers commented 6 months ago

I think the answer to this was that it's not guaranteed that there is any overlap in a mapping. The get_distances() method provides a way to check this: https://github.com/OpenFreeEnergy/gufe/blob/main/gufe/mapping/ligandatommapping.py#L193

ijpulidos commented 6 months ago

Ok, I think that's reasonable. Protocols should be able to decide what to handle in terms of distances or overlaps in atom mappings. What @mikemhenry mentioned is important, we should be documenting assumptions for the protocols (in their respective repositories) and hopefully also documenting what the consequences of "bad mappings" are (whatever is "bad" for a protocol).