Planning and ID'ing the CVs we want to give Restraints

Lnaden commented 6 years ago

This thread is to track the exact variables we want to isolate as we convert the restraints to the new CustomCVForce for easier variables.

The goal is to use each Collective Variable as a tracking for quantities which are hard to extract later. E.g. The distance the 2 centroids are from each other every iteration in RadiallySymmetricRestraints

These CV values are fetched from the Context through a CustomCVForce.getCollectiveVariableValues(context) call.

There will also need to be some engineering changes. We will want to come up with a way to neatly track and report the restraints. The problem is that the multistate module only knows of restraints as it pertains to the analysis, and the knowledge of the CustomCVForce objects does not go past the yank.AlchemicalPhase class. So in order to get the CustomCVForce.getCollectiveVariableValues call to be fetched at run time, we will need to engineer a general CustomCVForce reader for either the mulstistate.MultiStateSampler and multistate.MultiStateReporter, and/or we have to engineer the openmmtools.States module to handle that and then have the MultiStateReporter interface with it. I'd like some feed back here.

As the title says, I'd also like to use this issue to enumerate, explicitly, the CV's we want to create and track for each of the major force classes:

RadiallySymmetricRestraints
BoreschLike
RMSD

These changes will be part of the larger #1015

Lnaden commented 6 years ago

Re-posting an idea I shared in Slack for discussion

Idea for extracting CV's: We have to tell the MultiStateSampler to cycle through all of its known thermodynamic_state which are passed to it in create, then tell it to search for states which implement some common CV function.

I think when we create the common parent class for the RestraintState (yank) and AlchemicalState (oepnmmtools) we would want to add this call to their interface. I think since its a new OpenMM feature others may find non-restraint uses for the CVs and want a common way to extract them. cc #1015

Lnaden commented 6 years ago

from @andrrizzi: Hmm. The collective variable seems to be more relatable to SamplerState rather than the thermodynamic state as it changes with integration. It could be something read by SamplerState.update_from_context().

from me: The problem is that the actual call has to come from the CustomCVForce object which is tied to the ThermodynamicState

@andrrizzi: Yes, the Context has its own copy of the system though, which is unrelated to the standard system that is stored inside the ThermodynamicState.

Thats a good idea. We would still need to provide some solution to extract CV's (if any) from the sampler state. I do see one possible issue and thats what if there are multiple forces each with their own CV, but tracking them through the same name. How do we id which CV came from which force. And that would mean something external has to track map the CV index per force <-> output from SamplerState.

andrrizzi commented 6 years ago

I do see one possible issue and thats what if there are multiple forces each with their own CV, but tracking them through the same name.

Good point. I guess we could associate the force index to keep them separated, but I'm not sure about what is the best data structure to use to expose this info from the SamplerState. Maybe a dict: collective_variable_name -> [(value1, force_index1), (value2, force_index2)] or something similar?

Lnaden commented 6 years ago

I think if we leave it up to the SamplerState to fetch the data, then the external application which needs that from the SamplerState should know the mapping back to which CV in which Force, since I think the best the SamplerState can spit out is a list of lists.

system = context.getSystem()
cvs = []
for force in system.getForces():
    try:
        # This returns a list of floats
        cvs.append(force.getCollectiveVariableValues(context))
    except AttributeError:
        cvs.append(None)

# Output is a list of lists of floats. List[List[float]] in typing formalism

edit: It would then be up to the thing receiving that information to know how each of those floats map back

Lnaden commented 6 years ago

In theory, I guess we could also get the CV name by index and make a formal dict. It really is a question of what object should take ownership of the data formatting, SamplerState or the thing asking SamplerState for the CVs?

andrrizzi commented 6 years ago

I tend to favor the CV-name-indexed dictionary solution because I think that if SamplerState exposes a simple list the user will end up looking for the name of the collective variables himself.

A possible solution could be to implement a way to retrieve the CV value that assumes there is only a single CV with that name (failing when this is not the case) and another more general. Something like:

sampler_state.update_from_context(context)  # Read also CVs.
sampler_state.get_collective_variable('unique_name')  # This returns a float.
try:
    # This raises an exception since there are two CVForces with the same collective variable.
    sampler_state.get_collective_variable('duplicated_name')
except DuplicatedCVError:
    # This returns a list.
    sampler_state.get_collective_variable('duplicated_name', allow_duplicate=True)

but it may be best to try to implement this first and see what problems arise.

jchodera commented 6 years ago

Note that if we just keep this super simple and store all of the CVs defined in the system, keeping them in order (first ordered by force index, then by CV index), we can store metadata about restraints in the NetCDF file to use to figure out which CV is which parameter of which restraint.

So we could

store all the metadata we need about restraints in the NetCDF metadata
just store the values of CV 0 through n-1 in the NetCDF file in a collective_variables[n_iterations,n_replicas,n_cvs] variable

Another possibility would be to also store the energy of each restraint for rapid unbiasing, but I'm not sure how easy that would be to add.

Lnaden commented 6 years ago

The implementation tied to this issue:

choderalab/openmmtools#362

Lnaden commented 6 years ago

I think we have all the underlying mechanics in OpenMMTools now. So the question still remains: What CV's do we want to extract from each force?

Here is my list:

RadiallySymmetricRestraints
- Distance between 2 centroids
BoreschLike:
- Virtual Bond Distance
- Virtual Angle Value x2
- Virtual Dihedral Value x3
RMSD
- RMSD

Anything else?

A related question, do we know the relative performance per CV a CustomCVForce has over a standard force? e.g., the 6 CV's CustomCVForce for the Boresch-like restraints will be what % less efficient than the CustomCompoundBond force it currently is?.

jchodera commented 6 years ago

I think we have all the underlying mechanics in OpenMMTools now. So the question still remains: What CV's do we want to extract from each force?

We want to extract any CVs that the restraint defines and uses to define its restraint.

Ideally, each restraint class would register its own CVs through some internal API, and these restraints would automatically be stored each iteration and be available for defining the bound state. The restraint energy would also automatically be available for unbiasing.

A related question, do we know the relative performance per CV a CustomCVForce has over a standard force? e.g., the 6 CV's CustomCVForce for the Boresch-like restraints will be what % less efficient than the CustomCompoundBond force it currently is?.

Unless the use of CustomCVForce is pathologically abysmal (like a million times slower than CustomCompoundBond) then it should not make any difference because restraints consume a very tiny fraction of the overall iteration time.

Lnaden commented 6 years ago

Ideally, each restraint class would register its own CVs through some internal API, and these restraints would automatically be stored each iteration and be available for defining the bound state. The restraint energy would also automatically be available for unbiasing.

Sorry if I was unclear. I understand the what we want to do with the CV's once we get them out, I was just trying to explicitly enumerate exactly what CV's each restraint needs before I start implementing.

choderalab / yank

Planning and ID'ing the CVs we want to give Restraints #1036