OpenFreeEnergy / cinnabar

Package for consistent reporting of relative free energy results
MIT License

wrong estimates with FEMap with single experimental value #123

Open ijpulidos opened 3 months ago

ijpulidos commented 3 months ago

I'm experiencing some issues when trying to generate an FEMap with some computed DDGs and an absolute experimental DG for the reference compound. An example notebook that shows this is in https://gist.github.com/ijpulidos/72aff8d9440800fc9230126c9168ce50

One can see that in the dataframe for the absolute measurements/estimates you get a duplicated lig_a. I was expecting only one entry for this ligand, which is the reference ligand. Also, the values after the MLE don't seem to make much sense, which I think is related to the same issue.

ijpulidos commented 3 months ago

Now that I think about it, maybe the duplicated entry in the table is fine, but the real issue is that the values don't make sense. I would expect the values to be around the absolute experimental measurement plus or minus the computed relative energy. I hope that makes sense.
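To make that expectation concrete, here is a minimal sketch with made-up numbers (kcal/mol). The ligand names other than lig_a, the values, and the sign convention DDG(a -> x) = DG(x) - DG(a) are assumptions for illustration only; nothing here calls cinnabar.

```python
# Hypothetical absolute experimental DG for the reference ligand (lig_a).
exp_dg_ref = -9.0

# Hypothetical computed relative free energies, assuming the convention
# DDG(lig_a -> x) = DG(x) - DG(lig_a).
computed_ddg = {"lig_b": 1.5, "lig_c": -0.5}

# Expected absolute values: the reference DG plus each computed DDG.
expected_dg = {"lig_a": exp_dg_ref}
for ligand, ddg in computed_ddg.items():
    expected_dg[ligand] = exp_dg_ref + ddg

for ligand, dg in expected_dg.items():
    print(f"{ligand}: {dg:.2f}")
```

With the opposite sign convention the DDGs would be subtracted instead, which is the "plus or minus" above.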

ianmkenney commented 3 months ago

Summary after a call with @ijpulidos: The generate_absolute_values method call iterates over all edges in the underlying networkx graph and reports the dGs when the first node is a ReferenceState (whose label becomes the source shown in the resulting table) and the second node is not a reference state. This is what that graph looks like.

[image: the underlying networkx graph of the FEMap]

Notice that the Zero reference state, which is what @ijpulidos created as an experimental value, connects to only a single node, representing lig_a. Given that, it makes sense that we see one entry for lig_a with an empty source and the value you provided.
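The filtering described above can be sketched without cinnabar. In this sketch, RefState is a hypothetical stand-in for cinnabar's ReferenceState, the edge list mimics the (u, v, data) triples that networkx yields from graph.edges(data=True), and the "dG" attribute name and all numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefState:
    """Hypothetical stand-in for a reference-state node; label is the source."""
    label: str = ""


zero = RefState()      # the experimental ("Zero") reference, empty label
mle = RefState("MLE")  # the reference added after the MLE step

# (u, v, data) triples, shaped like networkx's graph.edges(data=True).
edges = [
    (zero, "lig_a", {"dG": -9.0}),     # the single experimental value
    ("lig_a", "lig_b", {"dG": 1.5}),   # a computed relative edge, skipped below
    (mle, "lig_a", {"dG": -0.3}),      # MLE outputs, one per ligand
    (mle, "lig_b", {"dG": 1.2}),
]

# Keep only edges that go from a reference state to a non-reference node.
rows = [
    {"label": v, "dG": data["dG"], "source": u.label}
    for u, v, data in edges
    if isinstance(u, RefState) and not isinstance(v, RefState)
]

for row in rows:
    print(row)
```

Under these assumptions lig_a shows up twice, once with an empty source and once with source MLE, which matches the duplicated entry in the absolute dataframe.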

There is another reference state, "MLE", that is created after maximizing the log-likelihood function. The edges between this ReferenceState and the ligand nodes are generated iteratively from the results of the cinnabar.stats.mle function. This is slightly problematic since those outputs are state free energies, not free energy differences. These free energies are defined only up to a shared additive constant and don't mean anything physically when taken alone. You can see that the differences within the MLE source group in the absolute dataframe reproduce the correct DDGs from get_relative_dataframe. In short, the numbers you see with MLE as the source are arbitrary and don't give you anything useful on their own, but they do make sense. I suspect this is the point of #111.

EDIT: there might be some room to rework this underlying representation or change the paradigm for how MLE is applied to input data, possibly a more functional approach where state isn't so important.
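The point about the MLE values being defined only up to a shared constant can be checked with a few lines of plain Python. The numbers are hypothetical, and the final "anchoring" step is only one possible way to put the values on an absolute scale; it is an assumption for illustration, not something cinnabar is stated to do here.

```python
# Hypothetical per-ligand MLE outputs (arbitrary up to a shared constant).
mle_values = {"lig_a": -0.3, "lig_b": 1.2, "lig_c": -0.8}


def ddg(values, a, b):
    """Relative free energy between two ligands: values[b] - values[a]."""
    return values[b] - values[a]


# Adding the same constant to every value leaves all differences unchanged.
shift = 5.0
shifted = {name: value + shift for name, value in mle_values.items()}
diff_original = ddg(mle_values, "lig_a", "lig_b")
diff_shifted = ddg(shifted, "lig_a", "lig_b")
print(diff_original, diff_shifted)

# Illustrative anchoring: choose the constant so that the reference ligand
# matches a hypothetical experimental absolute DG.
exp_dg_ref = -9.0
offset = exp_dg_ref - mle_values["lig_a"]
anchored = {name: value + offset for name, value in mle_values.items()}
print(anchored)
```

The anchored values differ from the raw MLE values only by the offset, so the relative free energies between ligands are the same in both.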