IAlibay opened this issue 2 years ago
@IAlibay: This is great! Thanks so much for putting this together!
In my discussions with @dotsdl, one of the key ideas is to make sure we capture all the relevant metadata (in your case: force field, atom mapping strategy, etc) in an extensible data object, and to provide as output highly valuable datasets that capture all the critical information you need, with the ability to retrieve the raw data or trajectories as needed. For example, you might want to create a project where you execute a variety of transformation networks for different targets with different mapping or network planning strategies, or create a very large dense network that contains all of this in a single calculation. You should be able to retrieve the lightweight results data into a Python object and do exploratory data analysis without the need to collect and analyze primary data, in order to determine which strategies work best.
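To make this concrete, here is a rough sketch of what such an extensible data object could look like in Python. This is purely illustrative; the class and field names below are made up for this comment and do not correspond to an existing API:

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass(frozen=True)
class TransformationMetadata:
    """Illustrative container for the metadata attached to one alchemical transformation."""
    target: str                          # protein / ligand series identifier
    transformation: str                  # e.g. "lig_23 -> lig_42"
    forcefield: str                      # e.g. "openff-2.0.0"
    atom_mapping_strategy: str           # e.g. "lomap", "kartograf"
    network_planner: str                 # e.g. "minimal spanning tree", "radial", "dense"
    extra: dict[str, Any] = field(default_factory=dict)  # room for fields we haven't thought of yet

@dataclass(frozen=True)
class TransformationResult:
    """Lightweight summary of one transformation's outcome, without the raw trajectories."""
    metadata: TransformationMetadata
    estimate_kcal_mol: float
    uncertainty_kcal_mol: float
    converged: bool
    raw_data_uri: Optional[str] = None   # pointer for fetching trajectories/snapshots on demand
```

The important point is the split: the summary objects are small enough to pull into memory for exploratory analysis, while something like `raw_data_uri` lets you reach back to the heavy primary data only when you actually need it.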
Many of the downstream users are considering building dashboards that provide different views of the data produced in a common object model; you could imagine doing the same to compare different releases (regression analysis) and different run options (to identify best practices or refine implementations) for an ever-expanding set of benchmark systems. You should still be able to retrieve targeted raw data (snapshots, trajectories, other output data) should you need to, or re-run targeted simulations locally if you need to explore failures.
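As a hedged illustration of the kind of release-over-release comparison this enables, assuming the hypothetical `TransformationResult` summaries sketched above and nothing about the eventual API:

```python
import pandas as pd

def to_frame(results):
    """Flatten lightweight TransformationResult summaries into a DataFrame."""
    return pd.DataFrame(
        [
            {
                "target": r.metadata.target,
                "transformation": r.metadata.transformation,
                "estimate": r.estimate_kcal_mol,
                "uncertainty": r.uncertainty_kcal_mol,
            }
            for r in results
        ]
    )

def compare_releases(results_old, results_new):
    """Regression-style comparison of two releases (or run options) on shared transformations."""
    merged = to_frame(results_old).merge(
        to_frame(results_new),
        on=["target", "transformation"],
        suffixes=("_old", "_new"),
    )
    merged["shift"] = merged["estimate_new"] - merged["estimate_old"]
    # e.g. how much did estimates move per target between the two releases?
    return merged.groupby("target")["shift"].describe()
```

Nothing here touches trajectories or other primary data; it only needs the lightweight summaries.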
> extensible data object, and to provide as output highly valuable datasets that capture all the critical information you need, with the ability to retrieve the raw data or trajectories as needed.
This sounds super useful.
Assuming some campaigns may be removed over time (either due to obsolescence or just "this was completely wrong"), it would be great, if possible, to make entries in these dataset objects immutable but to have a way to retrospectively annotate them (e.g. "trajectory data is no longer available" plus the reason why, or maybe just "these data points are attached to this publication").
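One way to square immutability with after-the-fact notes (again just a sketch with made-up names, not a proposal for the actual schema) is to keep the result entries frozen and collect annotations in a separate, append-only record:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Annotation:
    """Retrospective note attached to an immutable dataset entry."""
    entry_id: str          # identifier of the frozen result entry being annotated
    created_at: datetime
    author: str
    note: str              # e.g. "trajectory data deleted: storage retired"
                           # or "these data points are attached to publication DOI:..."

class AnnotationLog:
    """Append-only log: existing records are never edited or removed, only added to."""
    def __init__(self):
        self._records: list[Annotation] = []

    def annotate(self, entry_id: str, author: str, note: str) -> Annotation:
        record = Annotation(entry_id, datetime.now(timezone.utc), author, note)
        self._records.append(record)
        return record

    def history(self, entry_id: str) -> list[Annotation]:
        return [a for a in self._records if a.entry_id == entry_id]
```

The original entries stay bit-for-bit reproducible, while the annotation history carries the "what happened to this data since" story.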
> You should still be able to retrieve targeted raw data (snapshots, trajectories, other output data)
So I think something that is implicit here, but might be worth bringing up explicitly, is the ability to do customized post-simulation processing. For example, having a means for folks to do something like MM -> QM / QML bookending would be quite useful.
I personally don't think this really should exist within the F@H ecosystem (running QM or arbitrary ML code seems somewhat out of scope), but others might disagree here.
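If bookending-style processing were done downstream, outside F@H, it might look roughly like the sketch below. The names (`result.fetch_snapshots`, the user-supplied energy functions) are hypothetical; the only point is that the result object needs to hand end-state snapshots back in a usable form:

```python
import numpy as np

KT_KCAL_MOL = 0.5925  # kT at ~298 K, in kcal/mol

def bookend_correction(result, mm_energy_fn, qm_energy_fn, kT=KT_KCAL_MOL):
    """Illustrative one-sided MM -> QM(/QML) bookending correction via Zwanzig reweighting.

    `result.fetch_snapshots()` is a hypothetical call that pulls end-state snapshots
    back from the raw data store; the two energy functions (kcal/mol) are user-supplied.
    """
    snapshots = result.fetch_snapshots(state="end")
    # Per-snapshot energy difference between the target (QM/QML) and reference (MM) descriptions.
    delta_u = np.array([qm_energy_fn(s) - mm_energy_fn(s) for s in snapshots])
    # Zwanzig / exponential averaging: dA = -kT * ln < exp(-dU/kT) >_MM
    return -kT * np.log(np.mean(np.exp(-delta_u / kT)))
```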
Raw notes from story review, shared here for visibility:
Not sure if this fully falls under the required "user stories", but we (i.e. the OpenFE team) were discussing it internally after yesterday's meeting and thought it might be worth writing up. From recent discussions, I think it's something that's already being considered, so it might be good to formally record it here.
In broad terms, what are you trying to do?
Given the large number of alchemical calculations that will go through F@H using this framework, there is a good opportunity for us to retrospectively analyze how well certain transformations perform.
From this we should be able to:
How do you believe using this project would help you to do this?
By its very existence, this project should generate large amounts of data from which we can learn.
If we can expose both inputs (i.e. force field info, relevant binding site information, atom mappings) and outputs (i.e. simulation outcomes, e.g. dH timeseries, convergence metrics, work distributions, etc.) in a digestible manner, it should be reasonably simple for someone to gather these data and analyze them as required.
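For instance (purely illustrative, and assuming a hypothetical `dh_timeseries()` accessor on each exposed result rather than any existing API), a downstream analysis might flag poorly converged transformations like this:

```python
import numpy as np

def flag_unconverged(results, rel_tol=0.25):
    """Flag transformations whose first- and second-half averages disagree strongly.

    Assumes each result exposes a hypothetical `dh_timeseries()` returning the
    per-frame dH values for that leg as a 1D array.
    """
    flagged = []
    for r in results:
        dh = np.asarray(r.dh_timeseries())
        half = len(dh) // 2
        first, second = dh[:half].mean(), dh[half:].mean()
        # Crude convergence check: large relative drift between halves is suspicious.
        if abs(first - second) > rel_tol * max(abs(first), abs(second), 1e-8):
            flagged.append(r)
    return flagged
```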
What problems do you anticipate with using this project to achieve the above?
tagging in @richardjgowers as an interested party