Expand documentation on extends and extends_key for Protocol

LilDojd commented 1 year ago

Hi! I very much enjoy using the gufe toolkit, but I still have a couple of questions regarding the design decisions of Protocols:

In the context of the ProtocolDAG class in protocoldag.py, extends_key is an optional parameter that can be passed to the constructor to specify the key of the ProtocolDAGResult that this ProtocolDAG extends from. This key is used to identify the source of the ProtocolDAG and is passed on to the resulting ProtocolDAGResult object. The extends parameter is an optional ProtocolDAGResult object that represents the result of a previously-run ProtocolDAG.
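To make the key bookkeeping described above concrete, here is a minimal toy sketch. It does not use gufe: `ToyDAG` and `ToyDAGResult` are hypothetical stand-ins, and the only behavior taken from the description is that `extends_key` is handed to the DAG and then carried over to its result.

```python
from dataclasses import dataclass
from typing import Optional
import uuid


@dataclass(frozen=True)
class ToyDAGResult:
    """Hypothetical stand-in for a result object (not gufe's class)."""
    key: str                    # identifier of this result
    extends_key: Optional[str]  # key of the result this run extended, if any


@dataclass(frozen=True)
class ToyDAG:
    """Hypothetical stand-in for a DAG (not gufe's class)."""
    name: str
    extends_key: Optional[str] = None  # provenance only; no data travels with it

    def execute(self) -> ToyDAGResult:
        # The result records which earlier result (if any) it continues from.
        return ToyDAGResult(
            key=f"result-{uuid.uuid4().hex[:8]}",
            extends_key=self.extends_key,
        )


first = ToyDAG(name="run-1").execute()
second = ToyDAG(name="run-2", extends_key=first.key).execute()
```

In this sketch the key only answers "which result did this come from?"; any actual data from the earlier run would have to come from the result object itself.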

However, the purpose of extends is unclear. From reading the code and comments, I am still unsure why exactly we would want to pass the ProtocolDAGResult of another ProtocolDAG. My guesses are:

1. To continue the Protocol, e.g., to continue sampling from the results of a previously run Protocol.
2. To define dependencies between executions of ProtocolDAGs. E.g., before running the Protocols assigned to a Transformation, we would like to run a NonTransformation to do equilibrium sampling of the endpoints. (?) In that case, 'extends' should accept a list of ProtocolDAGResults and be treated by something like _list_dependencies in ProtocolUnit.
3. To perform replicas or repetitions of the same Protocol.

Could you please elaborate on this functionality? Thank you!

mikemhenry commented 1 year ago

@LilDojd This is a good question. Most of the team is at the MDAnalysis UGM. I will tag a few people that will likely have the answer but they may be busy this week.

@dwhswenson @richardjgowers

Glad to hear you are enjoying the toolkit!

dwhswenson commented 11 months ago

Hi @LilDojd — sorry for the slow replies here.

The purpose of extends is primarily your first use case. For example, a user might want to continue the MD in order to get more sampling. To do this, you need to provide the new run with information about the previous run, so you'll need the result that you're extending from.
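As a rough illustration of the "continue sampling" idea, here is a toy sketch. It is not gufe's API: `ToyResult` and `run_toy_md` are hypothetical names, and the "dynamics" are a placeholder. It only shows why the earlier result is needed: the continuation starts from the previous end state and adds to the previous step count.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyResult:
    """Hypothetical result of a toy MD run (not a gufe object)."""
    n_steps_total: int               # cumulative steps sampled so far
    final_positions: Tuple[float]    # stand-in for a restart state


def run_toy_md(n_steps: int, extends: Optional[ToyResult] = None) -> ToyResult:
    # When extending, start from the previous end state; otherwise start fresh.
    start = extends.final_positions if extends is not None else (0.0,)
    done_before = extends.n_steps_total if extends is not None else 0
    # Placeholder "dynamics": shift the coordinate by a fixed amount per step.
    x = start[0] + 0.5 * n_steps
    return ToyResult(n_steps_total=done_before + n_steps, final_positions=(x,))


first = run_toy_md(1000)
longer = run_toy_md(500, extends=first)  # picks up where `first` stopped
```

Without `extends`, the second call would have no way to know where the first run ended, so it would simply be a new, shorter run from the initial state.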

Another possible use is hinted at in your other question (#245): a failed ProtocolDAGResult can be extended to complete a checkpointed protocol.

Your second use case is an interesting idea: this gets into how we might handle protocols that share node-specific data between multiple edges (e.g., nonequilibrium sampling). We're still discussing the details of that; it could be implemented with extends, but I think we may create a more general API for that purpose.

For your last use case (independent runs of a given protocol), there's no need for the extends parameter. We can do that simply by generating the DAG and running it again.
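A toy sketch of that pattern, again with hypothetical names (`create_dag`, `ToyResult`) rather than gufe's API, and with an explicit seed added purely so the example is reproducible: each repeat is built from scratch and shares nothing with the others.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToyResult:
    """Hypothetical result of one independent repeat (not a gufe object)."""
    estimate: float


def create_dag(seed: int) -> Callable[[], ToyResult]:
    """Build a fresh, self-contained unit of work; note there is no `extends`."""
    def run() -> ToyResult:
        rng = random.Random(seed)  # private RNG, so repeats do not interact
        samples = [rng.gauss(0.0, 1.0) for _ in range(100)]
        return ToyResult(estimate=sum(samples) / len(samples))
    return run


# Three independent repeats: generate the DAG, run it, and do the same again.
repeats = [create_dag(seed)() for seed in range(3)]
```

Because no repeat receives an earlier result, the runs can be executed in any order or in parallel, which is the practical difference from an extension.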

Hope that helps — I'll continue more in your other question, which has some related themes as well.

OpenFreeEnergy / gufe

Expand documentation on extends and extends_key for Protocol #240