Connecting monitor actions to the provenance

In #5659 (and AEP 008), @sphuber added live monitoring of calculation jobs, a feature that allows a job to be terminated conditionally, with optional retrieval and storage of its data as if the job had finished normally. This was inspired by the work of @ramirezfranciscof as part of the Aurora project (automation of battery cycling experiments). Its use in Aurora has now made clear that there is an additional requirement for the monitoring feature, specifically to visualize data snapshots during cycling. Furthermore, there are requests from the weather and climate community, as part of the SwissTwins project, to monitor fresh data and conditionally trigger further calculations. These features may or may not be supported in the current iteration. It therefore appears to me that monitor actions should be connected to the provenance, or at least that it should be possible to connect them when necessary.

In the case of Aurora, the monitor currently analyzes a snapshot (produced on the same interval as the monitor's polling frequency) and conditionally kills the cycling experiment if the battery capacity drops below a threshold. Aurora's users would like to visualize the analyzed data. Much of the analysis logic is the same as that applied to the data produced at the end of the cycling calculation. So, it seems natural to convert snapshots into data nodes (via a calcfunction?) and attach them to the monitored calculation. One thing to consider here is that perhaps only the most recent node is needed at any given time, and that it should be discarded if the job completes normally.
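For reference, the current behaviour can be sketched roughly as follows. A monitor is a function that receives the calculation node and an open transport, and returns a message string to request that the job be killed, or `None` to let it continue. The snapshot file name, its JSON layout (a `discharge_capacity` list with one value per cycle) and the retention criterion are assumptions made for illustration only, not Aurora's actual format or logic.

```python
import json
import tempfile
from pathlib import Path


def check_capacity(capacities, threshold=0.8):
    """Return a kill message if the latest capacity fell below ``threshold`` times the first one."""
    if len(capacities) < 2 or capacities[0] <= 0:
        return None  # not enough data to judge yet
    retention = capacities[-1] / capacities[0]
    if retention < threshold:
        return f'Capacity retention {retention:.2f} is below {threshold:.2f}, killing the job.'
    return None


def capacity_monitor(node, transport, filename='snapshot.json', threshold=0.8):
    """Monitor following the ``(node, transport)`` convention: fetch the snapshot and decide."""
    with tempfile.TemporaryDirectory() as tmpdir:
        local_path = Path(tmpdir) / filename
        # Assumption: ``filename`` resolves relative to the job's remote working directory.
        transport.getfile(filename, str(local_path))
        snapshot = json.loads(local_path.read_text())
    # Hypothetical snapshot layout: {"discharge_capacity": [c_1, c_2, ...]}
    return check_capacity(snapshot['discharge_capacity'], threshold)
```

Note that the snapshot only lives in a temporary directory for the duration of the call, which is exactly why the analyzed data is not available for visualization afterwards.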

In the case of SwissTwins, the monitor would need to trigger additional calculations/workflows. This one requires more thought. For instance, one would need to decide what would be the input to this new calculation. Perhaps this is another conditional on the fresh data.

At least for Aurora, I also considered an alternative in which the snapshot is made available through other means, but I'm not entirely sure how this would work.

Input welcomed 🙏

> Aurora's users would like to visualize the analyzed data. Much of the analysis logic is the same as that applied to the data produced at the end of the cycling calculation. So, it seems natural to convert snapshots into data nodes (via a calcfunction?) and attach them to the monitored calculation.

Here there are at least two options that come to mind:

The monitor can simply copy the retrieved data to some temporary directory and log that path as a log message. The user can then use whatever analysis tools, completely independent of AiiDA, to inspect those. The advantage of this approach is that it is already possible, it is simple to implement, and it doesn't risk storing a lot of data permanently. The downside is clearly that the data is not captured in the provenance graph and can be easily lost, which might not be desirable.
The other option is to actually store the data in the provenance graph, as you suggest. It won't be possible through a calcfunction that is called by the monitor, as that would mean that the CalcJob is essentially calling another calculation, and that is forbidden in AiiDA's provenance model. Only workflows (WorkChain, workfunction, ...) can call other processes. But the CalcJob can attach outputs itself; that is already what happens through a Parser. So the monitor interface could simply be updated to allow it to return output nodes that will then be attached to the CalcJobNode. I implemented a first proof-of-concept, but there are still some problems. I think this should be possible in principle, and I believe it would solve this use-case.
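The first option could look roughly like the sketch below. The file name `snapshot.json` and the helper names are hypothetical, and the standard `logging` module is used as a stand-in; in a real monitor one would presumably log through the node so that the message ends up in the process report.

```python
import logging
import tempfile
from pathlib import Path

LOGGER = logging.getLogger('snapshot_monitor')


def stash_snapshot(transport, filename='snapshot.json'):
    """Copy the latest snapshot into a fresh temporary directory and return its local path."""
    # mkdtemp does not clean up after itself, so the copy survives the monitor call.
    target_dir = Path(tempfile.mkdtemp(prefix='monitor-snapshot-'))
    local_path = target_dir / filename
    # Assumption: ``filename`` resolves relative to the job's remote working directory.
    transport.getfile(filename, str(local_path))
    return local_path


def stash_snapshot_monitor(node, transport, filename='snapshot.json'):
    """Monitor that only exposes the snapshot to the user and never kills the job."""
    local_path = stash_snapshot(transport, filename)
    LOGGER.info('Snapshot for job %s copied to %s', getattr(node, 'pk', None), local_path)
    return None  # returning None lets the job continue
```

Nothing here touches the provenance graph: the copy lives outside AiiDA, and each call creates a new directory, so cleaning up old snapshots is left to the user.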

> In the case of SwissTwins, the monitor would need to trigger additional calculations/workflows. This one requires more thought. For instance, one would need to decide what would be the input to this new calculation. Perhaps this is another conditional on the fresh data.

As you say, this requires more thought, and I would say that trying to shoehorn this into the monitor functionality is going to be too complicated and a bad idea. Even if we can get around the aforementioned limitation of AiiDA's provenance model, where calculations cannot call other processes, the required level of indirection and the number of code pathways are going to become too complex and unmaintainable.

What we would really need here is a WorkChain that can launch a CalcJob to run the "main" calc and then go into a monitoring mode itself. This would simply require the workchain to be able to push tasks onto the event loop so it can implement this waiting and checking itself. This is currently not possible, as the WorkChain interface is fully synchronous; it would need to allow asynchronous code as well. This decision was made consciously when the WorkChain interface was designed: it was deemed already complex enough for users to have to write Python to implement a workflow, and requiring that users write asynchronous code would have made it even worse. However, Python has improved a lot since then: asynchronous code is now supported natively and is really not that much more complicated. We could now start to think about making asynchronous code optional in workchains, which would allow more advanced use-cases to be implemented, such as the one you describe.
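To make the wait-and-check idea a bit more concrete, here is a minimal sketch using plain `asyncio`. It is not the WorkChain API and not something aiida-core supports today; `main_job`, `follow_up` and the shared `state` dictionary are hypothetical stand-ins for the "main" CalcJob, the triggered calculations and the fresh data respectively.

```python
import asyncio


async def main_job(state, steps=5, delay=0.01):
    """Stand-in for the "main" calculation: produces one new data item per step."""
    for step in range(steps):
        await asyncio.sleep(delay)
        state['data'].append(step)
    state['done'] = True


async def follow_up(value):
    """Stand-in for a calculation that is triggered by fresh data."""
    await asyncio.sleep(0)
    return value * 10


async def monitor_and_trigger(state, condition, interval=0.005):
    """Poll for fresh data while the main job runs and launch a follow-up for each matching item."""
    seen = 0
    pending = []
    while True:
        # Read the flag before the data so items written just before completion are not missed.
        finished = state['done']
        fresh = state['data'][seen:]
        seen += len(fresh)
        for value in fresh:
            if condition(value):
                pending.append(asyncio.create_task(follow_up(value)))
        if finished:
            break
        await asyncio.sleep(interval)
    return await asyncio.gather(*pending)


async def run_workflow():
    """Launch the main job, then enter the monitoring mode until it finishes."""
    state = {'data': [], 'done': False}
    job = asyncio.create_task(main_job(state))
    results = await monitor_and_trigger(state, condition=lambda value: value % 2 == 0)
    await job
    return results
```

Running `asyncio.run(run_workflow())` launches a follow-up for every even-valued item while the main job is still producing data. In a workchain, the equivalent would be the workflow itself deciding on the inputs of the new calculation, so the call link in the provenance comes from a workflow and not from the monitored CalcJob.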

I think this direction would be your best bet, but it would require quite a lot of work in aiida-core to allow synchronous and asynchronous code to coexist in work chains. But, since we have other use-cases as well, this might be interesting enough for the team to decide to actually allocate resources to it.

aiidateam / aiida-core

Connecting monitor actions to the provenance #6158