aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io

Connecting monitor actions to the provenance #6158

Open edan-bainglass opened 10 months ago

edan-bainglass commented 10 months ago

In #5659 (and AEP 008), @sphuber added live-monitoring of calculation jobs, a feature allowing conditional termination of a job with optional data retrieval and storage as if the job finished normally. This was inspired by the work of @ramirezfranciscof as part of the Aurora project (automation of battery cycling experiments). Its use in Aurora has made clear an additional requirement for the monitoring feature: visualizing data snapshots during cycling. Furthermore, there are requests from the weather and climate community, as part of the SwissTwins project, to monitor fresh data and conditionally trigger further calculations. These features may or may not be supported in the current iteration. It therefore seems to me that monitor actions should be connected to the provenance, or at least be able to connect to it when necessary.

In the case of Aurora, the monitor currently analyzes a snapshot (produced on the same interval as the monitor's polling frequency) and conditionally kills the cycling experiment if the battery capacity drops below a threshold. Aurora's users would like to visualize the analyzed data. Much of the analysis logic is the same as that applied to data produced at the end of the cycling calculation. So, it seems natural to convert snapshots into data nodes (via a calcfunction?) and attach them to the monitored calculation. One thing to consider here is that perhaps only the most recent node is needed at any time, and that it should be discarded if the job completes normally.
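To make the threshold check concrete, here is a minimal sketch in plain Python. It is not Aurora's actual monitor: `check_capacity`, the list-of-floats snapshot format, and the threshold value are all hypothetical. It only loosely follows what I understand to be the convention for AiiDA monitors, where returning a message requests termination and returning `None` lets the job continue.

```python
from typing import Optional, Sequence


def check_capacity(snapshot: Sequence[float], threshold: float) -> Optional[str]:
    """Hypothetical check on a snapshot of capacity values (one per cycle).

    Returns a message if the most recent capacity is below ``threshold``,
    otherwise ``None`` (meaning: keep the experiment running).
    """
    if not snapshot:
        # No data has been written yet, so there is nothing to judge.
        return None

    latest = snapshot[-1]
    if latest < threshold:
        return f"Capacity {latest} dropped below threshold {threshold}"
    return None


# Example usage with made-up numbers.
print(check_capacity([1.00, 0.95, 0.90], threshold=0.8))
print(check_capacity([1.00, 0.85, 0.75], threshold=0.8))
```

The same function could in principle be reused on the final data, which is the overlap in analysis logic mentioned above.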

In the case of SwissTwins, the monitor would need to trigger additional calculations/workflows. This one requires more thought. For instance, one would need to decide what would be the input to this new calculation. Perhaps this is another conditional on the fresh data.

For Aurora at least, I also considered an alternative in which the snapshot is made available through other means, but I'm not entirely sure how this would work.

Input welcomed 🙏

sphuber commented 10 months ago

Aurora's users would like to visualize the analyzed data. Much of the analysis logic is the same as that applied to data produced at the end of the cycling calculation. So, it seems natural to convert snapshots into data nodes (via a calcfunction?) and attach them to the monitored calculation.

Here there are at least two options that come to mind:

In the case of SwissTwins, the monitor would need to trigger additional calculations/workflows. This one requires more thought. For instance, one would need to decide what would be the input to this new calculation. Perhaps this is another conditional on the fresh data.

As you say, this requires more thought, and I would say that trying to shoehorn this into the monitor functionality would be too complicated and a bad idea. Even if we can get around the aforementioned limitation of AiiDA's provenance model, where calculations cannot call other processes, the required level of indirection and the number of code pathways would make it too complex and unmaintainable.

What we would really need here is a WorkChain that can launch a CalcJob to run the "main" calc and then go into a monitoring mode itself. This would simply require the workchain to be able to push tasks on the event loop so that it can implement this wait-and-check logic itself. This is currently not possible because the WorkChain interface is fully synchronous; it would need to allow asynchronous code as well. The decision to keep it synchronous was made consciously when the WorkChain interface was designed: it was deemed already complex enough for users to have to write Python to implement a workflow, and requiring them to write asynchronous code would have made it even worse. However, Python has improved a lot since then: asynchronous code is now supported natively and is really not that much more complicated. We could now start to think about making asynchronous code optional in workchains, which would allow more advanced use-cases to be implemented, such as the one you describe.
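To illustrate the general pattern only, here is a rough sketch using plain `asyncio`. This is not the WorkChain API and nothing in it exists in aiida-core: `main_calc`, `watch` and `run_with_monitoring` are hypothetical stand-ins, the "main" calc is simulated with a sleep, and a follow-up calculation is represented by appending to a list.

```python
import asyncio
import contextlib


async def main_calc(duration: float) -> str:
    """Stand-in for the long-running "main" calculation."""
    await asyncio.sleep(duration)
    return "finished"


async def watch(read_value, threshold, interval, follow_ups):
    """Poll ``read_value`` and record every value that exceeds ``threshold``.

    In a real workflow, recording a value would instead submit a new process.
    """
    while True:
        value = read_value()
        if value > threshold:
            follow_ups.append(value)
        await asyncio.sleep(interval)


async def run_with_monitoring(read_value, threshold, duration=0.05, interval=0.01):
    """Run the main calc while a watcher task polls on the same event loop."""
    follow_ups = []
    watcher = asyncio.create_task(watch(read_value, threshold, interval, follow_ups))
    try:
        result = await main_calc(duration)
    finally:
        # Stop polling once the main calc is done, and wait for the cancellation.
        watcher.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await watcher
    return result, follow_ups


if __name__ == "__main__":
    result, follow_ups = asyncio.run(run_with_monitoring(lambda: 5, threshold=3))
    print(result, len(follow_ups))
```

The point of the sketch is that the waiting and checking live in the workflow layer, which is allowed to launch other processes, rather than inside the monitored calculation.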

I think this direction would be your best bet, but it would require quite a lot of work on aiida-core to allow synchronous and asynchronous code to coexist in work chains. Since we have other use-cases as well, though, this might be interesting enough for the team to decide to actually allocate resources to it.