rartino opened 6 years ago
I believe standardizing workflows goes beyond the scope of OPTIMADE at this stage. When asked, Donny was actually referring to providing, e.g., with a structure, a unique identifier of the workflow run that produced it. This seems like a much more manageable task.
Agreed. But in case we eventually want to go in the direction of more explicitly encoded workflows, perhaps the best name for that identifier is `workflow_id`? This connects closely to #24; I'll add the suggested field there.
Some ideas might be borrowed from Common Workflow Language, which is stable (v1.0.2) already.
A proposal is to, for now, handle this like we do with `calculations`: i.e., `workflows` is an "empty" entry type that databases can populate with their own database-specific-prefix identifiers.
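To make the "empty" entry type concrete, here is a rough sketch of what such resources might look like, written as Python dicts in a JSON:API-like shape. The ids, the `_exmpl_` prefix, and the `workflow_ids` helper are made up for illustration; nothing here is taken from the specification.

```python
import json

# Hypothetical "empty" workflows entry: it carries only a type and an id.
# The id uses a made-up database-specific prefix ("_exmpl_").
workflow_entry = {
    "type": "workflows",
    "id": "_exmpl_relax_then_static_v2",
    "attributes": {},
}

# Hypothetical calculation that points at the workflow through a relationship.
calculation_entry = {
    "type": "calculations",
    "id": "calc-0001",
    "attributes": {},
    "relationships": {
        "workflows": {
            "data": [{"type": "workflows", "id": workflow_entry["id"]}]
        }
    },
}


def workflow_ids(entry):
    """Return the ids of all workflows an entry is related to."""
    rel = entry.get("relationships", {}).get("workflows", {})
    return [item["id"] for item in rel.get("data", [])]


print(json.dumps(calculation_entry, indent=2))
print(workflow_ids(calculation_entry))
```

The point of the sketch is that the workflow itself stays opaque: only its identifier is exposed, and the relationship is what ties a calculation to it.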
Both this proposal and the existing `calculations` entry type are now actually quite useful given the recent introduction of queries on relationships. That mechanism alone would directly give a standardized way to filter on, e.g., "what other calculations were produced using the same workflow as this one".
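As a hedged sketch of the "same workflow" query: the filter string below follows the `<entry type>.id HAS "<id>"` form used for relationship filters, but the `workflows` entry type, the base URL, and the `same_workflow_query` helper are assumptions for illustration only.

```python
from urllib.parse import urlencode, quote


def same_workflow_query(base_url, workflow_id):
    """Build a URL asking for all calculations related to one workflow.

    Assumes a hypothetical `workflows` entry type that calculations
    can be related to; the filter is percent-encoded for use in a URL.
    """
    filter_expr = f'workflows.id HAS "{workflow_id}"'
    query = urlencode({"filter": filter_expr}, quote_via=quote)
    return f"{base_url}/calculations?{query}"


# Placeholder server and workflow id, not real endpoints.
url = same_workflow_query("https://example.org/optimade/v1", "wf-42")
print(url)
```

A client would first read the workflow id from the relationships of the calculation it already has, then issue this query to find the sibling calculations.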
As per discussion with @gmrigna during the 2022 workshop, standardization and exchange of workflows between the different engines (AiiDA, FireWorks, etc.) seems to be becoming more relevant and more in demand over time. The recent publication, along with the aiida-common-workflows repository, shows that this is ongoing work.
At the 2023 workshop a few of us (@gmrigna, @utf, @giovannipizzi, me and others) were discussing workflow standardization. This led to some form of design idea. No claim of consensus here, just that the discussions spawned the ideas below; take this as fairly loose thoughts at this point.
Essentially, the idea is:
This also fits with how one can start to build a provenance structure for OPTIMADE. A `calculations` entry can now be categorized based on the high-level workflow that was executed + the description of the workflows it actually ended up executing + the first inputs + the final outputs (described as OPTIMADE entries).
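One loose way to picture the four ingredients listed above is as a small record per calculation. The class name, field names, and example ids below are all hypothetical; this is only a sketch of the idea, not a proposed schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CalculationProvenance:
    """Hypothetical provenance record for one calculations entry."""
    calculation_id: str
    abstract_workflow: str                                   # high-level workflow
    executed_workflows: list = field(default_factory=list)   # what actually ran
    inputs: list = field(default_factory=list)               # ids of input entries
    outputs: list = field(default_factory=list)              # ids of output entries


def group_by_abstract_workflow(records):
    """Categorize calculation ids by the high-level workflow they executed."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.abstract_workflow].append(rec.calculation_id)
    return dict(groups)


records = [
    CalculationProvenance("calc-1", "relax", ["_exmpl_relax_a"], ["struct-1"], ["struct-2"]),
    CalculationProvenance("calc-2", "relax", ["_exmpl_relax_b"], ["struct-3"], ["struct-4"]),
    CalculationProvenance("calc-3", "band structure", ["_exmpl_bands"], ["struct-2"], []),
]
print(group_by_abstract_workflow(records))
```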
I then imagine a very similar design can be set up for experiments, where a workflow is now an experimental procedure with inputs and outputs, and an `/experiments` endpoint is used to categorize experiments based on the abstract experimental procedure executed, how it was broken down into substeps, the original inputs and final outputs (described as OPTIMADE entries).
First of all, thanks @rartino for summarizing and formalizing our discussions. I also think that it might be nice to involve @gpetretto and @davidwaroquiers in subsequent discussions.
It would be nice to have a workflow entry type in OPTIMADE to describe a workflow as steps.
It may be nice if calculations can reference this to explain the workflow taken in the calculation.
We could have abstract standardized names for steps, e.g., 'structure relaxation', plus allow database-specific prefixed ones, e.g., `_exmpl_calculate_color`.
It may be nice to think carefully about what goes into workflow and what goes into calculation (parameters?)
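A minimal sketch of how step names could be told apart, assuming prefixed names take the form underscore, database identifier, underscore, name. The `STANDARD_STEPS` set contains only the one name mentioned above, and `classify_step` is a hypothetical helper.

```python
import re

# Only 'structure relaxation' comes from the discussion; a real list of
# standardized step names would have to be agreed on separately.
STANDARD_STEPS = {"structure relaxation"}

# Assumed shape of a database-specific name: _<prefix>_<name>
PREFIXED_STEP = re.compile(r"^_[a-z0-9]+_[a-z0-9_]+$")


def classify_step(name):
    """Classify a step name as 'standard', 'database-specific', or 'unknown'."""
    if name in STANDARD_STEPS:
        return "standard"
    if PREFIXED_STEP.match(name):
        return "database-specific"
    return "unknown"


for step in ["structure relaxation", "_exmpl_calculate_color", "calculate_color"]:
    print(step, "->", classify_step(step))
```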