As #10 shows, we are starting to add information on top of the provided YAML file. This added information needs to be documented and stored alongside the output before we move to a distributed system.
The provenance information should contain:
- the YAML file used for this run
- extra information added by fasthep-flow
- a snapshot of software versions (e.g. using the fasthep CLI)
- a hash for each workflow stage, computed from its values
- the ability to gather provenance information (node the task executed on, software versions, date, etc.) PER TASK; this will require a mechanism to inject work into a task (similar to a pre/post instruction)
- the ability to store all of the above in any format (starting with HDF5)
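For the per-stage hash, one option is to serialise each stage's values canonically (sorted keys) and hash the result, so the same stage always gets the same hash regardless of key order in the YAML file. This is a minimal sketch, assuming each stage has already been parsed from YAML into a plain dict; `stage_hash` and the stage contents (including `HypotheticalOp`) are hypothetical and not part of fasthep-flow.

```python
import hashlib
import json
from typing import Any, Mapping


def stage_hash(stage: Mapping[str, Any]) -> str:
    """Return a SHA-256 hex digest of a workflow stage's values.

    Keys are sorted so two stages with identical content hash identically,
    independent of the order in which keys appeared in the YAML file.
    Non-JSON values fall back to their str() representation.
    """
    canonical = json.dumps(stage, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical stages as they might look after parsing the YAML file.
stage_a = {
    "name": "select_events",
    "type": "HypotheticalOp",
    "kwargs": {"cut": "nJet >= 2"},
}
stage_b = {
    "kwargs": {"cut": "nJet >= 2"},
    "name": "select_events",
    "type": "HypotheticalOp",
}
```

Because only the values enter the hash, `stage_a` and `stage_b` hash identically, while changing any value (e.g. the cut) produces a different hash.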
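The per-task "pre/post instruction" idea could be prototyped as a wrapper that records provenance before a task starts and after it finishes. This is a sketch under stated assumptions: `with_provenance` and the in-memory `PROVENANCE` list are hypothetical stand-ins for whatever injection mechanism and storage backend (e.g. HDF5) fasthep-flow ends up using.

```python
import datetime
import functools
import platform
import socket
import sys
from typing import Any, Callable, Dict, List

# Hypothetical in-memory store; a real backend would persist these records.
PROVENANCE: List[Dict[str, Any]] = []


def with_provenance(task: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a task so provenance is recorded around its execution."""

    @functools.wraps(task)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # "pre" instruction: record where and when the task starts.
        record: Dict[str, Any] = {
            "task": task.__name__,
            "node": socket.gethostname(),
            "python": platform.python_version(),
            "platform": platform.platform(),
            "argv0": sys.argv[0] if sys.argv else "",
            "start": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        }
        try:
            return task(*args, **kwargs)
        finally:
            # "post" instruction: record the end time, even if the task raised.
            record["end"] = datetime.datetime.now(datetime.timezone.utc).isoformat()
            PROVENANCE.append(record)

    return wrapper


@with_provenance
def add(a: int, b: int) -> int:
    """Hypothetical example task."""
    return a + b
```

Using a `try`/`finally` means a record is still written when a task fails, which is often when provenance is most useful.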