cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0

Provenance data collection #4609

Open hjoliver opened 2 years ago

hjoliver commented 2 years ago

(From offline chat with @oliver-sanders)

For science experiments, we should provide proper provenance data collection:

That means recording everything that goes into obtaining a result (workflow execution, system info, user interaction) in a standard format, for scientific integrity purposes.

We already collect some of this information automatically, but users have to roll their own ways of scraping it from the workflow DB and logs, and write their own code to collect system info (e.g. on job hosts).
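To illustrate the kind of system-info code users currently write by hand, here is a minimal sketch using only the Python standard library. The function names `collect_system_info` and `provenance_record`, the record keys, and the workflow ID are all hypothetical; none of this is an existing Cylc API, and the flat JSON layout is a placeholder rather than a standard provenance format.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def collect_system_info():
    """Return a JSON-serialisable snapshot of the host this runs on.

    Hypothetical helper: the keys are illustrative, not a Cylc schema.
    """
    return {
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "hostname": platform.node(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "python_executable": sys.executable,
    }


def provenance_record(workflow_id, system_info=None):
    """Bundle a workflow ID with a system snapshot (hypothetical layout)."""
    if system_info is None:
        system_info = collect_system_info()
    return {"workflow": workflow_id, "system": system_info}


if __name__ == "__main__":
    # "my-workflow" is a made-up ID used only for this demonstration.
    print(json.dumps(provenance_record("my-workflow"), indent=2))
```

A job script could run something like this on each job host and write the result alongside the job logs. Built-in support would presumably remove the need for per-workflow code of this sort.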

oliver-sanders commented 2 years ago

See also https://github.com/cylc/cylc-flow/issues/3491