BCCDC-PHL / pipeline-provenance-schema


Support collecting provenance on output files #4

Closed dfornika closed 5 months ago

dfornika commented 5 months ago

We currently have an InputFileProvenanceRecord type, which is intended to support details about input files:

https://github.com/BCCDC-PHL/pipeline-provenance-schema/blob/58cda2e938d72c193151e4b4b1ed62967cb2f71c/schema/pipeline-provenance.json#L218-L262

We'd also like to be able to collect provenance (mainly checksums) on output files. We could approach this in two ways:

  1. Rename InputFileProvenanceRecord to a more generic FileProvenanceRecord. This would involve renaming the input_filename and input_path attributes to filename and file_path.
  2. Add a separate OutputFileProvenanceRecord with attributes output_filename (and possibly output_path).
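
To make the two options concrete, here is a rough Python sketch of what a record might look like under each one. The filename and path attribute names come from the options above. The `sha256` attribute and the helper function names are assumptions made for illustration only and are not taken from the current schema.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def generic_file_record(path):
    """Option 1: a single FileProvenanceRecord shape for inputs and outputs."""
    return {
        "filename": os.path.basename(path),
        "file_path": os.path.abspath(path),
        "sha256": sha256_of_file(path),  # hypothetical attribute name
    }


def output_file_record(path):
    """Option 2: a separate OutputFileProvenanceRecord shape."""
    return {
        "output_filename": os.path.basename(path),
        "output_path": os.path.abspath(path),
        "sha256": sha256_of_file(path),  # hypothetical attribute name
    }
```

With option 1, the same function can describe input and output files, but existing records that use input_filename and input_path would need to be migrated. With option 2, existing input records stay valid, at the cost of two nearly identical record types.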