broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one-off use cases and massive-scale production environments.
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Provenance in Research Object format? #5052

Open mruffalo opened 5 years ago

mruffalo commented 5 years ago

Hi-

The HuBMAP consortium has been implementing some workflows in CWL and running these via cwltool -- we're quite interested in storing the provenance information for a workflow run in Research Object format. This would include the inputs and outputs for a certain run, in addition to (a normalized version of) the workflow itself.

This is already implemented in cwltool and accessible through its --provenance flag; is anything like this planned for Cromwell?

Some of the HuBMAP tissue mapping centers are interested in or have been using pipelines written in WDL (e.g. ENCODE's ATAC-seq pipeline), and we would like to support these without giving up the ability to store workflow run provenance in a standard format.

I didn't see anything in the issue/forum/PR searches I've been doing.

Thank you!

geoffjentry commented 5 years ago

@mruffalo We've had no requests for RO support from our internal stakeholders, which is why you haven't seen anything on the roadmap. If someone wanted to contribute the functionality we'd be happy to talk, but until it starts popping up on internal radars it's unlikely to come from the development team.

mruffalo commented 5 years ago

Thanks for the reply -- that's what I had assumed, but I wanted to verify.