ga4gh / workflow-execution-service-schemas

The WES API is a standard way to run and manage portable workflows.
Apache License 2.0
82 stars 38 forks

Suggest that implementors use HTTP Link header field to indicate provenance when retrieving results #13

Open mr-c opened 6 years ago

mr-c commented 6 years ago

https://www.w3.org/TR/prov-aq/#resource-accessed-by-http

Idea is from @stain

david4096 commented 6 years ago

This is a clever use of the feature that helps with the last leg of a workflow execution. I think it will require direct modification of the OpenAPI description.

jaeddy commented 5 years ago

I'm pro-provenance in general, but would need some more details on what this might look like in the spec. Tagging as a v2.0 candidate for now.

mr-c commented 5 years ago

To be more specific, the returned value would be the IRI/URI to a v0.6.0 or newer CWLProv ResearchObject https://w3id.org/cwl/prov/0.6.0
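A minimal client-side sketch of the idea, assuming the server follows the PROV-AQ form `Link: <provenance-URI>; rel="http://www.w3.org/ns/prov#has_provenance"; anchor="<target-URI>"`. The function name and the example URLs are hypothetical, and the parser is deliberately simplified rather than a full RFC 8288 implementation:

```python
import re
from typing import List

# Link relation defined by PROV-AQ for "this resource has provenance at ...".
PROV_REL = "http://www.w3.org/ns/prov#has_provenance"

# One link-value: <uri> followed by zero or more ;param or ;param=value pairs.
LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[^\s;,=]+(?:\s*=\s*(?:"[^"]*"|[^\s;,"]*))?)*)'
)
# The rel parameter, quoted or unquoted (simplified: ignores escaped quotes).
REL_RE = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', re.IGNORECASE)


def find_provenance_uris(link_header: str) -> List[str]:
    """Return the URIs of all links whose rel includes has_provenance."""
    uris = []
    for uri, params in LINK_RE.findall(link_header):
        rel_match = REL_RE.search(params)
        if rel_match is None:
            continue
        rel_value = rel_match.group(1) or rel_match.group(2) or ""
        # A rel parameter may carry several space-separated relation types.
        if PROV_REL in rel_value.split():
            uris.append(uri)
    return uris


# Hypothetical header value as a WES server might return it.
example = (
    '<https://wes.example.org/runs/abc/prov>; '
    'rel="http://www.w3.org/ns/prov#has_provenance"; '
    'anchor="https://wes.example.org/runs/abc", '
    '<https://wes.example.org/runs?page=2>; rel="next"'
)
provenance_uris = find_provenance_uris(example)
```

Here `provenance_uris` holds only the first link; the `rel="next"` link is skipped because its relation type does not match.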

ruchim commented 3 years ago

@mr-c to clarify for my sake -- your suggestion enables something like: 1) I'm doing a status check on a workflow; 2) I see the status is a failure; 3) linked to the status through "provenance-uri" is the actual HTTP link to the raw workflow definition associated with the failure, which I can go and investigate; 4) ....

mr-c commented 3 years ago

@ruchim Almost: for step 3 the URI points to a CWLProv document that would give detailed information, not to the workflow definition (which one assumes the caller of the API already has a copy of).

ruchim commented 3 years ago

ahhh! is the provenance-uri the same as stdout/stderr logs -- or something else? also, thanks so much for the quick response, really appreciate it.

mr-c commented 3 years ago

The provenance-uri would point to a CWLProv format document which would contain structured data including raw logs, server information, etc. 🙂👍

stain commented 3 years ago

I think from https://www.w3.org/TR/prov-aq/#resource-accessed-by-http in PROV we kind of allowed any kind of provenance document, although one containing PROV in one of the several formats would be preferable.

I would not require all of CWLProv. That is kind of the inside of the workflow, and it could be exposed as well if present, either as a Research Object BagIt archive (as it would be multiple files) or as a directory of files exposed through the WES. There is no single "CWLProv document" as such: we have both primary.cwlprov.* in multiple serializations and metadata/manifest.json, which types and links to all the other files.

Perhaps WES would have its own "outer" provenance that just says when the workflow job started/stopped and ideally links to its outputs?
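As a rough illustration of what such an "outer" provenance record could contain, here is a sketch that builds a small document in the PROV-JSON layout (top-level `prefix`, `activity`, `entity`, and `wasGeneratedBy` maps). The `ex:` namespace, the `ex:url` attribute, the function name, and the URLs are all hypothetical; nothing here is defined by WES:

```python
import json


def outer_provenance(run_id, started, ended, output_urls):
    """Build a minimal PROV-JSON-style record for one workflow run."""
    run = f"ex:run-{run_id}"
    doc = {
        # Hypothetical namespace for identifiers minted by this server.
        "prefix": {"ex": "https://wes.example.org/ns#"},
        # The run itself is a PROV activity with start and end times.
        "activity": {run: {"prov:startTime": started, "prov:endTime": ended}},
        "entity": {},
        "wasGeneratedBy": {},
    }
    for i, url in enumerate(output_urls):
        entity = f"ex:output-{i}"
        # Each output is an entity; ex:url is an assumed attribute name.
        doc["entity"][entity] = {"ex:url": url}
        # Link each output back to the run that generated it.
        doc["wasGeneratedBy"][f"_:gen{i}"] = {
            "prov:entity": entity,
            "prov:activity": run,
        }
    return doc


doc = outer_provenance(
    "abc",
    "2021-01-01T00:00:00Z",
    "2021-01-01T01:00:00Z",
    ["https://wes.example.org/runs/abc/outputs/result.txt"],
)
serialized = json.dumps(doc, indent=2)
```

A record like this says only when the job ran and where its outputs are; the detailed CWLProv files, if present, would remain a separate and optional layer.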

ruchim commented 3 years ago

Cool, did some reading on the links to catch up -- and thanks to Jeff Gentry for explaining CWLProv a little more deeply. My own thoughts are that these are really good best practices from the perspective of leveraging features of the HTTP spec. If I put myself in the mindset of someone who runs workflows a lot, I'd absolutely need logs of my workflow run and a link to that log (whether it looks like a provenance object or not), and I'd expect a link to those logs directly in the API response (not just in a header, which I may never even know to check as a comp bio person rather than a software engineer). So I see the logs as a necessity in the spec and the provenance_uri as a bonus/competitive feature for providing structured details for debugging/tracking.

So it sounds like this is something to add to the WES documentation as a recommendation rather than a spec change. Let me know if I misunderstood!

ruchim commented 3 years ago

just poking @stain @mr-c @david4096 for any opinions to my comment above!

mr-c commented 3 years ago

@ruchim Yep, it can go in the header and in the body of the response, agreed!
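A server-side sketch of "header and body", assuming the PROV-AQ Link header form and a `provenance_uri` body field. That field name comes from this thread and is not part of the WES schema; the function name and URLs are likewise hypothetical:

```python
# Link relation defined by PROV-AQ.
PROV_REL = "http://www.w3.org/ns/prov#has_provenance"


def attach_provenance(headers, body, run_url, provenance_uri):
    """Return copies of headers and body that both point at the provenance."""
    new_headers = dict(headers)
    link = f'<{provenance_uri}>; rel="{PROV_REL}"; anchor="{run_url}"'
    existing = new_headers.get("Link")
    # Multiple links may share one Link header, separated by commas.
    new_headers["Link"] = f"{existing}, {link}" if existing else link

    new_body = dict(body)
    # Assumed field name; repeated in the body so that clients who never
    # inspect headers can still find the provenance.
    new_body["provenance_uri"] = provenance_uri
    return new_headers, new_body


headers, body = attach_provenance(
    {"Content-Type": "application/json"},
    {"run_id": "abc", "state": "EXECUTOR_ERROR"},
    "https://wes.example.org/runs/abc",
    "https://wes.example.org/runs/abc/prov",
)
```

The inputs are copied rather than mutated, so the same helper can wrap any existing response without side effects.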

ruchim commented 3 years ago

excellent, I'll mark this as a documentation change.