ga4gh / workflow-execution-service-schemas

The WES API is a standard way to run and manage portable workflows.
Apache License 2.0

Support passing TRS URIs to workflow_url #175

Open uniqueg opened 2 years ago

uniqueg commented 2 years ago

Current situation

Similar to DRS URIs, TRS URIs have been proposed to be used as unique identifiers for resources on TRS services, which may include workflows (note the open PR for adding versioned TRS URIs to identify a specific tool/workflow version).

As TRS offers ways of fetching all files associated with a workflow (descriptors, test files, other files), passing a versioned TRS URI should be sufficient to enable a workflow engine to fetch a workflow from a TRS instance. To my knowledge, the current specs do not specifically forbid the use of trs:// scheme URLs/URIs, so the point of this issue is to discuss whether, in an effort to increase crosslinks between GA4GH Cloud API specs, we should specifically recommend or even mandate that WES implementations support TRS URIs.
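To make the idea concrete, here is a rough sketch of how a WES implementation might resolve a versioned TRS URI to the TRS endpoint that lists a workflow's files. The URI layout `trs://<host>/<tool_id>/<version_id>` is an assumption (the versioned form is still only proposed in the open PR), and `TrsUri`, `parse_trs_uri` and `files_url` are hypothetical names for illustration, not part of any spec or library. The path is based on my reading of the TRS v2 API and should be checked against the TRS spec.

```python
from dataclasses import dataclass
from urllib.parse import quote, unquote


@dataclass(frozen=True)
class TrsUri:
    """Parts of a versioned TRS URI (layout assumed, see note above)."""
    host: str
    tool_id: str
    version_id: str


def parse_trs_uri(uri: str) -> TrsUri:
    """Split an assumed 'trs://<host>/<tool_id>/<version_id>' URI.

    Tool and version IDs are expected to be percent-encoded if they
    contain characters such as '/' or '#'.
    """
    prefix = "trs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a TRS URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected trs://<host>/<id>/<version>: {uri!r}")
    host, tool_id, version_id = parts
    return TrsUri(host, unquote(tool_id), unquote(version_id))


def files_url(trs: TrsUri, descriptor_type: str) -> str:
    """Build the URL of the TRS v2 endpoint listing a version's files.

    Assumes the service is reachable over HTTPS under /ga4gh/trs/v2.
    """
    return (
        f"https://{trs.host}/ga4gh/trs/v2"
        f"/tools/{quote(trs.tool_id, safe='')}"
        f"/versions/{quote(trs.version_id, safe='')}"
        f"/{descriptor_type}/files"
    )
```

For example, `files_url(parse_trs_uri("trs://trs.example.org/my-workflow/1.0.0"), "CWL")` would point at the file listing for version `1.0.0` of `my-workflow` on the (made-up) host `trs.example.org`.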

Available options

I will start this discussion by adding some advantages/disadvantages for each scenario:

  1. Not specifically mentioning TRS URIs:
    • PRO: No changes required, WES implementers are still free to support this
    • CON: Low likelihood that WES instances will support TRS URIs
    • CON: Upstream users not likely to make much use of it / no strengthening of TRS specs/implementations
  2. Recommending the support of TRS URIs:
    • PRO: Some strengthening of TRS specification and implementations
    • PRO: Increased likelihood that WES instances will implement TRS URI support
    • CON: Not a great experience for upstream users: some WES instances will support it, others won't
  3. Mandating the support of TRS URIs:
    • PRO: Guaranteed support across all compliant WES services
    • PRO: Strengthening of TRS specification and implementations
    • CON: Breaking change
    • CON: All WES implementations require code changes to stay compliant

Of course, another option would be to recommend this in a future minor WES release, then mandate it in the next major release (which would be my own preference, and I'd be happy to provide a PR for recommending the use of TRS URIs once this issue has had some feedback or ideally consensus).

Implementations

For more context, WESkit (a WES implementation for Snakemake and Nextflow) is currently implementing this here. There is also a Python-based TRS client library that people may find useful if they want to implement TRS support in Python-based WES implementations (we may add a command-line version, too, if there's some demand).

ianfore commented 2 years ago

If a workflow is a file then DRS would be usable for this TRS use case as it stands. DRS can handle any payload*

Versioning was/is a DRS concern but you wouldn't find much about it in the spec. That's intentional as it was determined to mostly be a separate concern. There are ways it's dealt with, but detailing them is beyond a quick response like this.

*That's not to say there aren't still issues with how DRS indicates payloads of different types.

uniqueg commented 2 years ago

@ianfore Sure, we can discuss whether workflow_url should accept DRS URIs as well (should be fine for individual files), but I don't think they are a good alternative to TRS URIs here, and certainly not a reason not to support them. TRS and TRS URIs have been defined precisely for the purpose of accessing workflows and associated metadata, which is something that WES needs to do. They allow fetching all files associated with workflows, not just descriptors, as well as metadata, versioning etc. Sure, it's all possible to do that in DRS, too, but it's not an optimal fit, at least not right now, and it would likely require specific changes to DRS that may be undesirable or are, at the very least, far off on the horizon. TRS implementations have been around and in production for years.
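As a rough illustration of how TRS and DRS URIs could coexist in `workflow_url`, a WES implementation might simply dispatch on the URI scheme. This is only a sketch: `classify_workflow_url` and the returned labels are hypothetical, and treating a scheme-less value as a relative path into `workflow_attachment` reflects my reading of the current WES spec.

```python
from urllib.parse import urlsplit


def classify_workflow_url(workflow_url: str) -> str:
    """Decide how a WES might fetch the workflow, based on the URI scheme."""
    scheme = urlsplit(workflow_url).scheme.lower()
    if scheme == "trs":
        return "trs"         # resolve via a TRS service (descriptors, other files, metadata)
    if scheme == "drs":
        return "drs"         # resolve via a DRS service (individual files)
    if scheme in ("http", "https"):
        return "http"        # plain download of a descriptor file
    if scheme == "":
        return "attachment"  # relative path into workflow_attachment
    raise ValueError(f"unsupported workflow_url scheme: {scheme!r}")
```

With this kind of dispatch, supporting `trs://` does not get in the way of supporting `drs://` later; each scheme just gets its own resolver.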

patmagee commented 2 years ago

@uniqueg I think this is currently left up to individual implementers to decide. I agree with you that TRS was specifically designed to solve these problems, so in an interconnected GA4GH world it makes sense that any WES should be able to accept a TRS URI as the workflow description. I think it also promotes sharing best-practice workflows where possible (e.g. validated Dockstore workflows) instead of always having to define them yourself. Building this GA4GH ecosystem out so that everything flows naturally into one another is a great idea.

As for DRS, I would say I do not see anything wrong with the idea conceptually, but maybe we can open another issue for that specifically?

uniqueg commented 2 years ago

Thanks @patmagee. Indeed, a WES can currently implement it, as we have done for WESkit. So is option (1) in the OP your preference? Because the rest of what you write points more towards option (2), or even (3) as a perspective for a future major release?