ga4gh / task-execution-schemas

Apache License 2.0
Passing credentials for staging in/out files #169

Open uniqueg opened 2 years ago

uniqueg commented 2 years ago

The specs currently do not provide any specific support for handling/passing credentials to services that TES implementations need to pull/push data from/to. This might be taken care of in the wider discussion of using Passports in DRS and WES, but it might be good to have an issue for this here as well, for anything TES-specific.

uniqueg commented 2 years ago

Very much requested by ELIXIR Cloud & AAI DP (all participating nodes) and probably pretty much everyone else, I would assume 🙃

uniqueg commented 2 years ago

Might be overlapping with #151, though perhaps not quite. I guess both authorization for the TES itself (and its compute) and passing through of credentials to third party services need to be addressed, and perhaps best to discuss these points together (and with WES, DRS, Passport and FASP)...

kellrott commented 2 years ago

We're looking at https://www.ga4gh.org/ga4gh-passports/ as a possible solution

uniqueg commented 2 years ago

Absolutely. But we should make sure that access to TES (and WES, if they're not using TES) compute resources is covered as well, not just data access.

jmfernandez commented 1 year ago

From my point of view, third-party credentials needed either to fetch inputs or to push outputs should be declared, perhaps in a similar way to the response of https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.2.0/docs/#operation/GetAccessURL , where both the URL and the headers needed to successfully complete the request are provided.

In the case of outputs, in addition to the HTTP headers, an extra field such as the HTTP verb might also be needed.

But HTTP headers are too focused on... HTTP. Other protocols might require authentication to be defined in different ways (a private key for SFTP / Aspera, a complex JSON for Google Store, etc.).
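To make the idea concrete, here is a minimal Python sketch of what such a per-object description could look like. It assumes headers are given as "Name: value" strings, as in the DRS access URL response as far as I can tell. The `StagingAccess` class, its `method` field and the `build_request` helper are hypothetical names used for illustration only; none of them is part of the TES spec.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.request import Request


@dataclass
class StagingAccess:
    """Hypothetical per-object access description (not part of the TES spec)."""

    url: str
    # "Name: value" strings, mirroring the headers list of a DRS access URL.
    headers: List[str] = field(default_factory=list)
    # HTTP verb; mainly relevant for outputs (e.g. "PUT" or "POST").
    method: str = "GET"


def build_request(access: StagingAccess, data: Optional[bytes] = None) -> Request:
    """Build a urllib Request from the description without sending it."""
    req = Request(access.url, data=data, method=access.method)
    for line in access.headers:
        # Split at the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        req.add_header(name.strip(), value.strip())
    return req


# Example: an output that has to be uploaded with PUT and a bearer token.
output = StagingAccess(
    url="https://example.org/upload/result.txt",
    headers=["Authorization: Bearer <token>"],
    method="PUT",
)
request = build_request(output, data=b"result")
```

The request is only constructed here, not sent; a TES implementation would decide how and when to perform the actual transfer.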

uniqueg commented 1 year ago

Agreed, we would probably need this on a per-object basis, like the example you are citing, @jmfernandez.

If request size is a potential issue and reuse of the same credentials for multiple objects is common, we could also define credentials separately, each with a short identifier, then either refer to credentials directly via that identifier, or provide a one-to-many mapping of credential to object identifiers.
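A rough Python sketch of the one-to-many variant, just to illustrate the shape of the data. All identifiers (`cred-a`, `in-1`, `invert_credential_map`, `resolve`) are made up for this example, and the credential bodies are placeholders rather than a proposed schema.

```python
from typing import Any, Dict, List, Optional

# Credentials declared once, each under a short identifier (hypothetical layout).
credentials: Dict[str, Dict[str, Any]] = {
    "cred-a": {"headers": ["Authorization: Bearer <token-a>"]},
    "cred-b": {"headers": ["Authorization: Bearer <token-b>"]},
}

# One-to-many mapping of credential identifier to object identifiers.
credential_map: Dict[str, List[str]] = {
    "cred-a": ["in-1", "in-2"],
    "cred-b": ["out-1"],
}


def invert_credential_map(mapping: Dict[str, List[str]]) -> Dict[str, str]:
    """Return object id -> credential id, rejecting ambiguous assignments."""
    object_to_credential: Dict[str, str] = {}
    for cred_id, object_ids in mapping.items():
        for object_id in object_ids:
            if object_id in object_to_credential:
                raise ValueError(f"{object_id} is mapped to more than one credential")
            object_to_credential[object_id] = cred_id
    return object_to_credential


def resolve(
    object_id: str,
    object_to_credential: Dict[str, str],
    creds: Dict[str, Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Look up the credential for an object; None means no credential is needed."""
    cred_id = object_to_credential.get(object_id)
    return creds[cred_id] if cred_id is not None else None


lookup = invert_credential_map(credential_map)
```

The direct-reference variant would simply store the identifier on each input or output, which avoids the inversion step at the cost of repeating the identifier per object.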

And yes, we should probably not use the name headers in this case. I guess the interesting question is how we specify the schema such that each TES implementation knows what to do with it.

But if I'm not mistaken, the list of supported storage/transfer protocols is enumerated somewhere, so I guess we could use anyOf and define schemas and instructions for each (possibly some can be reused).
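As a minimal sketch of the anyOf idea in plain Python: a credential is accepted if it satisfies at least one of several per-protocol schemas. The schema names and required fields below are assumptions made up for this example; a real definition would live in the OpenAPI spec and would be checked by a proper JSON Schema validator.

```python
from typing import Any, Dict, List, Set

# Hypothetical per-protocol schemas, reduced to their required field names.
CREDENTIAL_SCHEMAS: Dict[str, Set[str]] = {
    "http_headers": {"headers"},
    "ssh_private_key": {"username", "private_key"},
    "service_account_json": {"service_account"},
}


def matching_schemas(credential: Dict[str, Any]) -> List[str]:
    """Return the names of all schemas whose required fields are present."""
    keys = set(credential)
    return [name for name, required in CREDENTIAL_SCHEMAS.items() if required <= keys]


def is_valid(credential: Dict[str, Any]) -> bool:
    """anyOf semantics: valid if at least one schema matches."""
    return bool(matching_schemas(credential))


sftp_credential = {"username": "alice", "private_key": "<key material>"}
print(matching_schemas(sftp_credential))
```

With anyOf, a credential matching several schemas is still valid, so an implementation would probably also need an explicit type field, or the URL scheme of the object, to decide which one to apply.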