ga4gh / task-execution-schemas

Apache License 2.0

feat: allow wildcards in output paths #185

Closed uniqueg closed 2 years ago

uniqueg commented 2 years ago

Fixes #77

Description

This PR adds provisions that allow clients to specify pathname matching wildcards ("globs") when specifying task outputs.

It addresses the discussion points summarized in this comment in the following ways:

Examples

uniqueg commented 2 years ago

Would appreciate feedback @kellrott @aniewielska @pditommaso @mr-c @MattMcL4475 @vsmalladi @wleepang @geoffjentry 🙏🏻

mr-c commented 2 years ago

Looks good to me; tagging @tetron for his Arvados perspective and additional CWL perspective

wleepang commented 2 years ago

I think this looks good, but I'm curious to see some concrete examples. For instance, if I set:

tesOutput.path: /path/to/folder
tesOutput.path_prefix: /path/to/folder
tesOutput.url: s3://bucketname/my/results

Should I expect the contents of /path/to/folder to be copied recursively to s3://bucketname/my/results?

vsmalladi commented 2 years ago

@wleepang If I understand this correctly, this is what I would expect:

tesOutput.path: /path/to/folder/data
tesOutput.path_prefix: /path/to/folder
tesOutput.url: s3://bucketname/my/results

URL: s3://bucketname/my/results/data

@uniqueg, can you confirm this is what is expected from the combination of tesOutput.url and tesOutput.path_prefix?

uniqueg commented 2 years ago

@wleepang @vsmalladi

I have now added three examples to the PR description (my bad for not doing so earlier!).

Note that in the examples you give, the path doesn't include any wildcards, so a TES implementation should ignore tesOutput.path_prefix and interpret the value of tesOutput.url as a fully qualified URL at which the object created at tesOutput.path should be made accessible.

So in your example, @wleepang, contents of /path/to/folder would indeed be recursively copied to s3://bucketname/my/results.

In your case though, @vsmalladi, contents of /path/to/folder/data would be copied to s3://bucketname/my/results, not s3://bucketname/my/results/data.

Or at least this is my interpretation of the current specs.
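
To make that interpretation concrete, here is a rough Python sketch of how the destination URL could be resolved. It is only an illustration, not taken from the spec text or from any TES implementation: the function names, the set of characters treated as wildcards (`*`, `?`, `[`), and the way the prefix is stripped in the wildcard case are all assumptions.

```python
# Assumption: these characters mark a path as containing a glob pattern.
WILDCARD_CHARS = set("*?[")


def has_wildcards(path: str) -> bool:
    """Return True if the path contains any assumed glob character."""
    return any(ch in WILDCARD_CHARS for ch in path)


def resolve_output_url(path, url, path_prefix=None, matched_path=None):
    """Hypothetical helper: compute where one output should be uploaded.

    Without wildcards in `path`, `path_prefix` is ignored and `url` is
    used as the fully qualified destination. With wildcards, the prefix is
    removed from the concrete `matched_path` and the remainder is appended
    to `url` (assumed behaviour for the wildcard case).
    """
    if not has_wildcards(path):
        return url
    if path_prefix is None or matched_path is None:
        raise ValueError("wildcard paths need path_prefix and a matched path")
    prefix = path_prefix.rstrip("/") + "/"
    if not matched_path.startswith(prefix):
        raise ValueError("matched path does not start with path_prefix")
    relative = matched_path[len(prefix):]
    return url.rstrip("/") + "/" + relative
```

With the values from the two questions above, neither path contains a wildcard, so both resolve to s3://bucketname/my/results. Only a wildcard path such as /path/to/folder/* matching /path/to/folder/data would, under these assumptions, resolve to s3://bucketname/my/results/data.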

vsmalladi commented 2 years ago

@uniqueg Thanks for the clarification.

Then LGTM.