galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Initial Support for Consuming GA4GH DRS URIs #11819

Open jmchilton opened 3 years ago

jmchilton commented 3 years ago

Some documentation on resolving GA4GH DRS URIs can be found at:

https://ga4gh.github.io/data-repository-service-schemas/preview/develop/docs/#_hostname_based_drs_uris

Both upload.py and data_fetch.py use lib/galaxy/datatypes/sniff.py::stream_url_to_file to resolve URIs/URLs to POSIX files. This functionality is included in galaxy-data and would presumably be used by future job setup code supporting deferred, as-needed datasets (see https://github.com/galaxyproject/galaxy/issues/10873) to materialize files for jobs. If this method supported DRS URIs, I think most things would fall into place naturally from there.
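To make the resolution step concrete, here is a minimal sketch of how a hostname-based DRS URI could be mapped to a fetchable URL before the bytes are streamed to a file. It relies on the mapping in the DRS documentation linked above (`drs://<hostname>/<id>` resolves to `https://<hostname>/ga4gh/drs/v1/objects/<id>`). The function names `drs_uri_to_object_url` and `pick_access_url` are hypothetical and are not part of Galaxy. The sketch skips compact-identifier URIs, authentication, and access methods that only expose an `access_id` (which need a second request to the `/access/{access_id}` endpoint).

```python
from urllib.parse import urlparse

# Path prefix for the DRS v1 objects endpoint, per the GA4GH DRS spec.
DRS_OBJECTS_PATH = "/ga4gh/drs/v1/objects"


def drs_uri_to_object_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI to the HTTPS URL of its object metadata.

    Hypothetical helper: only handles drs://<hostname>/<id>.
    """
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs":
        raise ValueError(f"Not a DRS URI: {drs_uri!r}")
    object_id = parsed.path.lstrip("/")
    if not parsed.netloc or not object_id:
        # Compact-identifier URIs (drs://prefix:accession) end up here.
        raise ValueError(f"Not a hostname-based DRS URI: {drs_uri!r}")
    return f"https://{parsed.netloc}{DRS_OBJECTS_PATH}/{object_id}"


def pick_access_url(drs_object: dict, preferred_types=("https",)):
    """Return the first directly usable URL from a DRS object's access_methods.

    Returns None if no method of a preferred type carries an access_url,
    e.g. when the server only returns an access_id.
    """
    for method in drs_object.get("access_methods", []):
        if method.get("type") in preferred_types:
            access_url = method.get("access_url") or {}
            if access_url.get("url"):
                return access_url["url"]
    return None
```

In this sketch, the caller would GET the URL returned by `drs_uri_to_object_url`, parse the JSON response, pass it to `pick_access_url`, and hand the result to the existing URL-streaming code, so only the URI-to-URL translation is new.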

luke-c-sargent commented 3 years ago

Not sure if this is what you meant by "maybe an existing server exists and I just haven't found it", but:

Martha v3 is an aggregator of DRS URI resolver results that we use in the AnVILFS plugin to translate DRS to viable endpoints. It requires a valid Google bearer token to auth against, however.

hexylena commented 3 years ago

Just an xref to the work I'm doing for CINECA: we're working on obtaining ELIXIR AAI refresh tokens + ga4gh_v1_passports within the login system, and then passing those to tools (e.g. ega downloader). Maybe that will help with the authn/z portion of accessing private data?

nuwang commented 2 years ago

@hexylena Is there an issue for tracking this? Galaxy Australia is also interested in this functionality, so I was wondering how far along you are.

hexylena commented 2 years ago

@nuwang no, no separate issue. We got a proof of concept deployed internally, but it needs changes to the dependencies to support the additional attributes, and I need to convert my cron job to a Celery task.

nuwang commented 2 years ago

Thanks. If you have any PRs/commits etc. handy, would be great to take a look.