galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

separate routes to read data and avoid data caching in some use-cases #14978

Open bgruening opened 2 years ago

bgruening commented 2 years ago

There are at least two distinct use cases that justify separate access routes for data in Galaxy:

1) A job needs to read the data, which in most cases means reading it from a POSIX filesystem (exceptions exist and need to be annotated, see https://github.com/galaxyproject/galaxy/issues/14975).
2) A frontend reads a file (e.g. for visualization or display); this could be served directly from the object store.

The second use case should avoid pulling data into a POSIX data cache.
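A minimal sketch of the two routes, assuming a toy in-memory object store with a local cache directory. `ToyObjectStore`, `ReadPurpose`, and `read_dataset` are hypothetical names for illustration, not Galaxy's object store API. A job gets a POSIX path (materialized in the cache on first use), while a display request streams bytes straight from the store and leaves the cache untouched:

```python
import tempfile
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import Dict, Iterator, Set, Union


class ReadPurpose(Enum):
    """Hypothetical tag for why data is being read."""
    JOB = "job"          # a tool run needs a POSIX path
    DISPLAY = "display"  # a frontend wants bytes for display/visualization


@dataclass
class ToyObjectStore:
    """Hypothetical stand-in: a dict plays the remote store, cache_dir the POSIX cache."""
    objects: Dict[str, bytes]
    cache_dir: Path
    cached: Set[str] = field(default_factory=set)

    def cache_path(self, key: str) -> Path:
        # Job route: copy the object into the POSIX cache once, then reuse the file.
        path = self.cache_dir / key
        if key not in self.cached:
            path.write_bytes(self.objects[key])
            self.cached.add(key)
        return path

    def stream(self, key: str, chunk_size: int = 4) -> Iterator[bytes]:
        # Display route: yield chunks straight from the store; the cache is never written.
        data = self.objects[key]
        for start in range(0, len(data), chunk_size):
            yield data[start:start + chunk_size]


def read_dataset(store: ToyObjectStore, key: str,
                 purpose: ReadPurpose) -> Union[Path, Iterator[bytes]]:
    """Pick the access route based on the caller's purpose."""
    if purpose is ReadPurpose.JOB:
        return store.cache_path(key)
    return store.stream(key)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ToyObjectStore({"dataset_1.dat": b"ACGTACGT"}, Path(tmp))
        preview = b"".join(read_dataset(store, "dataset_1.dat", ReadPurpose.DISPLAY))
        print(preview, store.cached)  # b'ACGTACGT' set()
        path = read_dataset(store, "dataset_1.dat", ReadPurpose.JOB)
        print(path.read_bytes(), store.cached)  # b'ACGTACGT' {'dataset_1.dat'}
```

Making the purpose explicit at the call site is what lets the display path skip the cache; in a real deployment the streaming branch would read from the remote backend rather than from an in-memory dict.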

bgruening commented 2 years ago

At the Biohackathon we briefly discussed using the attribute has_url for a dataset.

mvdbeek commented 2 years ago

> 2. read a file from a frontend (e.g. for visualization / display), this could happen directly from the object store

This should be largely possible if the storage backend is S3 or another backend that can be addressed over HTTP; we only need to get the upstream URL from the object store. See https://www.mediasuite.co.nz/blog/proxying-s3-downloads-nginx/
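One way to hand such a request to nginx is its `X-Accel-Redirect` response header: the application checks access, then tells nginx which internal location to serve, and nginx fetches the bytes from upstream. The sketch below assumes a hypothetical `internal` nginx location `/_upstream/` that proxies the captured scheme, host, and path to the upstream server; `DatasetRef`, `INTERNAL_PREFIX`, and `display_response_headers` are illustrative names, and `has_url` is modeled on the attribute mentioned above rather than taken from Galaxy's code:

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical nginx location; nginx.conf would need to mark it `internal` and
# proxy_pass the captured scheme/host/path upstream (with a `resolver` directive,
# since the upstream host is taken from a variable).
INTERNAL_PREFIX = "/_upstream"


@dataclass
class DatasetRef:
    """Hypothetical view of a dataset as seen by a display endpoint."""
    name: str
    # Upstream HTTP(S) URL the object store can hand out (e.g. a pre-signed S3 URL).
    upstream_url: Optional[str] = None

    @property
    def has_url(self) -> bool:
        # Modeled on the `has_url` idea from the discussion; this class is hypothetical.
        return self.upstream_url is not None


def display_response_headers(ds: DatasetRef) -> Dict[str, str]:
    """Headers that let nginx fetch the bytes from upstream instead of the app.

    An empty dict means "no offload": the app must read and stream the data itself.
    """
    if not ds.has_url:
        return {}
    parts = urlsplit(ds.upstream_url)
    if parts.scheme not in ("http", "https"):
        return {}  # proxy_pass targets HTTP(S) URLs, so other schemes are not offloaded
    target = f"{INTERNAL_PREFIX}/{parts.scheme}/{parts.netloc}{parts.path}"
    if parts.query:
        target += "?" + parts.query  # keep pre-signed query parameters intact
    return {"X-Accel-Redirect": target}


if __name__ == "__main__":
    ds = DatasetRef("dataset_1.dat",
                    "https://bucket.s3.amazonaws.com/objects/1.dat?X-Amz-Signature=abc")
    print(display_response_headers(ds))
    # {'X-Accel-Redirect': '/_upstream/https/bucket.s3.amazonaws.com/objects/1.dat?X-Amz-Signature=abc'}
    print(display_response_headers(DatasetRef("local_only.dat")))  # {}
```

A simpler alternative is a plain HTTP redirect to the upstream URL, which removes the server from the data path entirely at the cost of exposing that URL to the client.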