galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

S3 compatible tool data-type interface #14975

Open bgruening opened 1 year ago

bgruening commented 1 year ago

Tools are popping up that can natively understand S3.

It would be nice if we could annotate a tool's type="data" inputs with a flag that tells Galaxy the file does not need to be cached on a POSIX filesystem before the job is submitted to the cluster, since the tool can work directly with a path like s3://bucket/prefix/key.mzML.gz.

This setting would need to be overridable by an admin in case the cluster does not allow network access.
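
The decision described above could be sketched roughly as follows. This is only an illustration of the idea, not Galaxy code: `ParamSpec`, `accepts_remote_uri`, `admin_allows_remote` and `pass_uri_directly` are hypothetical names, and no such flag exists in the tool schema today.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class ParamSpec:
    """Hypothetical stand-in for a type="data" tool input."""
    name: str
    # Hypothetical opt-in flag: the tool can read s3:// URIs itself.
    accepts_remote_uri: bool = False


def pass_uri_directly(param: ParamSpec, dataset_uri: str,
                      admin_allows_remote: bool = True) -> bool:
    """Return True if the URI can be handed to the tool without staging
    the file on a POSIX filesystem first.

    All three conditions must hold: the dataset lives in S3, the tool
    opted in, and the admin has not disabled remote access (for example
    because cluster nodes have no network access).
    """
    is_s3 = urlparse(dataset_uri).scheme == "s3"
    return is_s3 and param.accepts_remote_uri and admin_allows_remote


if __name__ == "__main__":
    spec = ParamSpec(name="input", accepts_remote_uri=True)
    uri = "s3://bucket/prefix/key.mzML.gz"
    print(pass_uri_directly(spec, uri))
    print(pass_uri_directly(spec, uri, admin_allows_remote=False))
```

With this shape the admin override always wins, so a cluster without network access falls back to the current behaviour of caching the file locally.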

mvdbeek commented 1 year ago

How many tools like that are there? These should probably just be type="file" instead of type="data"?

nuwang commented 1 year ago

Do these tools that natively process S3 do ranged reads or similar that would provide an efficiency gain? If not, we might as well let Galaxy download and manage the file?
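
For context, a ranged read asks the object store for only a span of bytes instead of the whole object, using the standard HTTP Range header that S3 honours on GET requests. A minimal sketch of building that header; the helper name `range_header` is hypothetical and the snippet does not contact any server:

```python
def range_header(offset: int, length: int) -> dict:
    """Build an HTTP Range header requesting `length` bytes from `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


if __name__ == "__main__":
    # Ask for the first KiB of an object.
    print(range_header(0, 1024))
```

A tool that only needs an index or a few records can fetch just those bytes this way, which is where the efficiency gain over downloading the full file would come from.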

bgruening commented 1 year ago

> How many tools like that are there? These should probably just be type="file" instead of type="data"?

One that I know of :) I got a request for it.

> Do these tools that natively process S3 do ranged reads or similar that would provide an efficiency gain? If not, we might as well let Galaxy download and manage the file?

I have no idea, but I guess we should assume so. We will probably see more and more tools supporting this.