Open uniqueg opened 2 years ago
Some random thoughts for the discussion:
Thank you @uniqueg. This issue describes our request precisely: we have human genome files (~20 GB) and some static internal binary data that need to be pre-populated once before data processing, and we want to minimize file copying.
This doesn't necessarily require a change to the TES API if implementations can provide this capability on their own. But if the TES API defined a standard representation, it would help other implementations a lot.
For the syntax, my personal thought is that the Docker volume expression may be good enough:
name-of-a-custom-volume:/path-inside-container
path-from-runtime-node-host:/path-inside-container
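To make the two forms above concrete, here is a minimal sketch of how an implementation might tell them apart. It assumes the Docker convention that a source starting with `/` is a host path and anything else is a volume name. `VolumeSpec` and `parse_volume_spec` are hypothetical names used only for illustration, not part of TES or Docker, and option suffixes such as `:ro` are deliberately rejected to keep the sketch small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeSpec:
    source: str          # volume name, or a path on the runtime node host
    container_path: str  # absolute path inside the container
    is_host_path: bool   # True when source is a host path rather than a name


def parse_volume_spec(expr: str) -> VolumeSpec:
    """Parse 'source:/path-inside-container' (hypothetical helper)."""
    source, sep, container_path = expr.partition(":")
    if not sep or not source:
        raise ValueError(f"expected 'source:/path-inside-container', got {expr!r}")
    if not container_path.startswith("/") or ":" in container_path:
        # Relative container paths and option suffixes are out of scope here.
        raise ValueError(f"unsupported container path in {expr!r}")
    # Assumption: a leading '/' marks a host path, as in Docker's -v syntax.
    return VolumeSpec(source, container_path, is_host_path=source.startswith("/"))


named = parse_volume_spec("name-of-a-custom-volume:/path-inside-container")
bound = parse_volume_spec("/path-from-runtime-node-host:/path-inside-container")
```

Whether TES should expose host paths at all is a separate question, since they tie a task to one particular deployment.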
Many more ideas might come up, such as Docker's bind propagation concepts, but I understand that TES must keep the scope at a maintainable level.
Giving a TES implementation access to a persistent data volume is something that the Greek ELIXIR node requested (see here for more details). A potential use case is a TES implementation deployed in an environment where it repeatedly runs specific sets of tasks that use the same reference data over and over again.
The current specification of tesTask.volumes does not meet this requirement, as it states that they "are initialized as empty directories". A similar request was/is also discussed in Cromwell: https://github.com/broadinstitute/cromwell/issues/2190
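For reference, a rough sketch of how volumes appear in a task today: a plain list of container paths, with no way to say what they should contain. The task name, image, command and path below are made-up values for illustration only.

```python
import json

# Illustrative task payload; all values are hypothetical.
task = {
    "name": "list-reference-data",
    "executors": [
        {
            "image": "ubuntu:22.04",
            "command": ["ls", "/ref"],
        }
    ],
    # Each entry is only a path inside the container. Under the current
    # wording, the directory is created empty when the task starts, so
    # there is no place to reference pre-populated data.
    "volumes": ["/ref"],
}

payload = json.dumps(task, indent=2)
print(payload)
```

With this shape, reference data has to be staged through the task's inputs on every run, which is the repeated copying this issue would like to avoid.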
I don't really have in mind what this could look like, but I thought I would open this issue so that we could discuss.
Thanks to @zagganas and @hex43ver