ga4gh / task-execution-schemas

Apache License 2.0
80 stars 28 forks source link

Volume specification unclear #79

Closed erikvdbergh closed 5 years ago

erikvdbergh commented 6 years ago

Currently there is a volume specification allowed in a task definition as follows:

repeated string volumes = 10;

It is also stated that these volumes should be shared between executors. It is not clear to me how the volumes would be specified. Will this just be a directory on the container that should be mounted as the same directory in each executor? And does the 'size_gb' parameter link to these volumes? What amount of space should be reserved for them?

Perhaps more detailed Volume objects are required e.g.:

"volumes": [
  { 
    "name": "My first volume",
    "path": "/container/vol1",
    "size_gb": 20.0
  },
  { 
    "name": "My second volume",
    "path": "/container/vol2",
    "size_gb": 30.0
  }
]

I see in the mailing list that the initial idea for TES included a similar declaration. In the Funnel implementation there is an AddVolume method (https://github.com/ohsu-comp-bio/funnel/blob/master/worker/file_mapper.go#L88), but it is never called anywhere. It also defines a readonly parameter.

Curious to hear your thoughts.

buchanae commented 6 years ago

I agree, it's not documented well, and has confused multiple people.

Volumes are intended to allow a directory to be shared between multiple executors. It does not have a specific size, nor does it relate to any particular storage location (such as mounting a disk in Google Cloud or AWS EBS).

Previously, each volume had a size, but it seemed to make resource requests more tedious: why describe each individual volume size when you can just request one set of resources for the whole task? So the "size_gb" is the disk size requested for the whole task. That changed in https://github.com/ga4gh/task-execution-schemas/pull/47

AddVolume does get called, internally by FileMapper, here: https://github.com/ohsu-comp-bio/funnel/blob/master/worker/file_mapper.go#L215 but that function is not directly related to TES volumes (it's for setting up docker volumes).

erikvdbergh commented 6 years ago

Ok, so in practice, a volume entry will consist of a path, and the implementation must make sure that this path is available and shared between executors. Is that correct?

buchanae commented 6 years ago

Correct.

Also, in the case of docker, that volume should be declared as a writeable volume mount.

Note, #64 asks similar questions.

buchanae commented 6 years ago

@erikvdbergh I think we've clarified volumes by simplifying the spec and writing better documentation.

Is there anything left to do here? If not, can we close this?

delagoya commented 5 years ago

Closing issue, as PR #95 was merged and no further comments made.