ga4gh / task-execution-schemas

Resource requests should be at executor level. #208

Open claymcleod opened 1 month ago

claymcleod commented 1 month ago

It seems strange that resource requests are specified at the task level instead of at the executor level, where images are actually specified. The lack of flexibility around resource allocation at this level greatly inhibits one of the major potential benefits (the biggest potential benefit?) that the executors abstraction provides: you can't save on resources for commands that don't require a huge amount of CPU/RAM/disk.

kellrott commented 1 month ago

Executors are meant to be run sequentially on a single machine allocated to a task. In most deployments, the TES service allocates a machine (either a VM or an HPC node) and starts up a runner; the runner moves all files into place and then invokes each executor one after the other. AWS charges you for the full VM for the full time that you use it, even if you are only using half of it for some of the processes. The only way to change the allocation size is to request a differently sized VM and move the work there, which would be equivalent to issuing two different tasks. The same applies to HPC systems such as SLURM.

claymcleod commented 1 month ago

Thanks for the context @kellrott. I think that makes sense for the specific instances you bring up here. I was also already aware that executors run sequentially, from this part of the spec:

executors

An array of executors to be run. Each of the executors will run one at a time sequentially. Each executor is a different command that will be run, and each can utilize a different docker image. But each of the executors will see the same mapped inputs and volumes that are declared in the parent CreateTask message.

That being said, the idea that executors were intended to run on a single machine is new (at least to me from my reading of the documentation). That might be something to clarify in the spec if it isn't already and I just missed it.


Above notwithstanding, I still think it makes sense to consider this change for situations where:

In my mind, specifying the resources at the executor level would subsume the use cases you listed (e.g., by looking across all executors before requisition and picking the maximum resource usage), whereas the cases I list above can't be served in the current state of affairs.
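
A minimal sketch of the "pick the maximum across executors" fallback mentioned above, for backends that allocate a single machine per task. It assumes a hypothetical per-executor `resources` field, which is not part of the current spec; the field names `cpu_cores`, `ram_gb`, and `disk_gb` mirror the existing task-level resources object, and `task_level_request` is an illustrative helper name, not an API of any TES implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resources:
    # Field names mirror the existing task-level resources object.
    cpu_cores: int = 1
    ram_gb: float = 1.0
    disk_gb: float = 1.0


@dataclass
class Executor:
    image: str
    command: List[str]
    # Hypothetical: per-executor resources are the proposal under
    # discussion, not something the current spec defines.
    resources: Resources = field(default_factory=Resources)


def task_level_request(executors: List[Executor]) -> Resources:
    """Collapse per-executor requests into one machine-sized request.

    Executors run one after another on the same machine, so the machine
    must satisfy the largest request along each dimension independently.
    """
    if not executors:
        raise ValueError("a task needs at least one executor")
    return Resources(
        cpu_cores=max(e.resources.cpu_cores for e in executors),
        ram_gb=max(e.resources.ram_gb for e in executors),
        disk_gb=max(e.resources.disk_gb for e in executors),
    )


if __name__ == "__main__":
    executors = [
        Executor("ubuntu", ["md5sum", "/data/in"], Resources(1, 2.0, 10.0)),
        Executor("aligner", ["align", "/data/in"], Resources(8, 1.0, 50.0)),
    ]
    print(task_level_request(executors))
```

A backend that can resize between executors could use each executor's own request directly, while single-machine backends such as a VM or a SLURM allocation would fall back to the combined request.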