ExaWorks / job-api-spec

https://exaworks.org/job-api-spec/

Clarify what GPU cores in `job.spec.resources.gpu_cores_per_process` means #141

Open hategan opened 3 years ago

hategan commented 3 years ago


GPUs have thousands of specialized cores that generally cannot be shared between different users at the same time, since there is little in the way of a multi-user task scheduler on the GPU. Hence, GPUs are typically used one physical card at a time.

It then makes little sense to allocate GPU cores to a process. Instead, the better unit to allocate seems to be GPU chips (or GPUs, for short).

So this needs clarification.

SteVwonder commented 3 years ago

Agreed. I think that `gpu_cores_per_process` should be changed to something like `gpus_per_process`. Interestingly, that is how it is used in the examples already:

import jpsi

res_spec = jpsi.ResourceSpec()
res_spec.process_count     = 10
res_spec.cores_per_process = 4
res_spec.gpus_per_process  = 1

So if we don't change it, it seems we also have some inconsistencies in the examples to clean up :)
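
To make the proposed unit concrete, here is a minimal sketch of what counting whole GPUs per process would imply for a job's totals. The `ResourceSpec` class below is a hypothetical stand-in written for illustration, not the real `jpsi` class, and the `total_cores` and `total_gpus` helpers are assumptions that do not appear in the spec.

```python
from dataclasses import dataclass


@dataclass
class ResourceSpec:
    """Hypothetical stand-in for the spec's resource object (not the real jpsi class)."""

    process_count: int = 1
    cores_per_process: int = 1
    # Assumption: this counts whole GPU devices, not individual GPU cores.
    gpus_per_process: int = 0

    def total_cores(self) -> int:
        # CPU cores requested across all processes of the job.
        return self.process_count * self.cores_per_process

    def total_gpus(self) -> int:
        # Whole GPUs requested across all processes of the job.
        return self.process_count * self.gpus_per_process


# Same values as the example above.
res_spec = ResourceSpec(process_count=10, cores_per_process=4, gpus_per_process=1)
print(res_spec.total_cores(), res_spec.total_gpus())
```

Under this reading, the example requests 40 CPU cores and 10 whole GPUs, which is a quantity a scheduler can actually hand out one device at a time.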