GPU sharing corner case: vGPUs spread to two or more physical GPUs

In release v0.3.0, the main feature is GPU sharing. To use this feature, we assume a user application requests only a fraction of a GPU. If users need one or more whole GPUs, we will direct them to other GPU resources (not alnair/vgpu-memory; to be implemented later). Therefore, the application code assumes only one visible device and is written for a single-GPU configuration. At the same time, the Alnair implementation guarantees that all requested vGPUs fall on one physical GPU.

For example, consider a server with two physical GPUs, each split into 10 vGPUs, for a total capacity of 20. A user can only request fewer than 10 vGPUs. The scheduler is responsible for filtering out nodes on which no single GPU has enough vGPUs. The device plugin is responsible for picking a GPU card that has enough vGPUs. On a node, it is possible that some GPUs have enough vGPUs while others do not; the device plugin will select one that does.
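The division of work between the scheduler and the device plugin can be sketched roughly as follows. This is a minimal illustration, not Alnair's actual code: the names `VGPUS_PER_GPU`, `node_fits`, and `pick_gpu` are hypothetical, the per-GPU free counts are assumed to be available as a plain list, and picking the first GPU that fits is an assumption about the selection order.

```python
from typing import List, Optional

# Assumption from the example above: each physical GPU is split into 10 vGPUs.
VGPUS_PER_GPU = 10


def validate_request(request: int) -> None:
    """Reject requests that are not a fraction of one GPU (hypothetical check)."""
    if not 0 < request < VGPUS_PER_GPU:
        raise ValueError(f"request must be between 1 and {VGPUS_PER_GPU - 1} vGPUs")


def node_fits(free_vgpus: List[int], request: int) -> bool:
    """Scheduler side: keep a node only if one single GPU can hold the whole request."""
    validate_request(request)
    return any(free >= request for free in free_vgpus)


def pick_gpu(free_vgpus: List[int], request: int) -> Optional[int]:
    """Device plugin side: return the index of the first GPU with enough free vGPUs."""
    validate_request(request)
    for index, free in enumerate(free_vgpus):
        if free >= request:
            return index
    return None  # no single GPU on this node can serve the request


# A node whose first GPU has 3 free vGPUs and whose second has 7.
free_on_node = [3, 7]
print(node_fits(free_on_node, 5))  # the node passes the filter
print(pick_gpu(free_on_node, 5))   # the second GPU (index 1) is chosen
```

Because both functions compare the request against each GPU separately and never against the node total, every vGPU granted to a request ends up on the same physical card.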

However, this could lead to resource fragmentation. We will investigate algorithms and strategies to minimize this drawback, and evaluate the tradeoff between sharing and fragmentation later.
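A small worked case makes the fragmentation concrete. The numbers and the helper name `fits_on_single_gpu` below are made up for illustration and do not come from Alnair: the node has enough free vGPUs in total, yet the request cannot be placed because no single GPU can hold it.

```python
from typing import List


def fits_on_single_gpu(free_vgpus: List[int], request: int) -> bool:
    """True if at least one physical GPU has enough free vGPUs for the request."""
    return any(free >= request for free in free_vgpus)


# Hypothetical node state: two GPUs, each with 4 of its 10 vGPUs still free.
free_vgpus = [4, 4]
request = 6

total_free = sum(free_vgpus)
placeable = fits_on_single_gpu(free_vgpus, request)

print(f"total free vGPUs: {total_free}, request: {request}, placeable: {placeable}")
```

Here 8 vGPUs are free on the node, but a request for 6 is rejected, so the free capacity is stranded until one of the GPUs frees more vGPUs.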

CentaurusInfra / alnair
