Issue number:
Closes #
Description of changes:
NVIDIA GPUs support a time-slicing feature that lets users share a GPU among multiple workloads by dividing the GPU's time into slices. Each workload gets a turn to use the GPU's resources within its allocated time slice. This is similar to how a CPU time-slices between processes, and it helps keep the GPU from sitting idle.
This PR contains the changes required for Bottlerocket to enable time-slicing for Kubernetes.
This PR introduces two Bottlerocket API settings:
- `settings.kubernetes.nvidia.device-plugin.max-sharing-per-gpu`: sets the value of the `replicas` setting of the device plugin for the time-sliced resources. [...] `0` [...] `0`, time-slicing will be enabled.
- `settings.kubernetes.nvidia.device-plugin.rename-shared-gpu`: sets the value of the `renameByDefault` setting of the device plugin for the time-sliced resources. Accepted values: `true` | `false`; default: `false`. When this setting is `false`, the shared GPU's resource name is unchanged. When it is `true`, the device plugin renames the GPU resource by appending `.shared`; for example, `nvidia.com/gpu` becomes `nvidia.com/gpu.shared`.
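
As a rough illustration, the two settings might be supplied through Bottlerocket user data along these lines. This is a sketch rather than a verified configuration: the table layout is inferred from the dotted setting names above, the value `4` is an arbitrary example, and the integer and boolean value types are assumptions that have not been checked against this PR's implementation.

```toml
# Hypothetical user-data fragment; key names are taken from the settings
# described in this PR, and the values are illustrative only.
[settings.kubernetes.nvidia.device-plugin]
# Assumed to map to the device plugin's `replicas` value for time-sliced GPUs.
max-sharing-per-gpu = 4
# Assumed to map to `renameByDefault`; with `true`, the resource would be
# advertised as nvidia.com/gpu.shared instead of nvidia.com/gpu.
rename-shared-gpu = true
```

If renaming is turned on, workloads would presumably need to request `nvidia.com/gpu.shared` rather than `nvidia.com/gpu` in their resource limits.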
Testing done:
Note: Migration test is still in progress. I will update once the test is complete.
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.