Closed JasonHe-WQ closed 1 month ago
Does ResourceClaim mean binding to exactly one single GPU, so that all the containers using this ResourceClaim can share one GPU? And does ResourceClaimTemplate mean binding to any single GPU?
What is the difference in the ResourceClaimTemplate field between gputest1 and gputest2? Is it true that one ResourceClaimTemplate referenced by two Pods, rather than by two containers, leads to "# Each container asking for 1 distinct GPU"?
A ResourceClaim has resources directly bound to it, and any pod that references it will have shared access to those resources.
A ResourceClaimTemplate provides a template for a ResourceClaim that will be generated on the fly for each pod that references it. In this way, each reference gets its own unique ResourceClaim with its own unique resources bound to it.
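For contrast with the template case, here is a minimal sketch of the shared case: a single ResourceClaim referenced by two pods, using the same v1alpha2 API as the rest of this thread. The names shared-gpu, pod0, and pod1, as well as the image and command, are assumptions for illustration and are not taken from the demo manifests.

```yaml
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  namespace: gpu-test
  name: shared-gpu            # hypothetical name
spec:
  resourceClassName: gpu.nvidia.com
---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu-test
  name: pod0                  # hypothetical name
spec:
  containers:
  - name: ctr
    image: ubuntu:22.04       # assumed image, any image works
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    source:
      resourceClaimName: shared-gpu   # direct reference to the existing claim
---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu-test
  name: pod1                  # hypothetical name
spec:
  containers:
  - name: ctr
    image: ubuntu:22.04       # assumed image, any image works
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    source:
      resourceClaimName: shared-gpu   # same claim, so access is shared
```

Because both pods point at the same ResourceClaim by name, no new claim is generated for either of them, and both should see whatever resources end up bound to shared-gpu.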
Is there a way for two containers in one Pod to each claim a distinct single GPU?
Note: Below is the API that was valid from Kubernetes 1.26-1.30. It has changed slightly for 1.31 but the mechanism is similar.
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  namespace: gpu-test
  name: unique-gpu
spec:
  spec:
    resourceClassName: gpu.nvidia.com
---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu-test
  name: pod
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    resources:
      claims:
      - name: gpu0
  - name: ctr1
    resources:
      claims:
      - name: gpu1
  resourceClaims:
  - name: gpu0
    source:
      resourceClaimTemplateName: unique-gpu
  - name: gpu1
    source:
      resourceClaimTemplateName: unique-gpu
Could anyone tell me how these differences result in SharingGPU or DistinctGPU?
My answer to question (1) hopefully clarifies this already. Each reference to a ResourceClaimTemplate triggers the creation of a unique ResourceClaim (to which unique resources will eventually be bound). Each reference to a ResourceClaim gives shared access to the resources bound to it.
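The same rule also covers sharing inside a single pod. As a rough sketch (again with the v1alpha2 API, and with the pod name, image, and command being assumptions for illustration): if the pod lists only one resourceClaims entry backed by the template, only one ResourceClaim is generated, and every container that names that entry shares it.

```yaml
---
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  namespace: gpu-test
  name: unique-gpu
spec:
  spec:
    resourceClassName: gpu.nvidia.com
---
apiVersion: v1
kind: Pod
metadata:
  namespace: gpu-test
  name: shared-in-pod         # hypothetical name
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04       # assumed image, any image works
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: gpu             # refers to the single entry below
  - name: ctr1
    image: ubuntu:22.04       # assumed image, any image works
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: gpu             # same entry, so the same generated claim
  resourceClaims:
  - name: gpu                 # one template reference, one generated claim
    source:
      resourceClaimTemplateName: unique-gpu
```

So the number of distinct GPUs follows the number of template references under resourceClaims, not the number of containers: one entry gives one claim shared by both containers, while two entries (gpu0 and gpu1 in the manifest above) give two separate claims.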
I successfully ran the quick start demos gputest1, gputest2, and gputest3 over the past few days on Ubuntu 22.04, and everything behaved as expected. With great appreciation to the maintainers and committers, I still have some questions about ResourceClaim and ResourceClaimTemplate.

Question 1:
What is the difference in the ResourceClaimTemplate field between gputest1 and gputest2? Is it true that one ResourceClaimTemplate referenced by two Pods, rather than by two containers, leads to "# Each container asking for 1 distinct GPU"?

Question 2:
Is there a way for two containers in one Pod to each claim a distinct single GPU?

Question 3:
Comparing gputest1 and gputest3, the main differences are:
A. using ResourceClaim instead of ResourceClaimTemplate
B. the name of the ResourceClaim or ResourceClaimTemplate
C. an additional spec in ResourceClaimTemplate
Could anyone tell me how these differences result in SharingGPU or DistinctGPU?

Thanks again for your dedication. I hope you can answer soon.