NTHU-LSALAB / KubeShare

Share GPU between Pods in Kubernetes
Apache License 2.0

About how to write the gpu topology #25

Closed icovej closed 1 year ago

icovej commented 1 year ago

Hello, when I deploy KubeShare and test it, my pod is always Pending. When I check kubeshare-scheduler.log, it says "No corresponding gpu NVIDIA GeForce GTX 3090 in the node master". My k8s cluster has only one node, with two GPU devices. This is my kubeshare-config.yaml:

cellTypes:
  GTX3090-NODE:
    childCellType: "NVIDIA GeForce GTX 3090"
    childCellNumber: 1
    childCellPriority: 100
    isNodeLevel: true

cells:

I need your help, please. @ncy9371 @justin0u0

icovej commented 1 year ago

The biggest problem is that when I create a pod, it always stays Pending and there are no events. I have installed all components, and I have also installed nvidia-device-plugin.

justin0u0 commented 1 year ago

Hi @icovej, the childCellType should be joined with dashes.

You can refer to this comment for details: https://github.com/NTHU-LSALAB/KubeShare/issues/22#issuecomment-1325073356.
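As a rough sketch of what "joined with dashes" would mean for the config posted above: the spaces in the GPU model name are replaced with dashes. The exact string the scheduler expects is an assumption here and should be checked against the linked comment; the other values are copied unchanged from the original post.

```yaml
cellTypes:
  GTX3090-NODE:
    # Assumed form: the model name from the original post with spaces replaced by dashes
    childCellType: "NVIDIA-GeForce-GTX-3090"
    childCellNumber: 1
    childCellPriority: 100
    isNodeLevel: true
```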

icovej commented 1 year ago

Well, in fact, I realized that right after I wrote this issue. It was my mistake. But after fixing it, I still couldn't create a pod. The error in kubeshare-scheduler.log is "No corresponding gpu NVIDIA GeForce GTX 3090 in the node master".

How can I deal with it?

justin0u0 commented 1 year ago

The cellId part should match the name of the node that has the GPUs.

I think that is the problem causing the error.
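A minimal sketch of a `cells` section that follows this advice, assuming the node is named `master` (as in the log message) and that the section is a list of entries with `cellType` and `cellId` keys. The layout is an assumption, since the `cells` part of the original post is empty; only `cellId` is named in this thread.

```yaml
cells:
  # Assumed layout: one entry per node, referring to a cell type defined under cellTypes
  - cellType: GTX3090-NODE
    # Must equal the Kubernetes node name that has the GPUs (assumed "master" here)
    cellId: master
```

The node name can be confirmed with `kubectl get nodes`; the `cellId` value should match the NAME column exactly.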