AliyunContainerService / gpushare-scheduler-extender

GPU Sharing Scheduler for Kubernetes Cluster
Apache License 2.0

ALIYUN_COM_GPU_MEM_IDX in the annotation is different than ALIYUN_COM_GPU_MEM_IDX inside the pod #220

Open wokalski opened 7 months ago

wokalski commented 7 months ago

Annotations:

Annotations:      ALIYUN_COM_GPU_MEM_ASSIGNED: true
                  ALIYUN_COM_GPU_MEM_ASSUME_TIME: 1692105746106628538
                  ALIYUN_COM_GPU_MEM_DEV: 11
                  ALIYUN_COM_GPU_MEM_IDX: 4
                  ALIYUN_COM_GPU_MEM_POD: 2

Env:

ALIYUN_COM_GPU_MEM_DEV=11
ALIYUN_COM_GPU_MEM_IDX=3
ALIYUN_COM_GPU_MEM_POD=2
ALIYUN_COM_GPU_MEM_CONTAINER=2

The device:

NVIDIA_VISIBLE_DEVICES=GPU-280dd117-09e1-2e8c-25e3-52fdfac9527f

is indeed the 3rd device, so the annotation is wrong and the environment variable is correct.
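
To make the mismatch easy to spot, here is a minimal sketch that compares the two sets of values. The values are hard-coded from the output above; in practice they would presumably be read from the pod's metadata and from the container's environment. `find_mismatches` is a hypothetical helper written for this illustration, not part of gpushare-scheduler-extender.

```python
# Values copied by hand from the report above (assumption: both are
# treated as plain strings, as they appear in the annotations and env).
annotations = {
    "ALIYUN_COM_GPU_MEM_DEV": "11",
    "ALIYUN_COM_GPU_MEM_IDX": "4",
    "ALIYUN_COM_GPU_MEM_POD": "2",
}
env = {
    "ALIYUN_COM_GPU_MEM_DEV": "11",
    "ALIYUN_COM_GPU_MEM_IDX": "3",
    "ALIYUN_COM_GPU_MEM_POD": "2",
    "ALIYUN_COM_GPU_MEM_CONTAINER": "2",
}


def find_mismatches(annotations, env):
    """Hypothetical helper: return {key: (annotation_value, env_value)}
    for every key present in both mappings whose values differ."""
    return {
        key: (annotations[key], env[key])
        for key in sorted(annotations.keys() & env.keys())
        if annotations[key] != env[key]
    }


for key, (ann, val) in find_mismatches(annotations, env).items():
    print(f"{key}: annotation={ann} env={val}")
# prints: ALIYUN_COM_GPU_MEM_IDX: annotation=4 env=3
```

Only `ALIYUN_COM_GPU_MEM_IDX` differs; `ALIYUN_COM_GPU_MEM_DEV` and `ALIYUN_COM_GPU_MEM_POD` agree between the annotation and the environment.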