Core dump when requesting 2 or more GPUs with Tesla T4

1. Issue or feature description

Requesting 1 GPU in the YAML works fine, but when requesting more than 1, the output of nvidia-smi is as shown below. nvidia-smi on the host machine works normally.

On another machine with a GeForce RTX 2070 SUPER, requesting 2 GPUs works fine. But when I run the application locally, it aborts with:

[4pdvGPU ERROR (pid:697 thread=140106827071488 context.c:189)]: cuCtxGetDevice Not Found. tid=140106827071488 ctx=0x239601906000:0x23960041a000
 home/limengxuan/work/libcuda_override/src/cuda/context.c:189: cuCtxGetDevice: Assertion `0' failed.

2. Steps to reproduce the issue

Environment: Ubuntu 20.04 (ubuntu1~20.04) + microk8s + Tesla T4 GPU + NVIDIA driver 510. Request 2 or more GPUs in the YAML.

3. Information to attach (optional if deemed irrelevant)

Common error checking:

[ ] The output of nvidia-smi -a on your host
[ ] Your docker configuration file (e.g. /etc/docker/daemon.json):
    {
      "default-runtime": "nvidia",
      "runtimes": {
        "nvidia": {
          "path": "nvidia-container-runtime",
          "runtimeArgs": []
        }
      }
    }

Additional information that might help better understand your environment and reproduce the bug:

[ ] Any relevant kernel output lines from dmesg


nvidia-smi[2260220]: segfault at 0 ip 00007fde46d051ce sp 00007ffe1ae4c9e8 error 4 in libc-2.31.so[7fde46b9d000+178000]
[89993.700532] Code: fd d7 c9 0f bc d1 c5 fe 7f 27 c5 fe 7f 6f 20 c5 fe 7f 77 40 c5 fe 7f 7f 60 49 83 c0 1f 49 29 d0 48 8d 7c 17 61 e9 c2 04 00 00 <c5> fe 6f 1e c5 fe 6f 56 20 c5 fd 74 cb c5 fd d7 d1 49 83 f8 21 0f
[90182.697502] nvidia-smi[2265941]: segfault at 0 ip 00007f241971c1ce sp 00007fffff703d08 error 4 in libc-2.31.so[7f24195b4000+178000]
[90182.697509] Code: fd d7 c9 0f bc d1 c5 fe 7f 27 c5 fe 7f 6f 20 c5 fe 7f 77 40 c5 fe 7f 7f 60 49 83 c0 1f 49 29 d0 48 8d 7c 17 61 e9 c2 04 00 00 <c5> fe 6f 1e c5 fe 6f 56 20 c5 fd 74 cb c5 fd d7 d1 49 83 f8 21 0f

4paradigm / k8s-vgpu-scheduler