Closed. Dripman closed this issue 2 years ago.
This has been fixed; please update to the latest version.
The original problem is now resolved, but dmesg shows this error: nvidia-smi[25708]: segfault at 0 ip 00007f2f72d3a14a sp 00007ffe3a9005b8 error 4 in libc-2.27.so[7f2f72bbb000+1e7000]. I'm not sure whether it will affect usage.
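For anyone reading a line like the one above, here is a minimal Python sketch that splits it into its fields. It assumes the usual x86 kernel segfault message layout and the standard x86 page-fault error-code bits (bit 0: page present, bit 1: write, bit 2: user mode). The function name `decode_segfault` is made up for this example, and the output only describes the fault; it does not say whether the crash matters for the device plugin.

```python
import re

# Layout of the x86 kernel segfault message as it appears in the dmesg line above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode one dmesg segfault line into a dict (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognizable segfault line")
    addr = int(m["addr"], 16)
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    return {
        "process": m["comm"],
        "pid": int(m["pid"]),
        "fault_address": addr,
        "null_dereference": addr == 0,
        "page_present": bool(err & 1),             # bit 0
        "access": "write" if err & 2 else "read",  # bit 1
        "mode": "user" if err & 4 else "kernel",   # bit 2
        "object": m["obj"],
        # Offset of the faulting instruction inside the mapped object.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
    }


line = ("nvidia-smi[25708]: segfault at 0 ip 00007f2f72d3a14a "
        "sp 00007ffe3a9005b8 error 4 in libc-2.27.so[7f2f72bbb000+1e7000]")
info = decode_segfault(line)
print(info["process"], info["mode"], info["access"], hex(info["offset"]))
```

For this line the sketch reports a user-mode read of address 0 (a NULL pointer dereference) at an offset inside libc-2.27.so. If matching debug symbols for libc are installed, that offset may help locate the function involved.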
This has been fixed; please update to the latest version.
Thank you very much!
This shouldn't affect usage. If you do run into problems, open an issue or add me on WeChat: xuanzong4493
That said, if you plan to take this into production, I recommend using https://github.com/4paradigm/k8s-vgpu-scheduler instead.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
  annotations:
    deprecated.daemonset.template.generation: '2'
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: nvidia-device-plugin-ds
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      volumes:
Thu May  5 09:50:57 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:0C.0 Off |                    0 |
| N/A   28C    P0    51W / 400W |    413MiB / 40960MiB |      3%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     77977      C                                     411MiB |
+-----------------------------------------------------------------------------+
The GPU memory shown is still 40GB.
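To check this without reading the table by eye, a small Python sketch can pull the used and total memory out of captured nvidia-smi text. The helper name `memory_usage_mib` and the sample string are assumptions made for this example; it only parses the `NNNMiB / NNNMiB` cell and says nothing about how the plugin is expected to limit memory.

```python
import re


def memory_usage_mib(smi_text):
    """Return a list of (used, total) pairs in MiB found in nvidia-smi text."""
    pairs = re.findall(r"(\d+)MiB\s*/\s*(\d+)MiB", smi_text)
    return [(int(used), int(total)) for used, total in pairs]


# Memory row copied from the output above (spacing collapsed).
sample = "| N/A 28C P0 51W / 400W | 413MiB / 40960MiB | 3% Default |"

for used, total in memory_usage_mib(sample):
    print(f"used {used} MiB of {total} MiB ({total / 1024:.0f} GiB)")
```

Run inside the container, a total of 40960 MiB (40 GiB) means the process still sees the full memory of the A100, which matches the report above.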