Closed lars-t-hansen closed 6 months ago
The format has changed, and unless we want to fix #87 we must handle multiple formats properly.
ml1: NVIDIA System Management Interface -- v545.23.08
$ nvidia-smi pmon -c 1 -s u
# gpu        pid  type    sm   mem   enc   dec   command
# Idx          #   C/G     %     %     %     %   name
    0    1174916     C    88    54     -     -   python
    0    1186862     C     -     -     -     -   python3
    1    1174916     C    92    53     -     -   python
    1    1223470     C     -     -     -     -   python3
    2    1174916     C    89    53     -     -   python
    2     941737     C     -     -     -     -   python3
gpu-13.fox: NVIDIA System Management Interface -- v550.54.14
$ nvidia-smi pmon -c 1 -s u
# gpu        pid  type    sm   mem   enc   dec   jpg   ofa   command
# Idx          #   C/G     %     %     %     %     %     %   name
    0          -     -     -     -     -     -     -     -   -
    1          -     -     -     -     -     -     -     -   -
    2          -     -     -     -     -     -     -     -   -
    3          -     -     -     -     -     -     -     -   -
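The sensible thing to do here appears to be to decode the # gpu header line and use its column names as keys into the data rows, so that added columns such as jpg and ofa do not shift the fields we read. We could also try to detect issues (for example a missing header or a row with the wrong number of fields) and signal problems via the gpufail field.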
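As a rough illustration of the header-keyed approach, here is a minimal Python sketch. The function name `parse_pmon`, the returned `(rows, gpufail)` pair, and the boolean representation of gpufail are all hypothetical and chosen only for this example; the actual field layout in the project may differ. It also assumes that a row whose pid is `-` means "no process on this GPU", and that the command name is always the last column.

```python
def parse_pmon(text):
    """Parse `nvidia-smi pmon -c 1 -s u` output, keyed by the '# gpu' header.

    Returns (rows, gpufail): rows is a list of dicts mapping column name to
    the raw string value; gpufail (hypothetical flag) is True if the output
    could not be decoded cleanly.
    """
    columns = None
    rows = []
    gpufail = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            fields = line.lstrip("#").split()
            # Only the '# gpu ...' line names the columns; the '# Idx ...'
            # line carries units and is skipped.
            if fields and fields[0] == "gpu":
                columns = fields
                if "pid" not in columns or "command" not in columns:
                    gpufail = True
            continue
        if columns is None:
            # Data before any header: we cannot know what the fields mean.
            gpufail = True
            continue
        # Split into at most len(columns) pieces so that a command name
        # containing spaces stays in the last column (assumption).
        values = line.split(None, len(columns) - 1)
        if len(values) != len(columns):
            gpufail = True
            continue
        row = dict(zip(columns, values))
        # Assumption: pid '-' marks a GPU with no process, so there is
        # nothing to record for it.
        if row.get("pid") == "-":
            continue
        rows.append(row)
    if columns is None:
        gpufail = True
    return rows, gpufail
```

With this shape, both the v545.23.08 and v550.54.14 outputs above decode through the same code path: callers look up `row["sm"]` or `row["mem"]` by name and simply ignore columns they do not know, such as `jpg` and `ofa`.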