Closed: tatarsky closed this issue 8 years ago
I believe a Docker container on gpu-1-16 was improperly exited and may be causing errors as a result. But before I kill it, I'd like to be 100% sure.
gpu-1-16: 5e23c810a988 corcra/tf-hal "/bin/bash" 5 days ago Up 5 days sharp_goldstine
Can the owner please check it if they happen to monitor Git?
I show only this job on the node:
7566200.hal-sched1.loc (somebody else) batch pj_2920ee78-479d 25401 1 1 16gb 96:00:00 R 00:01:28 gpu-1-16/0
And that job does not show the same PID as the process in the Docker container:
nvidia-smi
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      9598    C   /usr/bin/python                               5890MiB |
+-----------------------------------------------------------------------------+

docker top 5e23c810a988
UID   PID   PPID  C  STIME  TTY    TIME      CMD
root  3670  4512  0  Jul05  pts/1  00:00:00  /bin/bash
root  9598  3670  1  Jul05  pts/1  02:19:24  /usr/bin/python /usr/local/bin/ipython
User with jobs on the machine confirmed it was NOT theirs. So I'm killing it as a stray.
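For anyone repeating this check, the PID comparison could be scripted roughly as follows. This is only a sketch, not what was actually run here: the function names are made up for illustration, the `docker top` sample is copied from the output in this thread, and the GPU sample is hand-written from the same values in the CSV shape that `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader` is assumed to produce.

```python
# Sketch: find GPU compute processes that belong to a given Docker container
# by intersecting the PIDs reported by nvidia-smi with those from `docker top`.
# All function names here are hypothetical helpers, not part of any tool.

# Hand-written sample mirroring the values seen in this thread (assumed CSV shape).
GPU_CSV_SAMPLE = "9598, /usr/bin/python, 5890 MiB\n"

# Copied from the `docker top 5e23c810a988` output in this thread.
DOCKER_TOP_SAMPLE = """\
UID   PID   PPID  C  STIME  TTY    TIME      CMD
root  3670  4512  0  Jul05  pts/1  00:00:00  /bin/bash
root  9598  3670  1  Jul05  pts/1  02:19:24  /usr/bin/python /usr/local/bin/ipython
"""


def parse_gpu_apps(csv_text):
    """Map PID -> (process name, memory) from the CSV-style nvidia-smi output."""
    procs = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        pid, name, mem = [field.strip() for field in line.split(",", 2)]
        procs[int(pid)] = (name, mem)
    return procs


def parse_docker_top(text):
    """Return the set of host PIDs listed by `docker top <container>`."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    pid_col = lines[0].split().index("PID")  # PID column comes before CMD
    return {int(ln.split()[pid_col]) for ln in lines[1:]}


def gpu_pids_in_container(gpu_csv, docker_top_text):
    """Return the GPU processes whose PID also appears in the container."""
    gpu = parse_gpu_apps(gpu_csv)
    container = parse_docker_top(docker_top_text)
    return {pid: info for pid, info in gpu.items() if pid in container}


if __name__ == "__main__":
    print(gpu_pids_in_container(GPU_CSV_SAMPLE, DOCKER_TOP_SAMPLE))
```

On a live node the two sample strings would be replaced by the real command output, for example captured with `subprocess.run(..., capture_output=True, text=True)`. A non-empty result means the container is the one holding the GPU, which is the situation above with PID 9598.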