Closed atc0005 closed 1 year ago
E.g.,
$ cd /proc
$ grep State */status | tail
934/status:State: I (idle)
935/status:State: I (idle)
938/status:State: I (idle)
939/status:State: I (idle)
940/status:State: S (sleeping)
941/status:State: S (sleeping)
976/status:State: S (sleeping)
978/status:State: S (sleeping)
self/status:State: R (running)
thread-self/status:State: R (running)
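The grep output above can also be condensed into a per-state tally, which makes a stray `D` entry easier to spot. This is only a rough sketch: the `tally_states` name and its optional `proc_root` argument are made up here for illustration (the argument exists so the function can be pointed at a test directory instead of `/proc`).

```shell
# Count processes per state letter by reading <proc_root>/<pid>/status.
# tally_states and proc_root are hypothetical names, not part of any tool.
tally_states() {
    local proc_root="${1:-/proc}"
    local f
    for f in "$proc_root"/[0-9]*/status; do
        # A process may exit between glob expansion and the read.
        [ -r "$f" ] || continue
        # The State line looks like "State:<TAB>S (sleeping)"; keep the letter.
        awk '/^State:/ { print $2; exit }' "$f" 2>/dev/null || true
    done | sort | uniq -c
}
```

Calling `tally_states` with no arguments reads the live `/proc` tree and prints one line per state letter, prefixed with its count.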
Per https://www.baeldung.com/linux/uninterruptible-process:
3.3. Methods to Stop a Process in Uninterruptible Sleep
If we ever encounter a process into uninterruptible sleep, we need to check our hardware. If we encounter the issue when using network storage, it might be down, and the process is waiting for the server to recover. Once we know the driver that is causing the trouble, we can stop it. We might need rmmod to remove the module supporting the hardware device.
Another alternative is to use the parent process identifier of the process in uninterruptible sleep. We can get the identifier of the parent process (known as PPID) and stop this process. This is sufficient for cases where the parent process is an errant shell. Killing the parent process kills the child processes, which may trigger the explicit call required by the process in uninterruptible sleep.
Finally, the last solution when nothing else works is to suspend-to-disk or restart the system. We can try first to suspend-to-disk (also known as hibernate) and resume to see if this unfreezes the process in uninterruptible sleep. If this does not work, we have to restart the system. We might not be able to restart some systems, for example, a connected network device. In this case, we should attempt to unfreeze the process with the previous methods.
Of particular note:
Another alternative is to use the parent process identifier of the process in uninterruptible sleep. We can get the identifier of the parent process (known as PPID) and stop this process. This is sufficient for cases where the parent process is an errant shell. Killing the parent process kills the child processes, which may trigger the explicit call required by the process in uninterruptible sleep.
This corresponds to the ppid value shown by the ps "recipe" listed in the OP:
ps -eo ppid,pid,user,stat,pcpu,comm,wchan:32 | grep " D"
Having that would prove useful when hotfixing a particular issue.
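The same parent PID is also available without `ps`, via the `PPid:` field of `/proc/<pid>/status`. A minimal sketch, assuming a hypothetical helper named `parent_of` whose optional second argument (also an assumption) overrides the `/proc` root for testing:

```shell
# Print the parent PID of a process by reading the PPid field from its
# status file. parent_of and proc_root are illustrative names only.
parent_of() {
    local pid="$1" proc_root="${2:-/proc}"
    # The line looks like "PPid:<TAB>1"; print the second field.
    awk '/^PPid:/ { print $2; exit }' "$proc_root/$pid/status" 2>/dev/null
}
```

For example, `parent_of "$$"` prints the PID of the process that started the current shell. If the status file cannot be read (the process has already exited, say), nothing is printed.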
pseudocode:
- cd /proc/
- ls (filter to directories named with all digits: ^[0-9]+)
- look at the */status file
- look at the line with State containing D
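The pseudocode above could be sketched in shell roughly as follows. This is an illustration, not the eventual plugin implementation: `list_d_state` and its optional `proc_root` argument are hypothetical names, and the argument only exists so the function can be exercised against a fake directory tree.

```shell
# Print the PID of every process whose State line reports "D"
# (uninterruptible sleep). Names here are illustrative assumptions.
list_d_state() {
    local proc_root="${1:-/proc}"
    local f pid state
    for f in "$proc_root"/[0-9]*/status; do
        # Skip entries that vanished or are unreadable.
        [ -r "$f" ] || continue
        # Derive the directory name and require it to be all digits.
        pid="${f%/status}"
        pid="${pid##*/}"
        case "$pid" in
            *[!0-9]*) continue ;;
        esac
        # The State line looks like "State:<TAB>D (disk sleep)".
        state="$(awk '/^State:/ { print $2; exit }' "$f" 2>/dev/null)"
        if [ "$state" = "D" ]; then
            printf '%s\n' "$pid"
        fi
    done
}
```

Running `list_d_state` with no arguments scans the live `/proc` tree. An empty result means no process was in the `D` state at the moment of the scan.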
As noted on https://www.baeldung.com/linux/process-states:
3.3. The /proc Pseudo File
The /proc pseudo filesystem contains all the information about the processes in our system. Hence, we could directly read the state of a process through this pseudo filesystem. The downside of this approach is we’ll first need to know the PID of the process before we can read its state.
To obtain the state of a process, we can extract the value from its pseudo status file under /proc/{pid}/status. For example, we can get the state of the process with PID 2519 by reading the file /proc/2519/status:
$ cat /proc/2519/status | grep State
State: S (sleeping)
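For scripting, the same lookup can be wrapped in a small function that returns only the value of the `State:` line. A minimal sketch, assuming a hypothetical `process_state` helper with an optional `/proc` root override (both names are illustrative):

```shell
# Print the state of one process, e.g. "S (sleeping)", by stripping the
# "State:" label from its status file. Names are illustrative assumptions.
process_state() {
    local pid="$1" proc_root="${2:-/proc}"
    sed -n 's/^State:[[:space:]]*//p' "$proc_root/$pid/status" 2>/dev/null
}
```

`process_state 2519` would report the state of PID 2519 if that process exists. It prints nothing when the status file is missing or unreadable.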
When parsing */status files, it's probably worth grabbing these details:
Having this available in the list of processes in D state would help with troubleshooting efforts.
Created new project: https://github.com/atc0005/check-process
Overview
This ps recipe will detect & list them, including what got them stuck in that state:
Detecting any in a D state is sufficient cause to report the problem. Perhaps, depending on which process ends up in that state, the service state severity could be elevated.
References
Background
Go-specific