NordicHPC / sonar

Tool to profile usage of HPC resources by regularly probing processes using ps.
GNU General Public License v3.0

Can we read /proc/PID/cmdline to get more information? #121

Open lars-t-hansen opened 1 year ago

lars-t-hansen commented 1 year ago

Currently we grab the command field (comm) from /proc/PID/stat. This field contains no command options, flags, or arguments, and the kernel truncates it to 15 characters (the kernel's comm buffer, TASK_COMM_LEN, is 16 bytes including the terminating NUL), so we don't even get the entire executable name. Because it contains no arguments, on some types of systems (the UiO ML nodes are like this) every job in the log is going to show up as python or java, which is far from ideal.
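To make the difference between the two files concrete, here is a minimal sketch of parsing them, written in Python for brevity (the function names `comm_from_stat`, `argv_from_cmdline` and `read_both` are hypothetical, not sonar's). The comm field sits in parentheses and may itself contain spaces or `)`, so it is safest to cut at the last `)`; cmdline is a sequence of NUL-separated arguments.

```python
from __future__ import annotations


def comm_from_stat(stat: bytes) -> str:
    """Extract the comm field from the contents of /proc/PID/stat.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so cut at the first '(' and the *last* ')' instead of splitting on spaces.
    """
    start = stat.index(b"(") + 1
    end = stat.rindex(b")")
    return stat[start:end].decode("utf-8", "replace")


def argv_from_cmdline(cmdline: bytes) -> list[str]:
    """Split the contents of /proc/PID/cmdline into arguments.

    Arguments are NUL-separated and normally NUL-terminated. The file is empty
    for kernel threads and zombies, which yields an empty list.
    """
    if not cmdline:
        return []
    if cmdline.endswith(b"\0"):
        cmdline = cmdline[:-1]
    return [arg.decode("utf-8", "replace") for arg in cmdline.split(b"\0")]


def read_both(pid: int) -> tuple[str, list[str]]:
    """Read comm and argv for `pid`. The cmdline read is the one that may block."""
    with open(f"/proc/{pid}/stat", "rb") as f:
        comm = comm_from_stat(f.read())
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        argv = argv_from_cmdline(f.read())
    return comm, argv
```

For a Python training job, `read_both(pid)` would typically give a comm like `python3` but an argv that also includes the script path and its options, which is exactly the extra information this issue is after.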

Now, /proc/PID/cmdline has more information (unless the process itself has redacted it). Unfortunately, it may not be safe to read /proc/PID/cmdline: a coworker alerted me to this, and https://rachelbythebay.com/w/2014/10/27/ps/ goes into some detail. Basically, if the process whose command line you want is stuck in uninterruptible sleep while holding the lock on its memory map, the process asking for the information goes into uninterruptible sleep too (reading cmdline has to take that lock), and it may never come out of it. You can kill the reader, but it will supposedly hang around until reboot. Another report: https://github.com/moby/moby/issues/15204. Most reports I find are old (8-10 years). I don't know how much of a problem this is, as it seems related to memory-constrained containers that are stuck because their memory limits have been exceeded, but I could see that being an issue on HPC systems.

One could make sonar resilient against this by forking off a process to read the command line and having the parent time out if no response arrives quickly, but, for one thing, we wouldn't want to fork off one process per process we're monitoring. Another option is to fork off a single process to get all the command lines; if it hangs, then oh well, we'll maybe get the information on the next sonar run, and sonarlog can clean things up. Obviously this is also not ideal.
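The single-helper-process variant could look roughly like the following Python sketch. Assumptions: Linux, the hypothetical helper names `call_in_child` and `all_cmdlines_with_timeout`, JSON over a pipe as the transport, and an arbitrarily chosen 2-second default timeout. The parent waits on the pipe with `select` and a deadline; on timeout it abandons the child instead of waiting for it, since a process stuck in uninterruptible sleep cannot be killed and waiting for it would hang sonar too.

```python
from __future__ import annotations

import json
import os
import select
import time
from typing import Any, Callable


def call_in_child(fn: Callable[[], Any], timeout: float) -> Any:
    """Run fn() in a forked child and return its JSON-decoded result.

    Returns None if the child fails or does not finish within `timeout`
    seconds. On timeout the child is abandoned, not waited for: a process in
    uninterruptible sleep cannot be killed, so waiting could hang us as well.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: compute, send JSON, exit without running cleanup handlers
        os.close(read_fd)
        status = 1
        try:
            payload = json.dumps(fn()).encode()
            with os.fdopen(write_fd, "wb") as out:
                out.write(payload)
            status = 0
        finally:
            os._exit(status)

    os.close(write_fd)  # parent: keep only the read end
    chunks = []
    deadline = time.monotonic() + timeout
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None  # timed out: abandon the child
            ready, _, _ = select.select([read_fd], [], [], remaining)
            if not ready:
                return None  # timed out: abandon the child
            chunk = os.read(read_fd, 65536)
            if not chunk:  # EOF: the child closed its end of the pipe
                break
            chunks.append(chunk)
    finally:
        os.close(read_fd)

    _, status = os.waitpid(pid, 0)  # the child has finished writing; reap it
    if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
        return None
    return json.loads(b"".join(chunks))


def _read_all_cmdlines() -> dict[int, list[str]]:
    """Runs in the child. Any open()/read() below may block indefinitely."""
    result = {}
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or is not readable
        # NUL-separated arguments; empty ones are dropped for simplicity.
        result[int(name)] = [a.decode("utf-8", "replace") for a in raw.split(b"\0") if a]
    return result


def all_cmdlines_with_timeout(timeout: float = 2.0) -> dict[int, list[str]] | None:
    """Command lines of all visible processes, or None (retry on the next run)."""
    result = call_in_child(_read_all_cmdlines, timeout)
    if result is None:
        return None
    return {int(pid): argv for pid, argv in result.items()}  # JSON keys are strings
```

An abandoned child that eventually wakes up gets an error writing to the closed pipe and exits; it then stays a zombie until the parent reaps it or exits, which should be harmless for a short-lived sonar run. A child that never wakes up stays stuck either way, but at least sonar itself keeps going.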

lars-t-hansen commented 1 month ago

If we do read the command line, it will become tempting to log more data from it too. We must be very careful about this so that we don't reveal secrets (passwords, API tokens and the like) that are passed on the command line.
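One conservative way to get more than `python` or `java` into the log without recording arbitrary arguments is an allow-list: keep only the executable name and the name of a script or archive, and drop everything else. A minimal Python sketch, where the function `summarize_command` and the suffix list `SCRIPT_SUFFIXES` are hypothetical choices, not anything sonar does today:

```python
import os

# Hypothetical allow-list: only arguments whose file name ends in one of
# these suffixes are kept. Everything else (options, values) is dropped.
SCRIPT_SUFFIXES = (".py", ".jar", ".sh", ".R", ".jl")


def summarize_command(argv: list) -> str:
    """Return a short label such as 'python3 train.py' for a command line.

    Heuristic only: it keeps the basename of the executable plus the basename
    of the first argument that looks like a script or archive, so option
    values (where secrets usually live) are never logged. A secret that
    happens to end in an allowed suffix would still leak.
    """
    if not argv:
        return ""
    label = os.path.basename(argv[0])
    for arg in argv[1:]:
        name = os.path.basename(arg)
        if name.endswith(SCRIPT_SUFFIXES):
            return f"{label} {name}"
    return label
```

With this policy, `["/usr/bin/python3", "-u", "/home/u/train.py", "--api-key", "s3cr3t"]` would be logged as `python3 train.py`, and the key never leaves the node.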