NordicHPC / sonar

Tool to profile usage of HPC resources by regularly probing processes.
GNU General Public License v3.0

Probably monitor private memory size too #212

Open lars-t-hansen opened 5 days ago

lars-t-hansen commented 5 days ago

A new one today (https://github.com/NAICNO/Jobanalyzer/issues/697): a cluster of processes on a Slurm-less system shares a very large memory blob (27GB), and this throws off our resident-memory readings, because the blob is counted in the RSS of every process that maps it and we have no way of correcting for that. This was sort of known at the time: we want PSS (proportional set size) but can't have that without root access, so we settled for RSS, which fails to handle this case.

Not sure how important this is; we won't know before we start looking. It looks like the process in question may be some Python thing that uses fork() + shared memory to share a large data blob among concurrent workers, but this is only a guess.

We could minimally record private memory to at least detect this case, or make it possible to do so in postprocessing?
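
A rough sketch of what "record private memory" could mean, in Python for brevity rather than in sonar's own code. It assumes Linux and uses the `resident` and `shared` fields of `/proc/<pid>/statm` (both in pages), taking private as resident minus shared. The function names and the returned keys are made up for illustration and are not sonar's. One caveat: `shared` only covers file-backed and shmem pages, so anonymous pages shared copy-on-write after fork() would still be counted as private here.

```python
import os


def private_kib_from_statm(statm_text, page_size=4096):
    """Split one line of /proc/<pid>/statm into RSS, shared and private KiB.

    statm fields are: size resident shared text lib data dt (all in pages).
    "Private" here is only an approximation: resident minus shared.
    """
    fields = statm_text.split()
    resident_pages = int(fields[1])
    shared_pages = int(fields[2])
    # Guard against a transiently inconsistent reading.
    private_pages = max(resident_pages - shared_pages, 0)
    return {
        "rss_kib": resident_pages * page_size // 1024,
        "shared_kib": shared_pages * page_size // 1024,
        "private_kib": private_pages * page_size // 1024,
    }


def read_private_kib(pid):
    """Read and interpret /proc/<pid>/statm for a live process (Linux only)."""
    with open(f"/proc/{pid}/statm") as f:
        return private_kib_from_statm(f.read(), os.sysconf("SC_PAGE_SIZE"))


# Made-up sample: about 27 GiB resident, of which 26 GiB is shared.
sample = private_kib_from_statm("7340032 7077888 6815744 10 0 300000 0")
print(sample)
```

Recording the shared figure alongside RSS, instead of only the difference, would leave the choice of correction to postprocessing.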

We more or less know how to compute process trees that are of interest and could probably use that somehow.
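
One way the process-tree idea might look in postprocessing, again as a hedged Python sketch: sum the private part of every process in a tree and count the shared part once, using the largest shared figure seen in the tree. This rests on the assumption that all processes in the tree share the same blob, so it will underestimate when they share different things. The record layout (`pid`, `ppid`, `rss_kib`, `shared_kib`) and the function name are hypothetical, not Jobanalyzer's actual data model.

```python
from collections import defaultdict


def estimate_tree_memory(samples):
    """Group samples by process-tree root and estimate memory per tree.

    samples: list of dicts with pid, ppid, rss_kib, shared_kib.
    Returns {root_pid: {"naive_kib": ..., "estimate_kib": ...}}.
    """
    by_pid = {s["pid"]: s for s in samples}

    def root_of(pid):
        # Walk up while the parent is also in the sample set.
        seen = set()
        while by_pid[pid]["ppid"] in by_pid and pid not in seen:
            seen.add(pid)
            pid = by_pid[pid]["ppid"]
        return pid

    trees = defaultdict(lambda: {"naive": 0, "private": 0, "shared_max": 0})
    for s in samples:
        t = trees[root_of(s["pid"])]
        t["naive"] += s["rss_kib"]
        t["private"] += max(s["rss_kib"] - s["shared_kib"], 0)
        t["shared_max"] = max(t["shared_max"], s["shared_kib"])

    return {
        root: {
            "naive_kib": t["naive"],
            # Private parts add up; the shared blob is counted once.
            "estimate_kib": t["private"] + t["shared_max"],
        }
        for root, t in trees.items()
    }


# Made-up example: a parent and three workers sharing a 26 GiB blob.
workers = [
    {"pid": 100, "ppid": 1, "rss_kib": 28311552, "shared_kib": 27262976},
    {"pid": 101, "ppid": 100, "rss_kib": 28311552, "shared_kib": 27262976},
    {"pid": 102, "ppid": 100, "rss_kib": 28311552, "shared_kib": 27262976},
    {"pid": 103, "ppid": 100, "rss_kib": 28311552, "shared_kib": 27262976},
]
print(estimate_tree_memory(workers))
```

With these made-up numbers, the naive RSS sum is four times the per-process RSS, while the estimate stays close to the size of a single process.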