NordicHPC / sonar

Tool to profile usage of HPC resources by regularly probing processes using ps.
GNU General Public License v3.0

Log disk I/O #135

Open lars-t-hansen opened 4 months ago

lars-t-hansen commented 4 months ago

The use case here is jobs that are "unexpectedly slow": we want to know whether they are slow because they are I/O bound or because they are held up by slow I/O. For example, on interactive nodes (login nodes, Fox int* nodes, UiO ML nodes), memory can be oversubscribed so that the system is paging, or a shared disk can be hammered hard enough to hold up progress. The latter seems to be an issue on Saga login nodes, which are deadly slow even though very little computation actually happens on them.

As with #67, let's try to collect the data if we can, and then see whether we can surface it in some sensible way in Jobanalyzer.

Also see https://github.com/NAICNO/Jobanalyzer/issues/399.

lars-t-hansen commented 4 months ago

If a job is not computing, it is either descheduled or in I/O wait. Ideally, though, we want to distinguish disk from tty from network waits, and better still, also distinguish the individual interfaces or devices.
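A crude first cut at separating disk waits from other waits is the process state in `/proc/<pid>/stat`. The sketch below is a hypothetical heuristic, not something sonar is known to do, and the `Activity` enum and `classify` function are illustrative names. It assumes that a task in state `D` (uninterruptible sleep) is usually waiting on block I/O, while waits on ttys and sockets usually show up as `S` (interruptible sleep). It cannot tell devices apart, and some network-filesystem waits also appear as `D`.

```rust
/// Hypothetical coarse classification of what a task is doing, based on
/// the state field (field 3) of a /proc/<pid>/stat line. Heuristic only.
#[derive(Debug, PartialEq)]
enum Activity {
    Running,        // 'R'
    LikelyDiskWait, // 'D': uninterruptible sleep, typical of block I/O
    OtherWait,      // 'S': interruptible sleep (tty, socket, timer, ...)
    Other(char),    // anything else (zombie, stopped, idle, ...)
}

fn classify(stat_line: &str) -> Option<Activity> {
    // The comm field (field 2) is in parentheses and may itself contain
    // spaces or ')', so locate the *last* ')' before splitting.
    let rest = &stat_line[stat_line.rfind(')')? + 1..];
    let state = rest.split_whitespace().next()?.chars().next()?;
    Some(match state {
        'R' => Activity::Running,
        'D' => Activity::LikelyDiskWait,
        'S' => Activity::OtherWait,
        c => Activity::Other(c),
    })
}

fn main() {
    // Hypothetical, abbreviated stat line for illustration.
    let line = "4242 (my job) D 1 4242 4242 0 -1 4194304 100 0 5 0";
    println!("{:?}", classify(line)); // prints Some(LikelyDiskWait)
}
```

Because a single read is only an instantaneous snapshot, the idea would be to sample repeatedly and treat the fraction of samples in `D` as a rough indicator, not as a measured wait time.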

On an HPC node with 128 cores there can be many jobs running at the same time, especially on login and interactive nodes. So accounting only for whole-system I/O wait is not enough, even if it is better than nothing.
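To attribute I/O wait to individual jobs rather than to the whole system, Linux exposes a per-task counter, `delayacct_blkio_ticks` (field 42 of `/proc/<pid>/stat` per proc(5)): the time the task has spent waiting for block I/O, in clock ticks. The sketch below only parses that field; the function names are hypothetical, the 100 Hz tick rate is an assumption (query `sysconf(_SC_CLK_TCK)` in real code), and whether the value covers all threads of a multi-threaded process is worth checking against proc(5). The kernel may also report zero here when delay accounting is disabled (see the `delayacct` boot parameter and, on recent kernels, the `kernel.task_delayacct` sysctl).

```rust
/// Extract delayacct_blkio_ticks (field 42 in proc(5)) from a
/// /proc/<pid>/stat line: block I/O wait time for this task, in ticks.
fn blkio_ticks(stat_line: &str) -> Option<u64> {
    // Skip past the parenthesised comm (field 2); the remaining
    // whitespace-separated tokens start at field 3 (state).
    let rest = &stat_line[stat_line.rfind(')')? + 1..];
    // Field N is therefore at index N - 3.
    rest.split_whitespace().nth(42 - 3)?.parse().ok()
}

/// Convert ticks to seconds. ASSUMPTION: a tick rate of 100 Hz (the
/// common USER_HZ on Linux); real code should ask sysconf(_SC_CLK_TCK).
fn ticks_to_secs(ticks: u64) -> f64 {
    ticks as f64 / 100.0
}

fn main() {
    // Synthetic line: pid, comm, state, fields 4..=41 as zeros, then 250
    // as field 42.
    let mut line = String::from("4242 (my job) D");
    for _ in 4..=41 {
        line.push_str(" 0");
    }
    line.push_str(" 250");
    let ticks = blkio_ticks(&line).unwrap();
    println!("{} ticks = {} s", ticks, ticks_to_secs(ticks)); // prints 250 ticks = 2.5 s
}
```

Summing the per-interval deltas of this counter over the processes of a job would give a per-job block I/O wait figure, which is the granularity the node-sharing argument calls for.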

All that said, there's no objective way to say that "there's too much I/O wait" if a job has threads that can make progress while other threads are waiting. "Too much" is relative to an expectation: even on a superfast disk there will be some I/O wait.

One measure that might make sense is the average wait (or better, time) per I/O operation. That would take sonar/Jobanalyzer out of the business of judging whether something is slow or fast, waiting or busy. An I/O operation count would also be helpful. Going further down that path, one could imagine a distribution of timings by count, but I don't expect the kernel keeps that around.
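At the device level, the kernel already keeps the two counters needed for "time per I/O operation": `/proc/diskstats` reports, per block device, the number of reads completed and the total milliseconds spent on them (field layout per the kernel's `iostats` documentation). The sketch below, with hypothetical names, computes the average read time over an interval from two samples. Note the assumption that per-device latency is a useful proxy: it is not per job, so it would still have to be correlated with which jobs were reading from that device.

```rust
/// One sample of the read counters for a block device, taken from a
/// /proc/diskstats line. After major, minor, and device name come:
/// reads completed, reads merged, sectors read, ms spent reading, ...
#[derive(Debug, Clone, Copy)]
struct ReadSample {
    reads_completed: u64,
    ms_reading: u64,
}

fn parse_diskstats_line(line: &str, device: &str) -> Option<ReadSample> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    // fields[0..3] = major, minor, device name.
    if fields.len() < 7 || fields[2] != device {
        return None;
    }
    Some(ReadSample {
        reads_completed: fields[3].parse().ok()?,
        ms_reading: fields[6].parse().ok()?,
    })
}

/// Average milliseconds per completed read between two samples, or None
/// if no reads completed in the interval (or the counters went backwards).
fn avg_read_ms(before: ReadSample, after: ReadSample) -> Option<f64> {
    let reads = after.reads_completed.checked_sub(before.reads_completed)?;
    let ms = after.ms_reading.checked_sub(before.ms_reading)?;
    if reads == 0 {
        None
    } else {
        Some(ms as f64 / reads as f64)
    }
}

fn main() {
    // Two hypothetical samples of the same device, some seconds apart.
    let t0 = "   8       0 sda 1000 10 80000 5000 0 0 0 0 0 0 0";
    let t1 = "   8       0 sda 1400 12 112000 7000 0 0 0 0 0 0 0";
    let a = parse_diskstats_line(t0, "sda").unwrap();
    let b = parse_diskstats_line(t1, "sda").unwrap();
    println!("{:?} ms/read", avg_read_ms(a, b)); // prints Some(5.0) ms/read
}
```

Reporting a time per operation plus the operation count, as suggested above, leaves the "is this slow?" judgment to whoever reads the data.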

bast commented 4 months ago

But would sonar then make regular, well-defined reads and writes and measure how long they take?

lars-t-hansen commented 4 months ago

> But would sonar then make regular, well-defined reads and writes and measure how long they take?

I've been looking at this but apparently not commenting. It looks like waiting for disk writes is not really a thing, since writes happen in the background. So (for disk) it's mostly about waiting for reads, and not just explicit reads but also page-ins from memory-mapped executables and files. I believe htop presents some data about this, and the first order of business is to dig into that (documentation, code) to see if it leads anywhere.
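One per-process source that may help separate explicit reads from storage traffic is `/proc/<pid>/io`. Its `rchar` counter counts bytes returned by read-like syscalls (page-cache hits included), while `read_bytes` is documented as the bytes the task actually caused to be fetched from the storage layer. We assume, without having verified it here, that `read_bytes` also covers page-ins from memory-mapped files, so `read_bytes` growing while `rchar` stays flat might hint at page-ins rather than explicit reads. The helper names below are hypothetical, and the file is typically readable only for your own processes or as root.

```rust
use std::collections::HashMap;

/// Parse the "key: value" lines of /proc/<pid>/io (rchar, wchar, syscr,
/// syscw, read_bytes, write_bytes, cancelled_write_bytes) into a map.
fn parse_proc_io(text: &str) -> HashMap<&str, u64> {
    text.lines()
        .filter_map(|line| {
            let (key, value) = line.split_once(':')?;
            Some((key.trim(), value.trim().parse().ok()?))
        })
        .collect()
}

/// Read counters accumulated between two samples of /proc/<pid>/io.
#[derive(Debug, PartialEq)]
struct ReadDelta {
    rchar: u64,      // bytes returned by read-like syscalls
    read_bytes: u64, // bytes fetched from storage on the task's behalf
}

fn read_delta(before: &str, after: &str) -> Option<ReadDelta> {
    let (b, a) = (parse_proc_io(before), parse_proc_io(after));
    // None if a key is missing or a counter went backwards.
    let diff = |k: &str| -> Option<u64> { a.get(k)?.checked_sub(*b.get(k)?) };
    Some(ReadDelta {
        rchar: diff("rchar")?,
        read_bytes: diff("read_bytes")?,
    })
}

fn main() {
    // Hypothetical samples taken some seconds apart: no read() traffic,
    // but 1 MiB fetched from storage.
    let t0 = "rchar: 1000\nwchar: 0\nsyscr: 10\nsyscw: 0\nread_bytes: 4096\nwrite_bytes: 0\ncancelled_write_bytes: 0\n";
    let t1 = "rchar: 1000\nwchar: 0\nsyscr: 10\nsyscw: 0\nread_bytes: 1052672\nwrite_bytes: 0\ncancelled_write_bytes: 0\n";
    println!("{:?}", read_delta(t0, t1));
}
```

These are byte counts, not wait times, so they would complement a wait-time measure rather than replace it.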

lars-t-hansen commented 2 months ago

This recipe produces the desired results on my Ubuntu 22 (Linux 6.5) laptop, but it does not work on a Saga login node (Linux 5.14): I get the "Avg" display but not the detailed breakdown. Given how old that post is, the problem is probably how the kernel is configured rather than its version.