caracal-pipeline / stimela

Stimela stats output only tracks global resource usage #305

Open JSKenyon opened 2 months ago

JSKenyon commented 2 months ago

I discovered this by accident when attempting to use stimela's output stats to do some rudimentary profiling of cubical and quartical. At present, the stats do not track the PIDs associated with stimela and its children. Instead, the global (system) level stats are reported. This is useful for understanding the system you are running on or determining if a system was taken out by an OOM condition, but the results may be very misleading when running on contested hardware. This problem tainted all the benchmarks I ran on one of the Rhodes boxes. I have already implemented a hacky version of a fix in https://github.com/caracal-pipeline/stimela/tree/profiling-hacks, but some more work will be required to make it robust/accurate for all stats.
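
For reference, a minimal sketch of the per-process-tree approach with psutil (the sample_process_tree helper is illustrative, not stimela's actual API), contrasted with the system-wide numbers currently reported:

```python
# Minimal sketch of per-process-tree sampling with psutil (illustrative only;
# sample_process_tree is a hypothetical helper, not stimela's actual API).
import psutil

def sample_process_tree(root_pid=None):
    """Accumulate CPU and RSS over a process and all of its descendants."""
    root = psutil.Process(root_pid)  # None -> the calling process
    procs = [root] + root.children(recursive=True)
    cpu = 0.0
    rss = 0
    for proc in procs:
        try:
            with proc.oneshot():
                cpu += proc.cpu_percent(interval=None)  # % since the previous call
                rss += proc.memory_info().rss           # resident set size in bytes
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue  # children may exit between listing and sampling
    return {"cpu_percent": cpu, "rss_bytes": rss}

# Contrast with the global (system-wide) numbers currently reported:
system_cpu = psutil.cpu_percent(interval=None)
system_mem = psutil.virtual_memory().used
```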

JSKenyon commented 2 months ago

I have made a little progress on this - I can now report accurate CPU and RAM usage for both threaded and multi-processing applications. This is adequate for my current use-case but doesn't solve the problem in its entirety. Points requiring further thought:

- How do we want to treat loops which have scatter: true? This would require keeping track of the individual stats of each instance of the recipe i.e. would require being able to distinguish which of stimela's children are associated with each loop.

- Monitoring IO per process becomes a little tricky. Each psutil.Process object does have IO stats attached to it but these would need to be mangled into a form which users can understand.

o-smirnov commented 2 months ago

Yeah the profiling implementation was very naive, sorry I should have put a more prominent disclaimer! But thanks for making it smarter, @JSKenyon, this is a very valuable addition.

How do we want to treat loops which have scatter: true? This would require keeping track of the individual stats of each instance of the recipe i.e. would require being able to distinguish which of stimela's children are associated with each loop.

Well, there's already provision for accumulating per-loop-iteration stats separately. At this point it knows it's running in a subprocess associated with a loop iteration. Actually, looking at your implementation, I think it should Just Work (famous last words). Within a loop worker, psutil.Process() will return your loop executor process, so you'll only be counting the child processes of this loop iteration.
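
For concreteness, a tiny illustration of that point, assuming a Linux-style psutil environment: psutil.Process() with no argument is the calling process, so accumulating over its children naturally stays within the loop worker.

```python
# Inside a loop-iteration worker, psutil.Process() with no argument refers to
# the calling (worker) process, so walking its children only covers this
# iteration's subprocesses.
import os
import psutil

me = psutil.Process()                 # same as psutil.Process(os.getpid())
assert me.pid == os.getpid()
iteration_procs = [me] + me.children(recursive=True)
```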

Monitoring IO per process becomes a little tricky. Each psutil.Process object does have IO stats attached to it but these would need to be mangled into a form which users can understand.

I think these have the same structure as the global I/O stats, right? So we can just accumulate them the same way. Or am I missing an important difference?
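
One possible way to accumulate them, assuming Linux and psutil (the sum_io helper is illustrative): the count/byte fields line up with psutil.disk_io_counters(), while the Linux-only read_chars/write_chars fields have no global counterpart.

```python
# Sketch of accumulating per-process I/O the same way as the global counters
# (Linux; io_counters() may raise AccessDenied for some children).
import psutil

def sum_io(procs):
    totals = {"read_count": 0, "write_count": 0, "read_bytes": 0, "write_bytes": 0}
    for proc in procs:
        try:
            io = proc.io_counters()
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
        for field in totals:
            totals[field] += getattr(io, field)
        # On Linux, io_counters() also exposes read_chars/write_chars (bytes
        # passed through read()/write() syscalls, cached or not), which have
        # no counterpart in psutil.disk_io_counters().
    return totals

root = psutil.Process()
print(sum_io([root] + root.children(recursive=True)))
print(psutil.disk_io_counters())  # global: read_count, write_count, read_bytes, write_bytes, ...
```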

Quite apart from accounting the resource use accurately, there is also the question of what the status bar should show. I kind of like that it shows global and not just stimela usage at the moment -- useful to know if the box is being hammered by others. But there is also a case to be made for stimela-session-specific stats being shown. I wonder if we can fit both into the same status line without making it insanely complicated-looking...

JSKenyon commented 2 months ago

Well, there's already provision for accumulating per-loop-iteration stats separately. At this point it knows it's running in a subprocess associated with a loop iteration. Actually, looking at your implementation, I think it should Just Work (famous last words). Within a loop worker, psutil.Process() will return your loop executor process, so you'll only be counting the child processes of this loop iteration.

Fingers crossed - will need to give it a go!

I think these have the same structure as the global I/O stats, right? So we can just accumulate them the same way. Or am I missing an important difference?

There is definitely some weirdness that I don't fully understand. For example, the QC process under stimela will report zero bytes read, but many characters read. Will need to take a closer look once the paper is in a better state.
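
If it helps, one plausible explanation, assuming the numbers come from Linux's /proc/<pid>/io via psutil: read_chars counts bytes moved by read()-style syscalls even when they are served from the page cache, while read_bytes only counts data actually fetched from the block layer, so a fully cached workload can show many characters read and near-zero bytes read.

```python
# Quick way to see both counters side by side (Linux /proc/<pid>/io via psutil).
# read_chars counts bytes moved by read()-style syscalls even when served from
# the page cache; read_bytes only counts data actually fetched from storage.
import psutil

io = psutil.Process().io_counters()
print(f"read_bytes={io.read_bytes}  read_chars={io.read_chars}")
```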

Quite apart from accounting the resource use accurately, there is also the question of what the status bar should show. I kind of like that it shows global and not just stimela usage at the moment -- useful to know if the box is being hammered by others. But there is also a case to be made for stimela-session-specific stats being shown. I wonder if we can fit both into the same status line without making it insanely complicated-looking...

Yeah, I was thinking about this too. Both sets of stats are helpful. Do multiline progress bars exist in rich? A mad stretch goal would be to have a global progress bar with sub-bars for each currently running task.

o-smirnov commented 2 months ago

Yeah, I was thinking about this too. Both sets of stats are helpful. Do multiline progress bars exist in rich? A mad stretch goal would be to have a global progress bar with sub-bars for each currently running task.

Right at the bottom of https://rich.readthedocs.io/en/stable/progress.html it says: "You can’t have different columns per task with a single Progress instance. However, you can have as many Progress instances as you like in a Live Display. See live_progress.py and dynamic_progress.py for examples of using multiple Progress instances."

So all things are possible, depends on how crazy we want to get with this...
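
A rough sketch of that pattern, loosely based on the dynamic_progress.py idea from the rich docs (step names and totals here are made up):

```python
# Several Progress instances grouped under one Live display: a global recipe
# bar plus a per-step bar. Step names and totals are illustrative only.
import time
from rich.console import Group
from rich.live import Live
from rich.progress import Progress

step_names = ["cubical", "quartical"]        # made-up step names
overall = Progress()
recipe = overall.add_task("recipe", total=len(step_names))

steps = Progress()
step_tasks = [steps.add_task(name, total=100) for name in step_names]

with Live(Group(overall, steps), refresh_per_second=10):
    for task in step_tasks:
        for _ in range(100):
            steps.advance(task)
            time.sleep(0.01)
        overall.advance(recipe)
```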

o-smirnov commented 2 months ago

I kind of like the minimalist aesthetic of a single-line status bar though. Perhaps all we need to add is a second number for CPU and IO counts (thus showing stimela and global use) and a third number for RAM (stimela/global use/total RAM). Load is global by definition.
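
Purely to make the proposal concrete, one possible rendering of such a line (format_status and all the numbers are made up):

```python
# Illustrative rendering of the proposed single status line:
# stimela/global for CPU and I/O, stimela/global/total for RAM, global load.
def format_status(s_cpu, g_cpu, s_rd, g_rd, s_ram, g_ram, t_ram, load):
    return (f"CPU {s_cpu:.0f}%/{g_cpu:.0f}%  "
            f"R {s_rd:.1f}/{g_rd:.1f} MB/s  "
            f"RAM {s_ram:.1f}/{g_ram:.1f}/{t_ram:.0f} GiB  "
            f"load {load:.1f}")

print(format_status(310, 1450, 42.0, 88.5, 12.3, 57.8, 128, 14.2))
```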

JSKenyon commented 2 months ago

I kind of like the minimalist aesthetic of a single-line status bar though. Perhaps all we need to add is a second number for CPU and IO counts (thus showing stimela and global use) and a third number for RAM (stimela/global use/total RAM). Load is global by definition.

Yeah, and there is always an argument against complexity. If it fits, this makes sense.

I am a little unsure what to do with shared memory at the moment, as I am not sure how you combine it across processes, i.e. is it a simple sum or something more exotic? This is relevant for applications like DDF and CubiCal.

o-smirnov commented 2 months ago

I think shm can just be summed across processes.
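
A sketch of what that could look like with psutil on Linux, with the usual caveat (noted in the comments) that a plain sum counts a segment once per process that maps it:

```python
# Sketch of summing shared memory across a process tree with psutil (Linux).
# Caveat: memory_info().shared is per-process, so a segment mapped by several
# children is counted once per process; memory_full_info().pss apportions
# shared pages across the processes mapping them, if double counting matters.
import psutil

root = psutil.Process()
shared_total = 0
for proc in [root] + root.children(recursive=True):
    try:
        shared_total += proc.memory_info().shared  # bytes, Linux only
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        continue
print(f"shared: {shared_total / 2**30:.2f} GiB")
```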