andrewrk / poop

Performance Optimizer Observation Platform
MIT License
788 stars 50 forks

look into CPU shielding #39

Open andrewrk opened 1 year ago

andrewrk commented 1 year ago

https://manpages.ubuntu.com/manpages/trusty/man1/cset-shield.1.html

I just learned about this today (thanks @verdagon!). Maybe whatever syscalls it is using under the hood could be a nice way to make poop obtain less noisy measurements.

matu3ba commented 1 year ago

Here is an in-depth manual, updated 8 days ago: https://documentation.suse.com/sle-rt/12-SP5/single-html/SLE-RT-shielding/ and the upstream repository, which has open issues: https://github.com/lpechacek/cpuset

Please note that cset-shield is written in Python, is licensed under GPLv2, and is a few thousand lines of code.

The manual also describes some unpleasant quirks:

Note. There is a minor chance that a task forks during move and its child remains in the root cpuset. 

I think the author did not want to deal with strace and/or pid 1/process group tracking, which adds another level of complexity and is inefficient in Python.

Afaiu, there are 4 things needed

I think a partial reimplementation in Zig should start with the quirk (moving processes while handling forks). I plan to write up the underlying problem soon.
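To make the fork quirk concrete, here is a minimal sketch of one possible mitigation: keep re-listing the root cpuset and moving whatever is still there until a full pass finds nothing new. It is written in Python only for brevity, and `drain_tasks`, `list_tasks`, and `move_task` are hypothetical names, not anything taken from cset. The two callables stand in for "read the root `tasks` file" and "write a pid into the target `tasks` file". This only narrows the race window; a fork after the final listing would still slip through.

```python
from typing import Callable, Iterable, List, Set


def drain_tasks(
    list_tasks: Callable[[], Iterable[int]],
    move_task: Callable[[int], None],
    max_passes: int = 10,
) -> List[int]:
    """Repeatedly move tasks until a pass finds nothing left to move.

    list_tasks: returns the pids currently in the source cpuset (assumed helper).
    move_task:  moves one pid, raising OSError if it cannot be moved (assumed helper).
    Returns the pids that were moved, in order.
    """
    unmovable: Set[int] = set()  # pids that failed once; not retried
    moved: List[int] = []
    for _ in range(max_passes):
        pending = [pid for pid in list_tasks() if pid not in unmovable]
        if not pending:
            return moved  # a full pass saw no movable task
        for pid in pending:
            try:
                move_task(pid)
                moved.append(pid)
            except OSError:
                # The task exited in the meantime or refuses to move.
                unmovable.add(pid)
    raise RuntimeError("tasks kept appearing in the source cpuset; giving up")
```

A child forked while its parent is being moved shows up in the next listing and is picked up by the following pass, which is the case the manual's note describes.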

matu3ba commented 1 year ago

I do not yet understand what timing guarantees the kernel provides about when reads and writes to the pseudo-filesystem take effect, so I asked the author of the tool, along with some polite hints on how to fix some of the Python code: https://github.com/lpechacek/cpuset/issues/46

I hope the kernel offers callbacks or some other notification, because otherwise we would need to resort to dirty waiting and "hope that it has applied", leaving the door open to spurious failures, even if tracking clone were handled, for example via strace.
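For illustration, the "dirty waiting" fallback could look roughly like the sketch below: write the pid, then poll the same file until the pid shows up or a timeout expires. `write_and_confirm` and its parameters are hypothetical, and whether such polling is needed at all is exactly the open question above.

```python
import time


def write_and_confirm(tasks_path: str, pid: int,
                      timeout: float = 1.0, interval: float = 0.01) -> bool:
    """Write pid to a tasks file, then poll until it is listed there.

    Returns True once the pid is seen, False if the timeout expires first.
    The timeout and interval defaults are arbitrary assumptions.
    """
    with open(tasks_path, "w") as f:
        f.write(f"{pid}\n")

    deadline = time.monotonic() + timeout
    while True:
        with open(tasks_path) as f:
            if str(pid) in f.read().split():
                return True
        if time.monotonic() >= deadline:
            return False  # caller cannot tell "not yet applied" from "failed"
        time.sleep(interval)
```

The `False` branch is where the spurious failures would come from: a timeout does not tell the caller whether the move failed or simply has not become visible yet.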

Overall overview: https://man7.org/linux/man-pages/man7/cpuset.7.html
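As a rough sketch of what that page describes, setting up a cpuset boils down to a handful of directory and file operations under the mounted hierarchy. The function below only builds the list of operations so it can be inspected without root. `shield_plan` is a hypothetical helper, the `/dev/cpuset` mount point and the `cpuset.cpus`, `cpuset.mems`, and `tasks` file names follow the cpuset(7) example, and the `prefix` parameter is there on the assumption that file names may differ depending on how the hierarchy is mounted.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str, Optional[str]]  # (operation, path, value)


def shield_plan(root: str, name: str, cpus: str, pid: int,
                mems: str = "0", prefix: str = "cpuset.") -> List[Step]:
    """Return the steps to create a cpuset and move one task into it.

    Nothing is executed here; the caller would apply each step as root.
    """
    d = f"{root.rstrip('/')}/{name}"
    return [
        ("mkdir", d, None),                       # create the child cpuset
        ("write", f"{d}/{prefix}cpus", cpus),     # CPUs it may use, e.g. "2-3"
        ("write", f"{d}/{prefix}mems", mems),     # memory nodes it may use
        ("write", f"{d}/tasks", str(pid)),        # move the task into it
    ]


if __name__ == "__main__":
    for step in shield_plan("/dev/cpuset", "shield", "2-3", pid=1234):
        print(step)
```

This covers only creating the cpuset and placing one task in it; moving every other task out of the shielded CPUs is the harder part discussed earlier in the thread.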