arminbiere / runlim

Other
12 stars 4 forks source link

flush_inactive: only consider descendants of child_pid #4

Open mtzguido opened 1 year ago

mtzguido commented 1 year ago

Hello, I've noticed that runlim would sometimes wildly exaggerate the CPU time used by a process, especially when running parallel measurements. I tracked it down to this: flush_inactive_processes is accumulating the runtime for every process that finishes, but it is not checking that they are descendants of the current invocation. Hence any other process in the system that happens to terminate will be counted.

A way to see the problem is by running this loop:

while :; do runlim sleep 1; done |& grep 'time:'

which will consistently print 0s as the CPU time, and run something else in parallel that runs long enough to be measures. This C program will do:

int main() {
    int i = 1 << 26;
    while (i--);
}

Then, the CPU time for the sleep invocations will be higher, about 0.6s in my setup.

I'm not sure this walk to the root is the best way to do it, maybe 1) it can just be cached in the Process structure whether a process is a descendant of the child_pid, or 2) maybe flush_inactive_processes could recursively traverse the tree and "enable" the accumulation when it find child_pid.

arminbiere commented 10 months ago

Thanks for reporting. I will keep this open until I have time to finish the 2.0 release (reached rc.8).

arminbiere commented 10 months ago

Ok, the in_tree function looks expensive. Maybe hashing is possible and a better alternative.

There is also a way to check for transitive ancestor relationships using the parenthesis approach (as we have a tree and not dag). We used this in our unhiding approach (see slide 14 of Ringberg11 talk or our SAT'11 paper).

This can check transitive ancestor node relationships (transitive parent process relationships) in constant time after an intial linear DFS traversal of the whole (process) tree.