stress-ng: exit if deadline and cannot set deadline scheduler

ColinIanKing / stress-ng

This is the stress-ng upstream project git repository. stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.

https://github.com/ColinIanKing/stress-ng

GNU General Public License v2.0


stress-ng: exit if deadline and cannot set deadline scheduler #372

bjdooks-ct closed 7 months ago

bjdooks-ct commented 8 months ago

If we set a child to deadline and it fails the admission test in the sched_setattr() call, then the best thing to do is to exit at that point; otherwise the thread ends up trying to use all the CPU time. This is because sched_yield() returns immediately, as there is no need to wait for the next deadline period.

If the first sched_setattr() fails, then we get an error, but if it passes then some of the children will just ignore the -EBUSY and carry on as if nothing had gone wrong.
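A minimal sketch of the behaviour described above, not the actual stress-ng patch: the child asks for SCHED_DEADLINE and exits if the kernel refuses, instead of carrying on under its old policy. All demo_* names are hypothetical. The struct is a local copy of the kernel's original 48-byte sched_attr layout, and the call goes through syscall(2) so the sketch does not rely on a libc wrapper being present.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value of SCHED_DEADLINE in the Linux UAPI headers (hypothetical local name) */
#define DEMO_SCHED_DEADLINE 6

/* Local copy of the first (48-byte) version of the kernel's struct sched_attr */
struct demo_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;   /* nanoseconds */
	uint64_t sched_deadline;  /* nanoseconds */
	uint64_t sched_period;    /* nanoseconds */
};

/*
 * Ask the kernel to make the calling thread SCHED_DEADLINE.
 * Returns 0 on success or a negative errno value, for example
 * -EBUSY when the admission test rejects the request.
 */
int demo_set_deadline(uint64_t runtime_ns, uint64_t deadline_ns, uint64_t period_ns)
{
	struct demo_sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy = DEMO_SCHED_DEADLINE;
	attr.sched_runtime = runtime_ns;
	attr.sched_deadline = deadline_ns;
	attr.sched_period = period_ns;

	/* pid 0 means the calling thread; the flags argument must be 0 */
	if (syscall(SYS_sched_setattr, 0, &attr, 0) < 0)
		return -errno;
	return 0;
}

/*
 * Child-side policy from the description above: if deadline scheduling
 * cannot be set, exit rather than run the stress loop, where
 * sched_yield() would return immediately and the thread would spin.
 */
void demo_child_enter_deadline(uint64_t runtime_ns, uint64_t deadline_ns, uint64_t period_ns)
{
	int rc = demo_set_deadline(runtime_ns, deadline_ns, period_ns);

	if (rc < 0) {
		fprintf(stderr, "cannot set deadline scheduler: %s, exiting\n", strerror(-rc));
		_exit(EXIT_FAILURE);
	}
}
```

A child would call demo_child_enter_deadline() once before entering its work loop. Whether to exit or to report the failure back to the parent is a design choice; the sketch only shows the exit path.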

This is easily testable by turning down the sched_rt_runtime_us parameter on Linux and then trying to launch too many stress-ng instances. For this we did:

The second invocation starts, but some threads end up being denied deadline as they've now exceeded our arbitrary limit.

bjdooks-ct commented 8 months ago

I think the only thing I was considering was whether to add a comment in the code about exiting if we cannot switch to deadline, along the lines of the commit message.

ColinIanKing commented 8 months ago

Thanks for this fix. I'll look at it next week after I've got the current stress-ng release out of the door this week.

ColinIanKing commented 8 months ago

Hi, I'm trying to reproduce this issue to understand the fix and to ensure it works in a portable way. Do you mind explaining how the issue is reproduced so I can test the fix?

gavinmccall commented 8 months ago

First, lower the allowed time for RT tasks, for example to 25%:

echo 250000 > /proc/sys/kernel/sched_rt_runtime_us

Run top or similar to see the per-task CPU load.

Then start the stressors using:

sudo stress-ng -t 60 --sched deadline --sched-runtime 90000 --sched-period 1000000 --sched-deadline 1000000 --cpu 0&

Repeat the stress-ng command two more times. This should be enough that not all the threads created by the last command can be converted to SCHED_DEADLINE: 5 or 6 will fail and instead consume very high CPU.
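The arithmetic behind these steps can be sketched as follows. This is a simplified model of the kernel's deadline admission test, not its actual fixed-point implementation. It assumes all CPUs sit in one root domain, no other deadline tasks exist, sched_rt_period_us is at its usual default of 1000000, and --cpu 0 starts one stressor per online CPU. The demo_* names are hypothetical.

```c
#include <stdint.h>

/*
 * Simplified admission model: the summed bandwidth (runtime / period) of
 * all deadline tasks may not exceed
 *   ncpus * sched_rt_runtime_us / sched_rt_period_us.
 * Returns how many identical tasks fit under that limit.
 * No overflow guarding; intended for small illustrative inputs only.
 */
uint64_t demo_max_admitted(uint64_t ncpus,
			   uint64_t rt_runtime_us, uint64_t rt_period_us,
			   uint64_t task_runtime, uint64_t task_period)
{
	if (rt_period_us == 0 || task_runtime == 0)
		return 0;
	/* Integer form of (ncpus * rt_runtime / rt_period) / (task_runtime / task_period) */
	return (ncpus * rt_runtime_us * task_period) / (rt_period_us * task_runtime);
}

/*
 * Number of tasks that would be refused deadline scheduling after
 * launching `invocations` runs, each starting one task per CPU.
 */
uint64_t demo_rejected(uint64_t ncpus,
		       uint64_t rt_runtime_us, uint64_t rt_period_us,
		       uint64_t task_runtime, uint64_t task_period,
		       uint64_t invocations)
{
	uint64_t requested = ncpus * invocations;
	uint64_t limit = demo_max_admitted(ncpus, rt_runtime_us, rt_period_us,
					   task_runtime, task_period);

	return requested > limit ? requested - limit : 0;
}
```

With the values above, each task asks for 90000/1000000 = 9% of a CPU against a 25% per-CPU limit, so two full invocations fit (18%) and the third only partly. On a hypothetical 8-CPU machine the model admits at most 22 tasks, so demo_rejected(8, 250000, 1000000, 90000, 1000000, 3) yields 2; the exact count on a real system depends on the CPU count and on the kernel's rounding.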

ColinIanKing commented 7 months ago

Thanks for the detailed report and patch. I've pushed a fix that also addresses this issue and another related issue. Many thanks for the help! Much appreciated.