Closed bjdooks-ct closed 7 months ago
I think the only thing I was considering was whether to add a comment in the code about exiting if we cannot switch to deadline scheduling, along the lines of the commit message.
Thanks for this fix. I'll look at it next week after I've got the current stress-ng release out of the door this week.
Hi, I'm trying to reproduce this issue to understand the fix and to ensure it works in a portable way. Do you mind explaining how the issue is reproduced so I can test the fix?
Hi
First, lower the allowed time for rt tasks - for example to 25%
echo 250000 > /proc/sys/kernel/sched_rt_runtime_us
Run top or similar to watch per-task CPU load,
then start the stressors using
sudo stress-ng -t 60 --sched deadline --sched-runtime 90000 --sched-period 1000000 --sched-deadline 1000000 --cpu 0&
Repeat the stress-ng command two more times. This should be enough that not all the threads created by the last command can be converted to SCHED_DEADLINE: 5 or 6 will fail and instead consume very high CPU.
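For anyone wondering why it takes about three invocations, here is a rough back-of-the-envelope sketch in C. It assumes the kernel's documented SCHED_DEADLINE admission rule (total runtime/period of all deadline tasks must not exceed the number of CPUs times sched_rt_runtime_us/sched_rt_period_us) and that sched_rt_period_us is at its usual default of 1000000. The function name and the 24-CPU figure are hypothetical, picked only for illustration; the kernel uses fixed-point arithmetic internally, so real counts may differ slightly.

```c
#include <stdint.h>

/*
 * Rough estimate of how many identical deadline tasks pass admission.
 * Assumes: sum(runtime/period) <= cpus * (rt_runtime_us / rt_period_us).
 * Hypothetical helper, not part of stress-ng or the kernel.
 */
uint64_t max_admitted_deadline_tasks(uint64_t cpus,
                                     uint64_t rt_runtime_us,
                                     uint64_t rt_period_us,
                                     uint64_t task_runtime,
                                     uint64_t task_period)
{
    /* cpus * (rt_runtime/rt_period) / (task_runtime/task_period),
     * rearranged to stay in integer arithmetic and rounded down. */
    return (cpus * rt_runtime_us * task_period) /
           (rt_period_us * task_runtime);
}
```

With the values from the steps above on a hypothetical 24-CPU machine, max_admitted_deadline_tasks(24, 250000, 1000000, 90000, 1000000) gives 66. Three runs of --cpu 0 would ask for 72 threads, so about 6 would be refused, which is in line with what I see.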
Thanks for the detailed report and patch. I've pushed a fix that addresses this issue and another related issue. Many thanks for the help! Much appreciated.
If we set a child to deadline and it fails the admission test in the sched_setattr() call, then the best thing to do is to exit at that point, otherwise the thread ends up trying to use all the CPU time. This is because sched_yield() returns immediately, as there is no next deadline period to wait for.
If the first sched_setattr() fails, then we get an error, but if it passes then some of the children will just ignore the -EBUSY and carry on as if nothing had gone wrong.
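As a minimal sketch of the behaviour described above (not the actual stress-ng change), a child could do something like the following. The function names and the local struct dl_attr are hypothetical; the struct mirrors the kernel's original 48-byte sched_attr layout and is named differently so it cannot clash with libc headers that already define struct sched_attr. The raw syscall is used because older C libraries do not provide a sched_setattr() wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

/* Local mirror of the kernel's first-version sched_attr (hypothetical name). */
struct dl_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;   /* nanoseconds */
    uint64_t sched_deadline;  /* nanoseconds */
    uint64_t sched_period;    /* nanoseconds */
};

/* Try to move the calling thread to SCHED_DEADLINE.
 * Returns 0 on success, otherwise the errno value (e.g. EBUSY when the
 * admission test rejects the request). */
int try_set_deadline(uint64_t runtime_ns, uint64_t deadline_ns,
                     uint64_t period_ns)
{
    struct dl_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.sched_policy = SCHED_DEADLINE;
    attr.sched_runtime = runtime_ns;
    attr.sched_deadline = deadline_ns;
    attr.sched_period = period_ns;

    /* pid 0 means the calling thread; flags must be 0. */
    if (syscall(SYS_sched_setattr, 0, &attr, 0) < 0)
        return errno;
    return 0;
}

/* Child-side policy: if deadline scheduling is refused, exit rather than
 * carry on, since sched_yield() would return at once and the loop would spin. */
void enter_deadline_or_exit(uint64_t runtime_ns, uint64_t deadline_ns,
                            uint64_t period_ns)
{
    int err = try_set_deadline(runtime_ns, deadline_ns, period_ns);

    if (err != 0) {
        fprintf(stderr, "cannot switch to SCHED_DEADLINE: %s\n",
                strerror(err));
        _exit(EXIT_FAILURE);
    }
}
```

The point of splitting this in two is that the parent can call try_set_deadline() and report the error, while each child calls enter_deadline_or_exit() so that a refused request never falls through to the busy loop.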
This is easily testable by turning down the runtime_us parameters on Linux and then trying to launch too many stress-ng instances. For this we did:
The second invocation starts, but some threads end up being denied deadline scheduling as they have now exceeded our arbitrary limit.