Closed: OTP-Maintainer closed this issue 3 years ago
tsloughter
said:
Related kernel docs: https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt It includes examples showing how the limits restrict how many CPUs are available.
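The first example in that kernel document limits a group to 1 CPU worth of runtime by setting quota equal to period. Roughly (assuming a cgroup v1 mount; the exact path depends on where the cpu controller is mounted):

```shell
# From sched-bwc.txt: limit a group to 1 CPU worth of runtime.
# With quota == period, the group gets 250ms of runtime every 250ms.
echo 250000 > /sys/fs/cgroup/cpu/mygroup/cpu.cfs_quota_us    # quota  = 250ms
echo 250000 > /sys/fs/cgroup/cpu/mygroup/cpu.cfs_period_us   # period = 250ms
```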
lukas
said:
{quote}Limit a group to 1 CPU worth of runtime.{quote}
They talk a lot about "1 CPU worth of runtime" in that documentation. To me it seems like you still get all N CPUs, just for 1/N of the time, when you run.
If I start {{docker run -it --rm --cpus 4 erlang}} and then do this:
{code}
1> [spawn(fun F() -> lists:seq(1,19000), F() end) || _ <- lists:seq(1,10)].
{code}
The CPU utilization of my system ends up at 50% for each core; it does not come to 100% on 4 cores and 0% on the others. If I start just 1 process instead of 10, I get 1 CPU at 100%.
This would suggest to me that the CFS does give the system access to 8 cores, even if only 4 CPUs worth of runtime is available. So in a system that uses 50% of the allowed CPU resources, it would be possible to get 8 parallel threads doing work.
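This matches what {{--cpus 4}} actually configures: a quota of 4 CPUs worth of runtime per period, with no restriction on which cores the threads run on. A hedged sketch of what you would see from inside such a container (cgroup v1 paths are assumed; cgroup v2 exposes a single {{cpu.max}} file instead):

```shell
# Hedged sketch: what `docker run --cpus 4` sets up under cgroup v1.
# Inside the container you would expect roughly:
#   cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us   -> 400000
#   cat /sys/fs/cgroup/cpu/cpu.cfs_period_us  -> 100000
#   nproc                                     -> still 8 (all host cores visible)
quota_us=400000
period_us=100000
cpus=$(( quota_us / period_us ))
echo "$cpus"   # 4 CPUs worth of runtime, spreadable across all 8 cores
```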
tsloughter
said:
Hm, damn, true.
tsloughter
said:
So it may just be a matter of documentation, and not a good idea to limit schedulers based on this. I would still think an option to automatically limit schedulers based on the quota/period would be useful, for the times people prefer to optimize for throughput over latency. But based on my reading of the code and the Java patch, it doesn't appear that simple to add, in which case it likely isn't worth it when a user can simply set the number of schedulers to match the limits they are setting for their container.
Unless you think the scheduler option is worth doing I'll mark this as resolved.
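The manual workaround mentioned above could look like this; {{erl +S 4}} sets both the number of schedulers and the number of schedulers online to 4, matching the container's limit by hand:

```shell
# Manual workaround: cap the emulator's schedulers to match the
# container's CFS limit using the +S emulator flag.
docker run -it --rm --cpus 4 erlang erl +S 4
```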
tsloughter
said:
I guess a more complete verification would also involve multiple processes with limits set. On an 8-core system with 2 nodes, each limited with `--cpus 4`, how are they spread across CPUs and scheduled? Does the kernel switch between them across all CPUs, or does it end up scheduling each to half of the cores?
tsloughter
said:
Another issue is throttling. With 8 schedulers and a quota of 4 "CPUs", I think you are more likely to exhaust the quota before the next CFS period starts, resulting in throttling. I'm still a little hazy on this aspect.
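The throttling concern can be made concrete with a little arithmetic (assuming the default 100 ms CFS period): a quota of 4 CPUs gives the group 400 ms of runtime per period, so 8 fully busy schedulers burn through it halfway in and sit throttled for the rest of the period.

```shell
# Back-of-the-envelope throttling estimate (assumed default 100ms period).
period_us=100000
quota_us=400000          # --cpus 4  =>  4 * period
busy_threads=8
# Time (in us) until the group exhausts its quota within one period:
throttle_at=$(( quota_us / busy_threads ))
echo "$throttle_at"      # 50000: throttled halfway through each 100ms period
```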
tsloughter
said:
Has there been any more internal discussion on this on the team?
Are numbers needed to show that a scheduler per core when quota is restricted leads to CFS throttling for this change to be considered?
lukas
said:
{quote}Has there been any more internal discussion on this on the team?{quote}
No, not really.
{quote}Are numbers needed to show that a scheduler per core when quota is restricted leads to CFS throttling for this change to be considered?{quote}
I'm going back and forth on what would be best here, but I've ended up thinking that it would be a good idea to restrict the number of online schedulers based on the CFS quotas. It is not obvious that this is the optimal choice, but it is what the user expects when using docker, and that is the most common use case for the quotas.
tsloughter
said:
Ok, great. If I can help in any way just let me know.
john
said:
First stab at fixing this: https://github.com/jhogberg/otp/commits/john/erts/container-tweaking/OTP-16105/ERL-927
I've decided to ignore {{cpu.shares}}; if I'm reading the docs right it's a weight saying how much CPU time we get relative to other processes when the system is constrained, so limiting the number of schedulers based on that doesn't feel right.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu#sect-cfs
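A hedged sketch of the kind of calculation such a change might perform (this is my assumption about the approach, not the actual code in the linked branch): derive schedulers online from quota/period, rounding up, and clamp to the number of configured cores.

```shell
# Assumed logic sketch (not the actual patch): schedulers online from CFS quota.
quota_us=250000      # e.g. --cpus 2.5
period_us=100000
cores=8
# ceil(quota / period), clamped to the core count:
sched=$(( (quota_us + period_us - 1) / period_us ))
if [ "$sched" -gt "$cores" ]; then sched=$cores; fi
echo "$sched"        # 3 schedulers online for a 2.5-CPU quota
```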
john
said:
I've merged the changes into {{master}}, thanks for bringing this up!
tsloughter
said:
Wooo! :)
Original reporter:
tsloughter
Affected version: Not Specified
Fixed in version: OTP-23
Component: erts
Migrated from: https://bugs.erlang.org/browse/ERL-927