Closed tarekgh closed 4 years ago
cc @JeremyKuhne @ianhays
This is a NetFX issue. CoreFX spin waits here. @ianhays
Closing this as it isn't .net core relevant.
I've backported this fix to full .net and it should be included in the release after the next one.
now that's what i call service!!! thank you!!!
so i see the corefx code uses SpinWait.SpinOnce() on each loop iteration.
this strikes me as ultra agressive and suprising behavior. the API from the outside would appear to block ala a Socket.Connect() when in reality it is going to eat tons of CPU. SpinOnce() at best will Sleep(1) which is virtually a no-op when the other loop activity is significant as in this case.
why not start by sleeping 50ms and then bump it up to 250ms after a certain number of iterations? if the user is waiting a for a server then an extra 250ms shouldn't be noticed. But a pegged CPU due to spin waiting surely will and is quite unfriendly.
when in reality it is going to eat tons of CPU
SpinWait has fallback behavior, such that i'll end up doing Thread.Sleeps:
Your test is analogous to
while (true) {
Thread.Sleep(1);
}
but really what is happing is
while (true) {
DoSignificantWork()
Thread.Sleep(1);
}
what matters is the ratio of work to sleeping to figure out CPU load. SpinWait is inappropriate when you are doing a lot of CPU bound work in the loop. Eg 99ms workload and sleep 1ms -> 99% CPU utilitzation. #NotGood!
(that said i didn't actually test anything so i may be full of sh*t).
SpinWait on my system:
public void SpinOnce()
{
if (this.NextSpinWillYield)
{
CdsSyncEtwBCLProvider.Log.SpinWait_NextSpinWillYield();
int num = this.m_count >= 10 ? this.m_count - 10 : this.m_count;
if (num % 20 == 19)
Thread.Sleep(1);
else if (num % 5 == 4)
Thread.Sleep(0);
else
Thread.Yield();
}
else
Thread.SpinWait(4 << this.m_count);
this.m_count = this.m_count == int.MaxValue ? 10 : this.m_count + 1;
}
Eg 99ms workload and sleep 1ms -> 99% CPU utilitzation
It generally won't sleep for 1ms... more likely 10-15ms. And the work required in WaitNamedPipe if it doesn't block is small.
but really what is happing is
Can you please share a repro on .NET Core 2.0? I see this:
ok, i hear you and it probably is ok on windows. the thing is there are multiple assumptions built in that are fragile- 10-15ms timeslice architecturally depenedent on the windows scheduler. not sure what would happen on linux especially on a kernel that might be build with NOHZ or realtime / low latency (1ms scheduler if i recall). we're also assuming TryConnect() is fast. i just don't see why SpinWait is chosen other than it being a 1-liner to code rather than an adaptive backoff or even a straight up Thread.Sleep(50). thanks for entertaining my gripes!
i just don't see why SpinWait is chosen other than it being a 1-liner to code rather than an adaptive backoff or even a straight up Thread.Sleep(50)
Because there's a downside to longer waits, even with a backoff scheme, and we've not yet seen a case that's harmed by using the smallest sleep possible. If you have one, please do share, but otherwise in the absence of an actual problem we'd prefer to optimize for the cases we know to exist and to keep the code as simple as possible. One of the most common scenarios that hits this code path is a process forking off another that it then wants to communicate with over a named pipe; generally the delay before the server creates the listening pipe is very small, and any additional delay incurred by the client artificially sleeping for a longer period of time can actually add up to very noticeable delays if done frequently enough. The spin wait fix is simple, easy to understand, and addresses the problems we've seen (the biggest problem we saw was with the C# compilation server, when on a single core box, the client could end up starving the server's ability to create the very pipe the client was waiting for). If there are other problems and you can share repros that highlight them, we'll want to revisit.
kudos @stephentoub . i get it. i appreciate you taking the time to explain. what you say makes complete sense. i was a little wound up by getting hit with the 100% cpu bug on netfx but you've clearly thought it through. thank you.
Thanks, @coderb. If after trying the fix you do find situations where it's still problematic, please do let us know :)
From @coderb on July 20, 2017 11:31
NamedPipeClientStream does a tight infinite loop (spins without blocking) if Connect(Timeout.Infinite) is called and the server is unavailable (not running or the pipe is busy).
I would suggest that the retry loop should yield the cpu (Thread.Sleep/Task.Delay) for say 250ms.
Copied from original issue: Microsoft/dotnet#445