Open GoogleCodeExporter opened 9 years ago
Thread post here: http://foldingforum.org/viewtopic.php?p=221015#p221015
Original comment by harlam357
on 9 Oct 2012 at 10:15
We (team 33) investigated this issue recently and found that it is triggered by V7
clients. At the top level, the problem lies in the Connection.cs logic, which
repeatedly fires a timer to "check for data received on the socket".
The problems with that logic are as follows (line numbers per r595):
1. The client socket is a blocking socket, so when there is no data on the socket,
one timer thread sleeps in the blocking read here:
480: int bytesRead = _stream.Read(_internalBuffer, 0, _internalBuffer.Length);
Because the timer isn't stopped, it keeps firing (invoking SocketTimerElapsed),
and those invocations consume CPU without doing any useful work.
2. Concurrency protection in SocketTimerElapsed is insufficient. It is entirely
possible (and has been seen in the wild, too) that:
(a) the timer fires and thread[1] is allocated to run the SocketTimerElapsed callback,
(b) thread[1] starts executing the callback, but
(c) is preempted by the OS after executing the condition check at
433: if (!_updating)
but before executing
437: _updating = true;
This opens a race window in which another thread[2] (the next timer hit) can execute
Update() concurrently, and possibly out of order, which would be disastrous.
3. When the connection is closed or reset by the client, an exception is raised:
483: throw new System.IO.IOException("The underlying socket has been closed.");
It is later caught by SocketTimerElapsed, which takes no action in that case.
As a result, the logic keeps trying to read from a socket that has been shut
down by the peer, which again consumes CPU.
4. The timer interval of 1 ms causes even higher CPU usage on Linux. This is because
Mono can actually provide 1 ms timer resolution, whereas .NET on Windows
provides 10 ms resolution.
In other words, a DefaultSocketTimerLength of 1 is effectively 10 on Windows
(100 timer hits per second). On Linux, however, it really is 1, so we get close to
1000 timer hits per second [10x as many -- sic!]
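To illustrate the race described in item 2: one common way to close the window between the check and the assignment is to make the test-and-set a single atomic operation. The following is only a sketch under assumptions; the class name ReentrancyGuard and the method TryRun are hypothetical and are not part of Connection.cs or the attached diff.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the guard in SocketTimerElapsed; not HFM.NET code.
public sealed class ReentrancyGuard
{
    private int _updating; // 0 = idle, 1 = an update is in progress

    // Runs update() unless another caller is already inside it.
    // Returns true if update() ran, false if this call was skipped.
    public bool TryRun(Action update)
    {
        // Atomically set the flag to 1 only if it is currently 0.
        // Check and assignment happen as one step, so no second
        // caller can slip in between them.
        if (Interlocked.CompareExchange(ref _updating, 1, 0) != 0)
        {
            return false;
        }
        try
        {
            update();
            return true;
        }
        finally
        {
            // Release the flag even if update() throws.
            Interlocked.Exchange(ref _updating, 0);
        }
    }
}
```

A timer callback would wrap its body in TryRun, so overlapping timer hits are skipped instead of running Update() concurrently. This only addresses the race; it does nothing about the wasted timer hits from items 1, 3 and 4.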
The best way to deal with these issues is to drop the timer logic completely and
create one dedicated thread per Connection for handling socket input.
We've started working on that implementation.
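For readers who want a picture of the dedicated-thread approach, here is a minimal sketch. It is an assumption about the general shape only, not the implementation mentioned above: the class SocketReader, its onData callback and the 4096-byte buffer are all hypothetical.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical sketch of a per-connection reader thread; not HFM.NET code.
public sealed class SocketReader
{
    private readonly Stream _stream;
    private readonly Action<byte[], int> _onData;
    private readonly Thread _thread;

    public SocketReader(Stream stream, Action<byte[], int> onData)
    {
        _stream = stream;
        _onData = onData;
        _thread = new Thread(ReadLoop) { IsBackground = true };
    }

    public void Start() { _thread.Start(); }

    public void Join() { _thread.Join(); }

    private void ReadLoop()
    {
        var buffer = new byte[4096];
        try
        {
            while (true)
            {
                // Blocks until data arrives: no timer and no polling.
                int bytesRead = _stream.Read(buffer, 0, buffer.Length);
                if (bytesRead == 0)
                {
                    break; // peer closed the connection, so stop reading
                }
                // Chunks are delivered in order, on this one thread only.
                _onData(buffer, bytesRead);
            }
        }
        catch (IOException)
        {
            // Connection reset: leave the loop instead of retrying.
        }
        catch (ObjectDisposedException)
        {
            // Stream was closed locally while the read was blocked.
        }
    }
}
```

With a single blocked reader there are no idle timer hits (items 1 and 4), no overlapping callbacks (item 2), and the loop ends when the peer closes the socket (item 3).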
In the meantime, see the attached (and kludgy) diff (against r595) that mitigates
the issues described above.
Test code drop is also available:
http://darkswarm.org/hfm-net-0.9.1.595-tear3.zip
Original comment by kszy...@gmail.com
on 18 Apr 2013 at 6:45
Attachments:
Original issue reported on code.google.com by
phils1...@gmail.com
on 15 Sep 2012 at 1:12