haraldsteinlechner opened this issue 5 years ago
@haraldsteinlechner thanks for taking the time to file this bug report; I'll get to it as soon as I get back to civilization (in a couple of days) as I'm currently traveling with limited internet access. I think we will find the source of contention in the buffer manager.
just a quick update on Problem 1: we further investigated and rewrote Suave internals to use the new `Socket.ReceiveAsync(buffer)`
(available in netstandard2 and net471) instead of the event-args machinery it currently uses. However, we could not fix the problem in Suave yet; somehow thread pool threads still end up blocked during operation. In effect the problem looks like this one: http://joeduffyblog.com/2006/07/08/clr-thread-pool-injection-stuttering-problems/
I started a discussion here https://gitter.im/dotnet/coreclr although I'm not sure this is the right place.
Hi @haraldsteinlechner
I have been playing with your POC for a few days now, and here is what I have found so far.
With regard to Problem 2
> Using many WebSocket connections, beginning from approx 10 take forever.
I do not see exactly this. Answering 32 clients takes about 4.5 seconds, but then it stays pretty stable as the number of clients increases.
```
startup for 9 clients took: 259.6000000194181ms.
startup for 16 clients took: 787.9999999713618ms.
startup for 18 clients took: 3755.299999989802ms.
startup for 32 clients took: 4803.599999984726ms.
startup for 64 clients took: 4787.400000001071ms.
startup for 128 clients took: 4965.800000005402ms.
startup for 232 clients took: 5023.999999975786ms.
```
The reason for this does not seem to reside within Suave; as far as I can see, the time is spent waiting for accept to return. I rewrote the accept loop in its own dedicated thread with no F# asyncs and it exhibited the same connection-time signature; it did so even with plain blocking accepts.
As to Problem 1, I have managed to get rid of the long round trip in my tree; I am now figuring out which one of my changes did it. I'll get back to you on this.
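For reference, the experiment described above, an accept loop on its own dedicated thread using plain blocking accepts, can be sketched like this (an illustration only, not Suave's actual accept loop; `handle` is a placeholder for whatever processes a new connection):

```fsharp
open System.Net.Sockets
open System.Threading

// Illustrative sketch: a blocking accept loop on a dedicated background thread.
// AcceptSocket blocks, but only this thread, never a thread pool thread.
let startAcceptLoop (listener : TcpListener) (handle : Socket -> unit) =
    let loop () =
        while true do
            let socket = listener.AcceptSocket()
            handle socket
    Thread(ThreadStart loop, IsBackground = true).Start()
```

Since the loop owns its thread, blocking in `AcceptSocket` cannot interfere with thread pool scheduling, which is exactly what the experiment was meant to isolate.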
cool. hope you can find out problem 1 :)
OK, the source of Problem 1, the stuttering, is in the actual POC.
Replacing the following line in the definition of observer:

```fsharp
System.Threading.Thread.Sleep 50
```

with

```fsharp
do! Async.Sleep 50
```

makes the periodic long round-trips go away (except for the first one). Which makes sense, because Thread.Sleep actually blocks a thread.
Does that work for you ?
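To make the distinction concrete, here is a minimal illustration (not code from the POC) of the two variants inside an async:

```fsharp
// Async.Sleep releases the pool thread back to the scheduler while waiting;
// Thread.Sleep holds on to the thread for the whole duration.
let nonBlockingWait =
    async {
        do! Async.Sleep 50                 // thread goes back to the pool
        printfn "woke up without blocking a thread"
    }

let blockingWait =
    async {
        System.Threading.Thread.Sleep 50   // pins a pool thread for 50 ms
        printfn "woke up after blocking a thread"
    }
```

With enough concurrent `blockingWait`-style asyncs, every pool thread ends up sleeping and new work has to wait for thread injection.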
Hey, thanks for the update, but I couldn't find the code you're referring to. The only call to Thread.Sleep I could find is in aFewTimes
which should only be relevant for startup.
Did I miss anything?
@krauthaufen i think the reference is for this one:

```fsharp
let observer =
    async {
        do! Async.SwitchToNewThread()
```
And indeed you're right, this observer thread (hidden behind a task) increases the likelihood of the stalls. I tried a lot, but if I recall correctly it still happened in long test runs over night. In the lack of a better small test case, let us assume a second, blocking task is the problem: SwitchToNewThread. I now switched to a dedicated background thread and the problem goes away. Using low-level diagnostics we also found that SwitchToNewThread threads are still tagged as thread pool threads, so this totally matches the observation: we might have bad effects on the thread pool here. We then looked at all our libraries and replaced SwitchToNewThread with a dedicated real thread. The conclusion is: one shall not block thread pool threads, whatever happens, and SwitchToNewThread somehow ends up on the thread pool as well.

Tasks can be marked as long running. For normal TPL parallel programming, however, I cannot estimate a priori whether tasks are long running. In order not to get unfair scheduling and stalls in other parts of the program, I cannot use tasks at that point. In the last consequence, I cannot use any kind of thread pool/TPL stuff in the presence of components such as webservers written in async/await style, in order not to disrupt their responsiveness. Your analysis and our investigations show that we have a deeper problem here. Just to give you some more info: the stalls show up once nrOfCores parallel async awaits happen. In case of a deeper problem (which I suspect), I'm sorry we bothered you with this issue.
What do you think?
@stefanmaierhofer @krauthaufen @aszabo314 @thomasOrtner
EDIT: quick update, there is definitely a bug in SwitchToNewThread on dotnet core; this caused the test to produce the stutters. In our real application we got rid of it, but we still suffer occasional stalls.
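For what it's worth, the dedicated background thread replacement can be sketched roughly like this (an assumed shape for illustration, not the actual library code):

```fsharp
open System.Threading

// Run a (possibly blocking) body on a dedicated background thread instead of
// Async.SwitchToNewThread, so no thread pool thread is ever blocked by it.
let startDedicated (body : unit -> unit) : Thread =
    let t = Thread(ThreadStart body, IsBackground = true)
    t.Start()
    t
```

The returned `Thread` can be joined if the caller needs to wait for completion; since `IsBackground` is set, it will not keep the process alive.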
Yeah you are right that we may be hitting a deeper problem.
Following the pattern here, I rewrote observer like this:

```fsharp
async {
    while true do
        // F# does not allow `let mutable a, b = ...`,
        // so the byref locals are declared separately
        let mutable workers = 0
        let mutable ios = 0
        System.Threading.ThreadPool.GetAvailableThreads(&workers, &ios)
        printfn "%03.2f available workers: %A, io's: %A" sw.Elapsed.TotalMilliseconds workers ios
        do! Async.SwitchToNewThread()
        System.Threading.Thread.Sleep 50
        do! Async.SwitchToThreadPool()
}
```
But it still does not solve the problem.
Hi, we figured out that SwitchToNewThread does not actually create a new thread in the netstandard build of FSharp.Core; it just uses a regular thread pool thread instead. This is caused by FX_NO_THREAD being defined in the netstandard build of FSharp.Core (no idea why).
Nonetheless, tasks are not scheduled in a fair way and therefore these delays cannot be completely avoided. As an example, we simply ran a Parallel.For loop which computes something while requesting something from Suave. That way we were able to essentially block it; requests took up to a minute to complete.
Shouldn't Suave use a real lightweight thread library instead of TPL? The main goal of a webserver is to serve its clients in a fair way and not (as in TPL) to optimize throughput. I don't actually know if this is even possible, since all the low-level things return tasks. However, it would be worth a thought, since something like image processing using parallel primitives shouldn't prevent the webserver from answering requests.
Cheers
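One partial escape hatch, when a piece of work is known up front to be long running, is `TaskCreationOptions.LongRunning`, which asks the TPL for a dedicated thread instead of a pool thread. A sketch (note this does not help in the case discussed above, where you cannot predict a priori which tasks will be long running):

```fsharp
open System.Threading.Tasks

// Work flagged as LongRunning gets its own thread by default,
// so it cannot starve the shared worker pool.
let runLongRunning (work : unit -> unit) : Task =
    Task.Factory.StartNew(work, TaskCreationOptions.LongRunning)
```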
Some news: the accept performance is most likely the same problem. In this case Suave's client tasks steal worker pool threads from the system, starving it of new clients. This stems from the fact that scheduling is not fair here; thus existing ping-pong connections block their pool threads and we need to wait until the accept continuation can be executed. The problem manifests most drastically when Task and ThreadPool machinery is used as the work horse for two purposes:
```fsharp
let testAsyncMadness () =
    let tcp = TcpListener(IPAddress.Loopback, 1000)
    tcp.Start()

    let server =
        let f () =
            tcp.AcceptSocket() |> ignore
            printfn "accepted"
        Thread(ThreadStart f, IsBackground = true).Start()

    let ppi () =
        let f () =
            let mutable sum = 0.0
            let steps = 2000000000
            let step = 1.0 / float steps
            let o = obj()
            let sw = System.Diagnostics.Stopwatch.StartNew()
            // dotnet provides TPL; somewhere I use a lib which uses TPL for parallel computing
            Parallel.For(0, steps, Func<float>(fun () -> 0.0), Func<int, ParallelLoopState, float, float>(fun i state local ->
                let x = (float i + 0.5) * step
                local + 4.0 / (1.0 + x * x)
            ), Action<float>(fun local -> lock o (fun _ -> sum <- sum + local))) |> ignore
            printfn "computed pi: %A and took: %As" (step * sum) sw.Elapsed.TotalSeconds
        let t = System.Threading.Thread(ThreadStart f)
        t.Start()
        System.Threading.Thread.Sleep 100 // get it going
        t

    let background = ppi()
    let sw = System.Diagnostics.Stopwatch.StartNew()
    let client = new TcpClient()
    // here I use a different facet of tasks, actually for concurrency (here on the main thread);
    // the continuation fights for the thread pool with the parallel-programming part
    client.ConnectAsync(IPAddress.Loopback, 1000).ContinueWith(fun (t : Task) ->
        sw.Elapsed.TotalSeconds |> printfn "connecting tcp listener took: %A (%As)" t.Status
    ) |> ignore
    background.Join()
    tcp.Stop()

(*
oops, both library parts do not compose in terms of performance:
> testAsyncMadness();;
accepted
connecting tcp listener took: RanToCompletion (took 1.0295347s) (!!!)
computed pi: 3.141592654 and took: 3.0281443s
val it : unit = ()
*)
```
Clearly, both parts of the program do not compose. Please note that the low-level implementation using FromBeginEnd behaves fine.
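The `FromBeginEnd` variant referred to above can be sketched as follows (an assumed shape for illustration): it wraps the APM `BeginConnect`/`EndConnect` pair, whose completion callback does not have to wait for a free worker thread in the same way a task continuation does.

```fsharp
open System.Net
open System.Net.Sockets

// Wrap TcpClient's APM connect pair in an F# async.
let connectAsync (client : TcpClient) (addr : IPAddress) (port : int) : Async<unit> =
    Async.FromBeginEnd(
        (fun (cb, state) -> client.BeginConnect(addr, port, cb, state)),
        client.EndConnect)
```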
Also, with real threads or proper lightweight threads the expected performance would be:

```
> testAsyncMadness2();;
accepted
connecting tcp listener took: 0.0329048s
computed pi: 3.141592654 and took: 3.0007865s
val it : unit = ()
```
So in conclusion: if a thread pool thread is blocked, for whatever reason, continuations may be delayed for an arbitrary amount of time (typically a multiple of 500ms, the thread pool's injection interval).
I encountered two (most likely independent) performance problems with Suave websockets. A setup which demonstrates the problem can be achieved with two kinds of sockets: active ping-pong connections and passive lurkers. In this setup the WebSocket example has been modified in order to show the problem in a concise manner. For demonstration, use the test repo, start the modified WebSocket example and open Blub.html (available in the Websocket directory) in a web browser. In Chrome you will see:

Problem 1:
Typically the ping pong takes 0.21ms, but suddenly the application starts printing long-iteration warnings showing more than 500ms round-trip time in the output log.
On a machine with more CPUs, or when setting higher MinThreads in the thread pool, this problem goes away. We will shortly submit an issue to coreclr @krauthaufen
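The MinThreads mitigation mentioned above can be applied like this (the numbers are illustrative, not a recommendation; this trades resources for responsiveness and does not fix the underlying scheduling issue):

```fsharp
open System.Threading

// Raise the pool minimum so the CLR creates threads on demand instead of
// injecting them at roughly 500 ms intervals once the minimum is exceeded.
let raiseMinThreads (minWorkers : int) (minIo : int) =
    let mutable workers = 0
    let mutable io = 0
    ThreadPool.GetMinThreads(&workers, &io)
    ThreadPool.SetMinThreads(max workers minWorkers, max io minIo) |> ignore
```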
The problem is simply the way async operations work on sockets in combination with varying workloads, as simulated by the two kinds of sockets.
Problem 2:
Using many WebSocket connections, beginning from approximately 10, takes forever. You can see the connection time for 32 clients (as shown in Blub.html) in the screenshot, which is about 5 to 20 seconds.
This is most likely due to problems in concurrent buffer management.