OPCFoundation / UA-.NETStandard

OPC Unified Architecture .NET Standard

Task.Run() performance issue #291

Closed: ghost closed this issue 5 years ago

ghost commented 6 years ago

Hi all! I was doing some experiments with the .NET SDK in order to evaluate its performance for an industrial use case. I found that in very rare cases (~0.01%) a read request is delayed by ~100-1000 ms. However, in my case even such rare performance issues are quite important. I found that this delay is caused by Task.Run() in the ScheduleIncomingRequest method in the ServerBase class. I tried creating a larger thread pool by using ThreadPool.SetMinThreads(8, 8);. This greatly reduced the delays: I now get fewer of them, and mostly up to 50 ms.
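
For reference, a minimal sketch of that workaround, assuming it is applied once at process startup before the OPC UA server is created (the thread counts are only an example):

```csharp
using System;
using System.Threading;

internal static class Program
{
    private static void Main()
    {
        // Read the current limits so the change can be verified in a log.
        ThreadPool.GetMinThreads(out int workerThreads, out int completionPortThreads);
        Console.WriteLine($"Default min threads: worker={workerThreads}, IOCP={completionPortThreads}");

        // Pre-provision worker and IOCP threads so Task.Run does not have to wait
        // for the pool's slow thread injection under burst load.
        ThreadPool.SetMinThreads(8, 8);

        // ... create, configure and start the OPC UA server as usual ...
    }
}
```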

The performance is now much better, but this still seems to be the main cause of long response times. Do you have any suggestions on how to reduce these delays even further? Do you think it would be possible/acceptable to completely omit running a new task for request processing?

Best, Luka

mregen commented 6 years ago

@LukaLednicki Hi, this is an interesting finding which requires some deeper investigation. The Task.Run cannot be removed, because incoming requests are executed in parallel. To drill down a bit, it would be good to find the root cause of the delay. Could you describe your test setup so we can repro?

ghost commented 6 years ago

@mregen I used the .NET SDK to implement a test .NET Core server and client. The server runs either on a PC or a Raspberry Pi, and the client runs on a PC. I started by just measuring the RTT of a read value operation from the client side. Depending on the network, the read is repeated 100 000 - 1 000 000 times. I tried this both with no delay between reads and with up to 25 ms delay between each read. In less than 0.1% of cases I get an exceptionally long RTT, up to 1 s. This doesn't seem to depend on the platform (PC or Raspberry Pi), the network configuration (loopback adapter, Ethernet, wireless), or the delay between reads. I added some measurements in the code and found that a long delay is almost always spent waiting for the request processing task to start.
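
For illustration, the client-side measurement amounts to roughly the following sketch, assuming an already-connected Opc.Ua.Client.Session and a test NodeId (the method name, the 50 ms reporting threshold and the iteration count are just placeholders):

```csharp
using System;
using System.Diagnostics;
using Opc.Ua;
using Opc.Ua.Client;

internal static class ReadRttTest
{
    // 'session' is assumed to be an already-connected Opc.Ua.Client.Session,
    // 'nodeId' the test variable exposed by the server.
    public static void MeasureReadRtt(Session session, NodeId nodeId, int iterations = 100_000)
    {
        var stopwatch = new Stopwatch();

        for (int i = 0; i < iterations; i++)
        {
            stopwatch.Restart();
            DataValue value = session.ReadValue(nodeId);   // one synchronous Read request
            stopwatch.Stop();

            // Report only the rare outliers this issue is about.
            if (stopwatch.ElapsedMilliseconds > 50)
            {
                Console.WriteLine($"Read {i}: RTT = {stopwatch.ElapsedMilliseconds} ms");
            }
        }
    }
}
```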

As mentioned, increasing the thread pool size seems to help a lot and reduced the delays to mostly under 50 ms. However, a long RTT still almost always seems to be caused by Task.Run() in the ScheduleIncomingRequest method of the RequestQueue class defined in ServerBase.cs.

My question about removing the new task altogether was just to try it in my specific case. I understand that then the tasks couldn't be handled in parallel (and would probably have a longer average execution time), but currently the limiting factor is this worst case (which is many times longer than average). Would the rest of the stack be able to run properly if the requests are not processed each in a new task?

Example of a measured delay (the labels denote the classes and methods that are started/called):

0ms - TcpListener.OnRequestReceived start
0ms - EndpointBase.BeginProcessRequest2
0ms - ServerBase.ScheduleIncomingRequest
0ms - RequestQueue.ScheduleIncomingRequest
608ms - RequestQueue.ScheduleIncomingRequest.m_server.ProcessRequest.Call
608ms - ProcessRequestAsyncResult.OnProcessRequest
608ms - SessionEndpoint.Read
608ms - StandardServer.Read
608ms - MasterNodeManager.Read
608ms - CustomNodeManager.Read
608ms - NodeState.ReadAttribute
608ms - BaseVariableState.ReadValueAttribute
608ms - RequestQueue.ScheduleIncomingRequest.m_server.ProcessRequest.Returned

MichaelVach commented 6 years ago

@LukaLednicki perhaps playing with the TaskCreationOptions could help. Replace the existing code

Task.Run(() => { m_server.ProcessRequest(request); });

with something like

Task.Factory.StartNew(() => { m_server.ProcessRequest(request); }, TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness);

and see if it makes a difference.
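
For orientation, the line in question lives in RequestQueue.ScheduleIncomingRequest in ServerBase.cs (as noted above). A rough sketch of the swap; everything apart from the quoted calls is a simplification and omits the real method's bookkeeping:

```csharp
// Sketch only: the real RequestQueue.ScheduleIncomingRequest in ServerBase.cs
// contains additional handling that is omitted here.
public void ScheduleIncomingRequest(IEndpointIncomingRequest request)
{
    // Original: queue the work on the shared thread pool.
    // Task.Run(() => { m_server.ProcessRequest(request); });

    // Suggested variant: LongRunning hints the default scheduler to use a
    // dedicated thread instead of a pool thread; PreferFairness asks for
    // FIFO-like scheduling of queued tasks.
    Task.Factory.StartNew(
        () => { m_server.ProcessRequest(request); },
        TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness);
}
```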

elpht commented 5 years ago

I've stumbled upon similar issues with a commercial OPC UA driver, in our case writing values. I decompiled their code and I guess they built it on top of a similar OPC UA stack implementation.

I have a little experience dealing with threading issues, and I think I can explain why this happens: you are already using async HttpClient calls, but you wrap them in sync methods, so each call you send to the server needs two tasks: one that you spawn to begin the async communication using the HttpClient (whose execution is suspended as soon as the payload is sent and resumed once the server response arrives), and another that waits for the response to arrive at the former task.

This means that at the moment the server response arrives and the wait handle to notify its reception is set, the application needs two threads from the thread pool, while the rest of the time one thread is blocked doing nothing, just awaiting the signal.

The show starts when you have many parallel requests (read/write to the same or many PLCs) and for many of them the response from the server arrives at the same time. Your application then suddenly finds itself wanting to double its number of threads, and the thread pool's algorithm for creating new threads when the pool is exhausted is not very responsive ("The thread pool is designed to maximize throughput, not to minimize latency" https://blogs.msdn.microsoft.com/oldnewthing/20170623-00/?p=96455). That is what provokes those "unexpected" delays.
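
This effect is easy to reproduce in isolation; a self-contained sketch (the sleep durations are arbitrary):

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

internal static class PoolStarvationDemo
{
    private static void Main()
    {
        // Block as many pool threads as the pool keeps ready by default,
        // simulating requests that sit on a pool thread waiting for I/O.
        for (int i = 0; i < Environment.ProcessorCount; i++)
        {
            Task.Run(() => Thread.Sleep(2000));
        }
        Thread.Sleep(100); // give the blockers time to occupy their threads

        // Measure how long a new Task.Run waits before it actually starts.
        var stopwatch = Stopwatch.StartNew();
        Task.Run(() => Console.WriteLine(
            $"Task started after {stopwatch.ElapsedMilliseconds} ms")).Wait();

        // With default settings the pool injects extra threads only slowly
        // (roughly one per 0.5-1 s), so the printed delay is typically in the
        // hundreds of milliseconds, the same order as the RTT spikes above.
    }
}
```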

In my opinion the only way to get rid of it is to make the communication to the server fully async. This would be a significant performance improvement, because there are no time-consuming calculations required for the client-server communication (as far as I know) and, most of the time, threads are just acquired and then blocked waiting for server responses.
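
To make the difference concrete, a generic sketch using HttpClient (as in the description above) rather than the OPC UA API; ReadBlocking and ReadAsync are made-up names. The first variant pins a pool thread for the whole round trip, the second releases it while waiting for the response:

```csharp
using System.Net.Http;
using System.Threading.Tasks;

internal static class SyncVsAsyncSketch
{
    private static readonly HttpClient s_client = new HttpClient();

    // Sync-over-async: the calling pool thread is blocked until the response
    // arrives, so every in-flight request pins one thread that does nothing.
    public static string ReadBlocking(string url)
    {
        return s_client.GetStringAsync(url).GetAwaiter().GetResult();
    }

    // Fully async: the thread is returned to the pool at the await and is only
    // needed again when the response arrives, so a burst of responses does not
    // force the pool to double its thread count.
    public static async Task<string> ReadAsync(string url)
    {
        return await s_client.GetStringAsync(url).ConfigureAwait(false);
    }
}
```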