dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.08k stars 4.69k forks source link

SerialPort - high CPU usage #2379

Open michaldobrodenka opened 4 years ago

michaldobrodenka commented 4 years ago

Tested on Linux. I have a embedded application where I have a separate thread with sync Read/Write operations in loop and after migrating from mono, this thread CPU usage went from 5% to 25%.

Typical use case for SerialPort like ModBus with small poll intervals, where serial port sends & recieive small packets many times per second make .NET Core Serial consume a lot of resources.

See: https://stackoverflow.com/questions/54903630/net-core-3-0-system-io-ports-serialport-uses-5-10-cpu-on-rpi-all-the-time

jkotas commented 4 years ago

cc @wfurt

michaldobrodenka commented 4 years ago

When I look at the implementation, for every simple blocking read/write operation - multiple objects on heap are created: SerialStreamIORequest, CancellationTokenSource for timeout or ReadTask/WriteTasks.

When you consider high performance scenario, like modbus on 115200 line where you send and receive a few bytes like 50 times per second, pressure for GC and object creation is not really light.

Maybe we can improve code by checking in Read/Write if other IO operation is running and if not, complete synchronously. But then we still have to deal with timeout. I'm not an OS expert, but probably using linux 'select' would be more efficient than creating CancellationTokenSource object 100 times per second. But I understand that this is probably not a priority for now.

krwq commented 4 years ago

@michaldobrodenka it's not that simple, I agree we could reduce CTS creation here but I doubt this would reduce any high CPU usage or have any visible impact.

As for using select - we are currently using poll which in a sense is very similar but the problem is we cannot just wait until it timeouts because SerialPort was designed such that it also supports events which mess up the perf capabilities really bad (perhaps one solution would be to use different code when there are events vs when there are none). I suspect the error here is that we're waiting for data but we're not getting any and we're trading off quick response time with CPU usage here. I.e. if we slept here then OS will wait minimum of 10ms which would reduce any responsiveness, I suspect we should do some kind of SpinWait in a loop which would give us something in between looping and sleeping - this will need to be investigated a bit more to not regress any scenarios.

michaldobrodenka commented 4 years ago

I solved this by extracting serial port from Mono. https://github.com/michaldobrodenka/System.IO.Ports.Mono

jeroenwalter commented 4 years ago

I'm using the serialport on a Rpi3 connected to an Arduino via usb. The arduino is sending continuous data to the serialport. Even just reading the data without doing anything with it via the DataReceived event results in about 50 - 80 threads consumed by the serialport, all waiting for some other serialport thread. Data is being received seconds after they are sent. The cpu load generated by this implementation is much too high for serious applications. When I saw the implementation I was baffled by all the async/await stuff, even for reading a single byte. Please consider completely rewriting this class, as now it is just a cpu hog and useless for serious data throughput use on rpi and similar boards.

jeroenwalter commented 4 years ago

I solved this by extracting serial port from Mono. https://github.com/michaldobrodenka/System.IO.Ports.Mono

I've just implemented a DataReceived event for your implementation. cpu load is 2% on rpi vs 25% with the dotnet serialport implementation. VM-size also doesn't keep growing, as it does with the dotnet implementation.

janvorli commented 4 years ago

DataReceived event results in about 50 - 80 threads consumed by the serialport

This sounds pretty bad. One of the overheads (obviously not the only one) caused by this is that so many threads result in a lot of virtual memory consumption due to thread stacks. It is a problem especially on 32 bit systems where the virtual address space is very limited. @krwq, @wfurt, do we understand why would so many threads be created?

jeroenwalter commented 4 years ago

I can confirm that the virtual memory consumption of the application rose to about 1GB within a matter of minutes on the Rpi... using the dotnet serialport class that is. The other implementation from michaldobrenka doesn't have this problem, VM size doesn't grow.

jeroenwalter commented 4 years ago

@krwq, @wfurt, do we understand why would so many threads be created?

Just take a look at https://github.com/dotnet/runtime/blob/master/src/libraries/System.IO.Ports/src/System/IO/Ports/SerialStream.Unix.cs

My guess is because every call to ReadByte(), which you typically do in a loop in the DataReceived event, results in a Task being created and awaited. This takes some time, in the meantime new data can arrive, which results in firing the DataReceived event via RaiseDataReceivedChars using a ThreadPool.QueueUserWorkItem. Now, if you use a lock in the DataReceived handler, because you don't want to read the serialport from 2 threads simultaneously, then all other DataReceived events have to wait until you release the lock. This will make all those worker threads wait.

sandervandegeijn commented 3 years ago

I'm probably having the same issue, seeing 30-50% cpu = half a core on Linux. On Windows however with exactly the same code, cpu usage is doing next to nothing. There is a performance gap between my Linux box (nuc with 4 core celeron) and my Windows machine (I5 9600), but this is significant enough.

At first I suspected my own implementation due to many tasks, but refactored it, think that's okay now but still having issues. When I increase the load on the serial bus it gets worse.

I'm using easymodbustcp as the modbus lib for a .net5 console app, can anyone give me some pointers on how to swap out the default implementation withthe mono one?

Found another thread: https://stackoverflow.com/questions/59577676/c-sharp-serialport-multithread-send-receive-collisions

sandervandegeijn commented 3 years ago

Okay performance is definitely terrible on Linux... On Windows it's fine. How can we get this higher on the backlog, this is barely usable, even sending 3 modbusrequests / second at 9600bps is eating up a lot of cpu time while doing some basic string manipulations. I'm not capable of fixing this library myself.

jeroenwalter commented 3 years ago

Okay performance is definitely terrible on Linux... On Windows it's fine. How can we get this higher on the backlog, this is barely usable in, even sending 3 modbusrequests / second on 9600bps is eating up a lot of cpu time while doing some basic string manipulations. I'm not capable of fixing this library myself.

I ran into the same issues, also addressed them some posts back, but no feedback from the dotnet team unfortunately. You may take a look in my solution in https://github.com/jeroenwalter/Arduino/tree/netcore30/System.IO.Ports.Mono My Arduino solution uses an implementation that I found on https://github.com/michaldobrodenka/System.IO.Ports.Mono It works well on the raspberry pi 3.

janvorli commented 3 years ago

@wfurt is this something you can look into fixing?

wfurt commented 3 years ago

possibly. I'm mostly focused on other areas right now but the work needed may not be too large. I'll chat with @krwq about plan.

It would be great if people could share simple projects so we have good baseline how this is used. I can make some guesses from the discussion but having some code easy to run and measure would be nice.

hrocha1 commented 3 years ago

It seems like the common use case is Raspberry Pi communicating with microcontrollers. So the "minimum sample project" might probably be something like Arduino talking to Raspberry via serial? There is a decent chance that many people have that hardware available and are able to test potential changes.

wfurt commented 3 years ago

I have few RPi and I used it with serial USB connected to my x64 VM. I'm mostly interested in coding patterns. For example ReadByte() can be greatly simplified but I'm not sure how much that matters.

sandervandegeijn commented 3 years ago

It seems like the common use case is Raspberry Pi communicating with microcontrollers. So the "minimum sample project" might probably be something like Arduino talking to Raspberry via serial? There is a decent chance that many people have that hardware available and are able to test potential changes.

Not only for those home/hobby projects. Serial is still relevant, especially for industrial applications. I'm using a Intel NUC as a modbus master with a CH340G/RS485 USB stick to communicate with multiple energy meters, solar inverters and charge controllers. Still in a home situation, but a lot of industrial appliances / sensors use Modbus RTU (which mainly runs over RS232/RS485).

jeroenwalter commented 3 years ago

It seems like the common use case is Raspberry Pi communicating with microcontrollers. So the "minimum sample project" might probably be something like Arduino talking to Raspberry via serial? There is a decent chance that many people have that hardware available and are able to test potential changes.

I use my own .netcore fork of the SolidSoils Arduino library on a rpi3. The rpi is not sending commands to initiate data transfer. Instead the Arduino is constantly sending firmata messages to the rpi. The SolidSoils code (my fork as well as the original SolidSoils lib) uses the ReadByte() method in the DataReceived event to parse firmata messages. See the method SerialDataReceived() in https://github.com/jeroenwalter/Arduino/blob/netcore30/Solid.Arduino.Firmata/FirmataSession.cs or in https://github.com/SolidSoils/Arduino/blob/master/Solid.Arduino/ArduinoSession.cs

For event driven serial communication like this, I think ReadByte() for a single byte should be as fast as possible, no async tasks or enqueued action or such, just read the byte. In this case you can only read 1 byte at once, as you don't know how many bytes are available and can't wait for extra bytes to arrive before processing a complete firmata message.

sandervandegeijn commented 3 years ago

Okay, on Windows with .net 5.0 it isn't fully stable either. Although I'm not noticing the high cpu usage on my Windows 10 boxes it does trigger a dirty error;

Unhandled exception. info: Modbus2Mqtt.Eventing.ModbusResult.ModbusRegisterResultHandler[0]
      Result for: eastron_test register: reactive power : 0
System.NullReferenceException: Object reference not set to an instance of an object.
   at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
   at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()

I can reproduce this quite easily.

Priyantha commented 3 years ago

Okay, on Windows with .net 5.0 it isn't fully stable either. Although I'm not noticing the high cpu usage on my Windows 10 boxes it does trigger a dirty error;

Unhandled exception. info: Modbus2Mqtt.Eventing.ModbusResult.ModbusRegisterResultHandler[0]
      Result for: eastron_test register: reactive power : 0
System.NullReferenceException: Object reference not set to an instance of an object.
   at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
   at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()

I can reproduce this quite easily.

During running the same piece of tooling, in this case something to communicate with Modbus, I am experiencing the same issue as @ict-one-nl.

Alex-111 commented 3 years ago

Maybe the eventloop in the serialportstream could be started lazyly. In most use cases only reading and writing to the stream is necessary. I do not understand why the loop is also started when just reading from the stream without caring about the events

krwq commented 3 years ago

I did some quick investigation and have couple of ideas

  1. The loop in the Unix stream (SerialStream) shouldn't start unless unless you subscribe to the event. Also it should stop if nothing is happening for certain amount of time. I think the problem might be that SerialPort class always subscribes to the events: https://github.com/dotnet/runtime/blob/main/src/libraries/System.IO.Ports/src/System/IO/Ports/SerialPort.cs#L603-L604 and other related piece here https://github.com/dotnet/runtime/blob/main/src/libraries/System.IO.Ports/src/System/IO/Ports/SerialStream.Unix.cs#L799 I think this shouldn't be happening at least not on linux. I think I might have regressed it here: https://github.com/dotnet/runtime/commit/98361c13a0d5d71c412aece04ac9c84bda316a31#diff-efd183ff76a828527dffa9f7370a0459922121146a211b15fd9a4863d4be70d5
  2. Regardless of events or whatever is happening there loop shouldn't take 100% CPU cycles unless the read/write queue is full. Not sure why it's happening yet. If nothing is happening in the loop we should be calling Thread.Sleep which should bring usage to zero.
  3. We should probably investigate into bypassing the loop for simple IO operations. So as long as you don't use async or events the loop shouldn't start at all.
  4. Perhaps if there are many serial ports used by app we should consider sharing the loop with other ports to save some resources - not sure if that is an actual problem.
  5. Improve events:
    • We should investigate if Linux APIs allow better way to handle events asynchronously
    • If above can't be done we should investigate if dropping event support (or at least using different code when you're not subscribed) wouldn't be better solution - lack of events makes to code much simpler and allows do do everything much more efficiently

I think we should investigate it one problem at a time.

sandervandegeijn commented 3 years ago

Is there any activity on this? It's really terrible, the current implementation even triggers locking up cpu cores.

krwq commented 3 years ago

@ict-one-nl unfortunatelly not on our focus at the moment. We will most likely need some help from community to move this forward faster. From my end I can only ask to focus on this in vNext but I can't promise this will happen.

Alex-111 commented 3 years ago

Couldn‘t you just implement the eventloop lazyly. Most users do not need the events. So at least for the majority of users the serialport is usable then. Othwise it is useless on linux...

sandervandegeijn commented 3 years ago

Well it's so bad that I think it should be considered to drop serial support for the Linux builds all together for now. That way there are no false expectations... The risk is that it never gets fixed at all, but still.

I'm seeing more than 35-50% cpu and occasional system instability with very mild loads.

dlukach commented 3 years ago

is there any updates on this? This is really impacting a commercial product I'm developing. Tried to use the mono ports posted above but it is locking up on the reading of a serial line. The .net 5 implementation of ports is killing my RPI from functioning properly. If there is any way my team can assist in expediting this issue please let me know.

dlukach commented 3 years ago

So the issue with the mono port, it defaults to "\r\n" for new line when it should be "\n" like the .net implementation. I didn't look why since the default value is "\n" yet the private var doesn't have that set

krwq commented 3 years ago

@dlukach for \r\n issue please create a new issue so this won't get missed. As for updates... well I didn't really have time to pick this up 😞

sandervandegeijn commented 2 years ago

Now that .net6 has left the building (great job!) can you guys to spare some time to fix this? It would be more correct to simply drop serial port as it is now entirely until it's fixed than leaving as it is now.

sandervandegeijn commented 2 years ago

I really hope this gets fixed in 7, basically you could throw a notimplemented exception tbh ;)

markus-fischbacher commented 2 years ago

I'm facing the same bad performance on my Linux embedded system. I have to deal with two UARTs in parallel. CPU usage approx. 80%. On MacOS it's working better (maybe because of more CPU power). I will try the Mono implementation.

sicode commented 2 years ago

Okay, on Windows with .net 5.0 it isn't fully stable either. Although I'm not noticing the high cpu usage on my Windows 10 boxes it does trigger a dirty error;

Unhandled exception. info: Modbus2Mqtt.Eventing.ModbusResult.ModbusRegisterResultHandler[0]
      Result for: eastron_test register: reactive power : 0
System.NullReferenceException: Object reference not set to an instance of an object.
   at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
   at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()

I can reproduce this quite easily.

Yes,When Cpu uage 90% ,100% crash

Alex-111 commented 1 year ago

If there is no fix available in the near future, I suggest to include the mono code instead of the current one….

kasperk81 commented 1 year ago

@Alex-111 try the latest one https://www.nuget.org/packages/System.IO.Ports#versions-body-tab (8 preview3)

Alex-111 commented 1 year ago

@kasperk81 Why do you suggest this? Can you explain, what has been changed to reduce CPU usage?

jeroenwalter commented 1 year ago

@kasperk81 Why do you suggest this? Can you explain, what has been changed to reduce CPU usage?

https://github.com/dotnet/runtime/blob/main/src/libraries/System.IO.Ports/src/System/IO/Ports/SerialStream.Unix.cs

As far as I can see, the implementation didn't change much and is still using Tasks and queuing threads anytime a byte arrives on the serial port. So my guess is that nothing has been improved performance-wise on a Raspberry Pi. Would be nice if the .net team does some performance testing / benchmarking vs old Mono implementation on non-Windows platforms.

sandervandegeijn commented 1 year ago

I gave up on this, it's completely unusable. Throwing a not implemented exception would be cleaner than leaving this **** in.

michaldobrodenka commented 1 year ago

If you want I can share mono serial port implementation rewritten to pure c# + libc dllimport. (so it's not dependent on monoposixhelper native libraries)

schotime commented 1 year ago

@michaldobrodenka This would be fantastic. I'm struggling with deadlocks bigtime with the MS one.

michaldobrodenka commented 1 year ago

@schotime this is source I use heavily on low end ARM machines (I made some namespace changes, hope it will work :) ) https://github.com/michaldobrodenka/Mono.IO.Ports-managed

schotime commented 1 year ago

@michaldobrodenka It works!!!! Thankyou! I'm using it on ARM PI like devices. Hopefully these rotten errors can be gone.