dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.3k stars 4.74k forks source link

race condition during SerialPort.Close in System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents #44952

Open vsfeedback opened 3 years ago

vsfeedback commented 3 years ago

This issue has been moved from a ticket on Developer Community.


.NET Core 3.1

A NullReferenceException occurs very regularly when my code calls SerialPort.Close because the attached device is (by design) sending UART traffic so SerialStream.CallEvents is called.

In SerialStream.CallReceiveEvents, stream. DataRecevied is checked at line 1888 on a ThreadPoolWorkQueue thread but if that thread goes to sleep after 1888 is evaluated and my "main" thread calls SerialPort.Close or otherwise unregisters my DataReceived callback before ThreadPoolWorkQueue thread resumes, the stream. DataReceived dereference at line 1890 or 1892 will cause the NullReferenceException.


Original Comments

Charlie Arcuri [MSFT] on 11/18/2020, 11:37 AM:

NOTE: The line numbers in the analysis in the original post refer to .NET Framework source because that was easy to hyper-link to in the explanation. I have downloaded .NET Core symbols and source and the analysis holds true even though line numbers are different.

Feedback Bot on 11/19/2020, 00:26 AM:

We have directed your feedback to the appropriate engineering team for further evaluation. The team will review the feedback and notify you about the next steps.


Original Solutions

(no solutions)

tarekgh commented 3 years ago

Here is the right link of the code in .NET Core https://github.com/dotnet/runtime/blob/5b48caed15190568f3573cc9a1f96bfec357b485/src/libraries/System.IO.Ports/src/System/IO/Ports/SerialStream.Windows.cs#L1772

carlossanlop commented 3 years ago

@CharlieArcuri I assume you logged the original issue. Is this a blocker for you? Do you have a code snippet you can share with a repro? Is this something we could try to repro with a normal serial cable with two usb adapters connected to the same computer, or maybe a serial port loop back adapter? cc @krwq

krwq commented 3 years ago

simple fix could be to store DataReceived into local first and then operate on the local but we should look into what will happen in the user code (can/should they get the event after the dispose?). I generally do not recommend people to use DataReceived as there are also different ambiguities and issues with that (i.e. what should happen when callback is being executed for longer and another event happens during that period, should you get another event or should it wait?).

I suspect this should repro with any setup but it might not be trivial to hit without many retries

charliearcuri commented 3 years ago

@carlossanlop , yes it is a blocker for us. I will circle back with a code snippet after I do some work to get the minimal repro.

Getting it to repro with a normal serial cable seems cost prohibitive from my point-of-view but I propose the following alternatives: 1) I can share a VS2019 remote of my set-up and/or otherwise set-up a "joint debugging" session. 2) Is there an equivalent of "Time Travel Debug" tracing I could capture? 3) How about logging? 4) Is my code inspection correct (that it appears be - based solely on code inspection - a race condition?)

Regarding @krwq's comments, if the recommendation is to avoid DataReceived, shouldn't DataReceived be marked as "deprecated" with a pointer to a supported method? What is the supported/recommended alternative? I do have to run through several hundred iterations to get a repro - but I'm expecting the framework to be stable such that it wouldn't have an internal race condition that would manifest as a NullReferenceException-related crash so that my domain-specific test automation is reliable/never produces false positive failures/etc.

I'll circle back before end-of-day with what I come up with for a "minimal repro." It will be based on a proprietary "debug card" we use in Surface - it's based on an FTDI chip - the serial communication is also a proprietary (relatively simple) protocol we use in Surface devices. Let me know if the repro using the standard issue serial cable is greatly preferred (or if we can proceed with logging/joint debugging/etc.)

Thanks for the help!

krwq commented 3 years ago

@charliearcuri I'd recommend to use just Read/Write APIs (simply read and wait for result/timeout). The DataReceived event can run on any thread and there are consequences of that (doing operation on main thread and in the event may cause some weird behaviors and I think we do not guarantee any thread-safety on instance methods). It probably should be deprecated because it's a pitfall, we'd need to investigate this a bit more though (also other factor is if there is many people using this API might be a good idea to fix this but from my experience there are just issues with these kind of events).

For time travel debugging, have you tried https://devblogs.microsoft.com/dotnet/debugging-net-apps-with-time-travel-debugging-ttd/ ? I'm not sure you'll still be able to repro if you setup like that though (the app will become slower and the debugger might affect the repro).

The easiest way to verify would likely be to simply apply the fix locally, rebuilt IO.Ports and see if you're still able to repro. Also if there are threading issues in that code I'd assume there might be more than one so there is no guarantee this will 100% help.

fatihyildizhan commented 2 years ago

Hi, I upgraded .NET5 project to .NET6 and SerialPort.Close() throws an exception.

.NET6 comes with System.IO.Ports 6.0

  System.NullReferenceException
  HResult=0x80004003
  Message=Object reference not set to an instance of an object.
  Source=System.IO.Ports
  StackTrace:
   at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state)
   at System.Threading.QueueUserWorkItemCallback.<>c.<.cctor>b__6_0(QueueUserWorkItemCallback quwi)
   at System.Threading.ExecutionContext.RunForThreadPoolUnsafe[TState](ExecutionContext executionContext, Action`1 callback, TState& state)
   at System.Threading.QueueUserWorkItemCallback.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()
sicode commented 2 years ago

Same exception

Exception object: 00000000030c5848 Exception type: System.NullReferenceException Message: Object reference not set to an instance of an object. InnerException: StackTrace (generated): SP IP Function 000000002E30F1C0 000007FE8CB36F49 System_IO_Ports!System.IO.Ports.SerialStream+EventLoopRunner.CallReceiveEvents(System.Object)+0xffc1fe59 000000002E30F240 000007FEE96572C9 System_Private_CoreLib!System.Threading.QueueUserWorkItemCallback+<>c.<.cctor>b__6_0(System.Threading.QueueUserWorkItemCallback)+0x5c7584f9 000000002E30F270 000007FEE9886FE8 System_Private_CoreLib!System.Threading.ExecutionContext.RunForThreadPoolUnsafe[[System.Canon, System.Private.CoreLib]](System.Threading.ExecutionContext, System.Action`1<System.Canon>, System.__Canon ByRef)+0x5c988298 000000002E30F2B0 000007FEE965720F System_Private_CoreLib!System.Threading.QueueUserWorkItemCallback.Execute()+0x5c75855f 000000002E30F2F0 000007FEE9656290 System_Private_CoreLib!System.Threading.ThreadPoolWorkQueue.Dispatch()+0x5c75ac30 000000002E30F390 000007FEE965DD8A System_Private_CoreLib!System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()+0x14a 000000002E30F4A0 000007FEE964278F System_Private_CoreLib!System.Threading.Thread.StartCallback()+0x3f

juanlao commented 1 year ago

I have been dealing with this issue, or a very similar one. In my case I was subscribing to DataReceived event too many times. By only subscribing once, it solved the problem. The point here is that the object that manages the subscription was created in one thread, and the subscription was done in another thread.

Hope this helps :)

rd0220 commented 10 months ago

I am facing a similar issue. image Environment: .Net 8.0.101 System.IO.Ports 8.0.0 VS 2022 Version 17.8.4

Please let me know how to fix this issue

Below is the error

The active test run was aborted. Reason: Test host process crashed : Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object. at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state) at System.Threading.QueueUserWorkItemCallback.Execute() at System.Threading.ThreadPoolWorkQueue.Dispatch() at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() System.NullReferenceException: Object reference not set to an instance of an object. at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state) at System.Threading.QueueUserWorkItemCallback.Execute() at System.Threading.ThreadPoolWorkQueue.Dispatch() at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() System.NullReferenceException: Object reference not set to an instance of an object. at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state) at System.Threading.QueueUserWorkItemCallback.Execute() at System.Threading.ThreadPoolWorkQueue.Dispatch() at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() System.NullReferenceException: Object reference not set to an instance of an object. at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state) at System.Threading.QueueUserWorkItemCallback.Execute() at System.Threading.ThreadPoolWorkQueue.Dispatch() at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() System.NullReferenceException: Object reference not set to an instance of an object. at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state) at System.Threading.QueueUserWorkItemCallback.Execute() at System.Threading.ThreadPoolWorkQueue.Dispatch() at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() System.NullReferenceException: Object reference not set to an instance of an object. at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state) at System.Threading.QueueUserWorkItemCallback.Execute() at System.Threading.ThreadPoolWorkQueue.Dispatch() at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() System.NullReferenceException: Object reference not set to an instance of an object. at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state) at System.Threading.QueueUserWorkItemCallback.Execute() at System.Threading.ThreadPoolWorkQueue.Dispatch() at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()

rd0220 commented 10 months ago

Hi, I upgraded .NET5 project to .NET6 and SerialPort.Close() throws an exception.

.NET6 comes with System.IO.Ports 6.0

  System.NullReferenceException
  HResult=0x80004003
  Message=Object reference not set to an instance of an object.
  Source=System.IO.Ports
  StackTrace:
   at System.IO.Ports.SerialStream.EventLoopRunner.CallReceiveEvents(Object state)
   at System.Threading.QueueUserWorkItemCallback.<>c.<.cctor>b__6_0(QueueUserWorkItemCallback quwi)
   at System.Threading.ExecutionContext.RunForThreadPoolUnsafe[TState](ExecutionContext executionContext, Action`1 callback, TState& state)
   at System.Threading.QueueUserWorkItemCallback.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()

I am encountering the same issue after we move from .Net Framework 4.8 to .Net 8.0, Are you able to resolve this issue

pggh commented 7 months ago

I'm hit by the same error in CallReceiveEvents when detaching an handler from SerialPort serialPort.DataReceived -= OnDataReceived; when using "System.IO.Ports" Version="8.0.0" on Windows.

The source of CallReceiveEvents

if (stream.DataReceived != null)
{
    if ((nativeEvents & (int)SerialData.Chars) != 0)
        stream.DataReceived(stream, new SerialDataReceivedEventArgs(SerialData.Chars));
    if ((nativeEvents & (int)SerialData.Eof) != 0)
        stream.DataReceived(stream, new SerialDataReceivedEventArgs(SerialData.Eof));
}

is just wrong in multi-threaded context. Either use a temporary local copy of the event or use

    if ((nativeEvents & (int)SerialData.Chars) != 0)
        stream.DataReceived?.Invoke(stream, new SerialDataReceivedEventArgs(SerialData.Chars));
   ...