dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.08k stars 4.69k forks source link

Stackoverflow exception in `NetworkStream.DataAvailable` after migration do .NET 6 #95114

Closed daveMueller closed 10 months ago

daveMueller commented 10 months ago

Description

Hello, let me describe the scenario real quick. Our software can connect to several hardware video conferencing systems like e.g. Cisco TelePresence Quick Set SX20. The test case where this error happens is just a shutdown of the hardware. We are on the way to migrate our solution from .NET Framework 4.8 to .NET 6. The issue only happens with the .NET 6 version of our application.

image

Reproduction Steps

The code where this happens is similar to the following:

private byte[] ReadDataFromStream(INetworkStreamAdapter stream)
{
    if (!stream.DataAvailable) return Array.Empty<byte>();

    using (var memoryStream = new MemoryStream())
    {
        var bytesRead = 0;
        var buffer = new byte[GetReadBufferSize()];

        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            memoryStream.Write(buffer, 0, bytesRead);
            if (!stream.DataAvailable) break;
        }

        return memoryStream.ToArray();
    }
}

Expected behavior

No application crash. No stackoverflow exception.

Actual behavior

The stackoverflow exception causes an application crash.

Regression?

It works without a problem in .NET Framework 4.8.

Known Workarounds

No response

Configuration

Other information

Maybe an additional check would be sufficient so that we don't try to read from the stream in the first place. But NetworkStream.DataAvailable seems to exactly be that check.

Any idea or workaround would be super helpful. ☺️

ghost commented 10 months ago

Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.

Issue Details
### Description Hello, let me describe the scenario real quick. Our software can connect to several hardware video conferencing systems like e.g. Cisco TelePresence Quick Set SX20. The test case where this error happens is just a reboot of the hardware. We are on the way to migrate our solution from .NET Framework 4.8 to .NET 6. The issue only happens with the .NET 6 version of our application. ![image](https://github.com/dotnet/runtime/assets/19378678/4ac72996-dc6d-40b9-b891-867ea734039c) ### Reproduction Steps The code where this happens is similar to the following: ```csharp private byte[] ReadDataFromStream(INetworkStreamAdapter stream) { if (!stream.DataAvailable) return Array.Empty(); using (var memoryStream = new MemoryStream()) { var bytesRead = 0; var buffer = new byte[GetReadBufferSize()]; while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0) { memoryStream.Write(buffer, 0, bytesRead); if (!stream.DataAvailable) break; } return memoryStream.ToArray(); } } ``` ### Expected behavior No application crash. No stackoverflow exception. ### Actual behavior The stackoverflow exception causes an application crash. ### Regression? It works without a problem in .NET Framework 4.8. ### Known Workarounds _No response_ ### Configuration * .NET 6.0.408 * OS Name: Microsoft Windows 10 Enterprise LTSC OS Version: 10.0.17763 N/A Build 17763 * x64 * No ### Other information Maybe an additional check would be sufficient so that we don't try to read from the stream in the first place. But `NetworkStream.DataAvailable` seems to exactly be that check. Any idea or workaround would be super helpful. :rel
Author: daveMueller
Assignees: -
Labels: `area-System.IO`
Milestone: -
ghost commented 10 months ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details
### Description Hello, let me describe the scenario real quick. Our software can connect to several hardware video conferencing systems like e.g. Cisco TelePresence Quick Set SX20. The test case where this error happens is just a reboot of the hardware. We are on the way to migrate our solution from .NET Framework 4.8 to .NET 6. The issue only happens with the .NET 6 version of our application. ![image](https://github.com/dotnet/runtime/assets/19378678/4ac72996-dc6d-40b9-b891-867ea734039c) ### Reproduction Steps The code where this happens is similar to the following: ```csharp private byte[] ReadDataFromStream(INetworkStreamAdapter stream) { if (!stream.DataAvailable) return Array.Empty(); using (var memoryStream = new MemoryStream()) { var bytesRead = 0; var buffer = new byte[GetReadBufferSize()]; while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0) { memoryStream.Write(buffer, 0, bytesRead); if (!stream.DataAvailable) break; } return memoryStream.ToArray(); } } ``` ### Expected behavior No application crash. No stackoverflow exception. ### Actual behavior The stackoverflow exception causes an application crash. ### Regression? It works without a problem in .NET Framework 4.8. ### Known Workarounds _No response_ ### Configuration * .NET 6.0.408 * OS Name: Microsoft Windows 10 Enterprise LTSC OS Version: 10.0.17763 N/A Build 17763 * x64 * No ### Other information Maybe an additional check would be sufficient so that we don't try to read from the stream in the first place. But `NetworkStream.DataAvailable` seems to exactly be that check. Any idea or workaround would be super helpful. :rel
Author: daveMueller
Assignees: -
Labels: `area-System.Net.Sockets`, `untriaged`
Milestone: -
ManickaP commented 10 months ago

Could you share your exact code (or more similar code) that corresponds to the callstack in that print screen? As I see you have BeginRead on there and I got a suspicion that you call again and again BeginRead from the callback, e.g.: BeginRead(callback) -> callback -> BeginRead(callback). In case the callback is processed synchronously, you'll have the whole while loop unwind on the callstack.

It seems similar to https://github.com/dotnet/runtime/issues/69269

wfurt commented 10 months ago

Also if you dealing with handles, you can perhaps try to check for IsClosed and IsInvalid. It seems like our code only checks disposal but I guess there can be race condition there.

daveMueller commented 10 months ago

Thanks @ManickaP and @wfurt and sorry for the late response. We got the following method here that starts reading from the stream with the callback DataReceivedInStream:

private void StartReadingFromStreamAsync(INetworkStreamAdapter stream)
{
    if (!stream.CanRead) throw new ConnectionException("Reading from stream is not possible.");
    stream.BeginRead(Array.Empty<byte>(), 0, 0, DataReceivedInStream, stream);
}

The callback kind of looks like this:

private void DataReceivedInStream(IAsyncResult result)
{
    if (!IsOpen) return;

    var stream = result.AsyncState as INetworkStreamAdapter;

    try
    {
        lock (m_ReadingLock)
        {
            if (stream != null) stream.EndRead(result);

            if (result.IsCompleted)
            {
                var data = ReadDataFromStream(stream);
                StartReadingFromStreamAsync(stream);

                if (data.Length > 0)
                    m_ResponsesProcessingQueue.TryEnqueue(data);
            }
        }
    }
    ...
}

And the previously mentioned method:

private byte[] ReadDataFromStream(INetworkStreamAdapter stream)
{
    if (!stream.DataAvailable) return Array.Empty<byte>();

    using (var memoryStream = new MemoryStream())
    {
        var bytesRead = 0;
        var buffer = new byte[GetReadBufferSize()];

        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            memoryStream.Write(buffer, 0, bytesRead);
            if (!stream.DataAvailable) break;
        }

        return memoryStream.ToArray();
    }
}

This all is really old code and it can totally be that we are doing something fundamentally wrong here. I found another issue that seems a bit similar to this where @stephentoub mentioned something about checking IAsyncResult.CompletedSynchronously in the callback (https://github.com/dotnet/runtime/issues/29024#issuecomment-474841380). Could this be the same issue?

weltkante commented 10 months ago

Could this be the same issue?

ye, similar bug in the calling code, both yours and the linked code seem to unconditionally queue another read

My guess is the clean shutdown lets the read return zero for a clean shutdown, and the caller unconditionally queues another read, which again completes synchronously with another zero still indicating clean shutdown, recursing until stack overflows. Desktop framework probably was "less optimized" and didn't synchronously complete these requests so you instead might have had a cpu-saturating loop of begin/end callbacks on the thread pool until eventually your queue recognized the connection is terminated. That still was a bug in the calling code but harder to notice since temporary cpu saturation on the threadpool doesn't trigger any diagnostic alarms and the queue can hold many empty reads before going out of memory.

Proper handling would be mean to treat zero byte reads (indicating clean shutdown) to not queue new reads in the callback - they never can receive more data anyways. The recommendation in the linked issue is probably not fixing the problem, just moving it around, but it would fix the stack overflow and trade it for different, easier to diagnose issues like infinite looping if you still get the shutdown handling wrong in the caller.

PS: just randomly came across this issue, read through the code and the problem seemed obvious, sorry if I got it wrong

ManickaP commented 10 months ago

Yes, this is #69269, you can check this comment https://github.com/dotnet/runtime/issues/69269#issuecomment-1125393073 for explanation.

You cannot issue another BeginRead from your callback thread if the result was completed synchronously, see https://learn.microsoft.com/en-us/dotnet/api/system.iasyncresult.completedsynchronously?view=net-8.0#system-iasyncresult-completedsynchronously

From what I understand, in StartReadingFromStreamAsync you're expecting to kick off the reading process, let it fill m_ResponsesProcessingQueue asynchronously and not wait for it. I'd suggest redoing your method to async version (ReadAsync), it will make the code much simpler and less error prone.

I hope this helped, let me know if you have more questions. I'll close this for now, as the question is answered.

ManickaP commented 10 months ago

Dupe of https://github.com/dotnet/runtime/issues/69269

daveMueller commented 10 months ago

OK thanks for your support 🙏 . We fixed this now by escaping the callback when the result completes synchronously. This seems to work stable for us and maybe it helps someone else.

private void DataReceivedInStream(IAsyncResult result)
{
    if (!IsOpen)
        return;

    if (result.CompletedSynchronously)
        return;

    var stream = result.AsyncState as INetworkStreamAdapter;

    try ...