dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Potential (~5% chance) AccessViolationException in ReadyToRun executable targeting RID osx.11.0-arm64/osx.12-arm64 when running a child process for the first time, and both standard output and standard error are redirected #88288

Closed hach-que closed 1 month ago

hach-que commented 1 year ago

Description

Redirecting standard output and standard error of child processes for a ReadyToRun executable on macOS M1 can result in:

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Net.Sockets.SafeSocketHandle.SetHandleAndValid(IntPtr)
   at Microsoft.Win32.SafeHandles.SafePipeHandle.CreatePipeSocket(Boolean)
   at System.IO.Pipes.PipeStream+<ReadAsyncCore>d__82.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.IO.Pipes.PipeStream+<ReadAsyncCore>d__82, System.IO.Pipes, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]](<ReadAsyncCore>d__82 ByRef)
   at System.IO.Pipes.PipeStream.ReadAsyncCore(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.IO.Pipes.PipeStream.ReadAsync(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16, System.Diagnostics.Process, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]](<ReadBufferAsync>d__16 ByRef)
   at System.Diagnostics.AsyncStreamReader.ReadBufferAsync()
   at System.Threading.Tasks.Task`1[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()

Some important notes about this bug:

I managed to reproduce this issue on the following system:

I could also reproduce it on a second system with identical OS version and hardware configuration (building the binary again, rather than copying the built binary), so it is not specific to a single machine or environment.

Reproduction Steps

Create Program.cs with this content:

using System;
using System.Diagnostics;

var cts = new CancellationTokenSource();
Console.CancelKeyPress += (_, _) =>
{
    cts.Cancel();
};
var cancellationToken = cts.Token;

{
    if (Directory.Exists("/tmp/git-test"))
    {
        Directory.Delete("/tmp/git-test", true);
    }
    Directory.CreateDirectory("/tmp/git-test");
    var startInfo = new ProcessStartInfo
    {
        FileName = "/usr/bin/git",
        UseShellExecute = false,
        CreateNoWindow = false,
    };
    startInfo.RedirectStandardInput = false;
    startInfo.RedirectStandardOutput = true;
    startInfo.RedirectStandardError = true;
    startInfo.ArgumentList.Add("init");
    startInfo.ArgumentList.Add("/tmp/git-test");
    var process = Process.Start(startInfo)!;
    process.OutputDataReceived += (sender, e) =>
    {
        var line = e?.Data?.TrimEnd();
        if (!string.IsNullOrWhiteSpace(line))
        {
            Console.WriteLine(line);
        }
    };
    process.BeginOutputReadLine();
    process.ErrorDataReceived += (sender, e) =>
    {
        var line = e?.Data?.TrimEnd();
        if (!string.IsNullOrWhiteSpace(line))
        {
            Console.WriteLine(line);
        }
    };
    process.BeginErrorReadLine();
    try
    {
        // Use our own semaphore and the Exited event
        // instead of Process.WaitForExitAsync, since that
        // function seems to be buggy and can stall.
        var exitSemaphore = new SemaphoreSlim(0);
        process.Exited += (sender, args) =>
        {
            exitSemaphore.Release();
        };
        process.EnableRaisingEvents = true;
        if (process.HasExited)
        {
            exitSemaphore.Release();
        }

        // Wait for the process to exit or until cancellation.
        await exitSemaphore.WaitAsync(cancellationToken);
    }
    finally
    {
        if (cancellationToken.IsCancellationRequested)
        {
            if (!process.HasExited)
            {
                process.Kill(true);
            }
        }
    }
    if (!process.HasExited)
    {
        // Give the process one last chance to exit normally
        // so we can try to get the exit code.
        process.WaitForExit(1000);
        if (!process.HasExited)
        {
            // We can't get the return code for this process.
            return int.MaxValue;
        }
    }
    Console.WriteLine($"git init exited with {process.ExitCode}");
}

Console.WriteLine("testing complete.");
return 0;

Create the procrepo.csproj project with this content:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
    <PublishSingleFile>true</PublishSingleFile>
    <SelfContained>true</SelfContained>
    <RuntimeIdentifiers>osx.11.0-arm64</RuntimeIdentifiers>
    <IncludeNativeLibrariesForSelfExtract>true</IncludeNativeLibrariesForSelfExtract>
    <PublishReadyToRun>true</PublishReadyToRun>
    <PublishTrimmed>true</PublishTrimmed>
    <EnableCompressionInSingleFile>true</EnableCompressionInSingleFile>
    <DebuggerSupport>false</DebuggerSupport>
    <TrimmerRemoveSymbols>true</TrimmerRemoveSymbols>
    <EnableUnsafeBinaryFormatterSerialization>false</EnableUnsafeBinaryFormatterSerialization>
    <EnableUnsafeUTF7Encoding>false</EnableUnsafeUTF7Encoding>
    <EventSourceSupport>false</EventSourceSupport>
    <HttpActivityPropagationSupport>false</HttpActivityPropagationSupport>
    <InvariantGlobalization>true</InvariantGlobalization>
    <MetadataUpdaterSupport>false</MetadataUpdaterSupport>
    <ShowLinkerSizeComparison>true</ShowLinkerSizeComparison>
  </PropertyGroup>

</Project>

Build the project with:

dotnet publish -c Release -r osx.11.0-arm64

Run the process in a loop to reproduce the crash:

while true; do ./bin/Release/net7.0/osx.11.0-arm64/publish/procrepo ; done

Reproduction rate

When I ran the program with this Bash command:

SUCCESS=0
FAILURE=0
for ((i=1;i<=100;i++)); do ./bin/Release/net7.0/osx.11.0-arm64/publish/procrepo; if [ $? -eq 0 ]; then SUCCESS=$((SUCCESS+1)); else FAILURE=$((FAILURE+1)); fi; done

the crash occurred in about 5% of runs.
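The same measurement can be sketched as a small reusable helper that also prints the tally. This is only an illustration: `measure_failures` is a hypothetical name that does not appear in the original report, and it assumes that any non-zero exit code (including a crash) counts as a failure.

```shell
# measure_failures is a hypothetical helper, not part of the original report.
# Usage: measure_failures RUNS COMMAND [ARGS...]
# Runs COMMAND the given number of times and prints how many runs
# exited with a non-zero status.
measure_failures() {
    runs=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # The command's own output is discarded so only the tally is printed.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "failures=$failures runs=$runs"
}

# Example invocation against the published binary (path taken from the report):
# measure_failures 100 ./bin/Release/net7.0/osx.11.0-arm64/publish/procrepo
```

Because the helper discards the child's output, rerun the binary directly when the stack trace itself is needed.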

Expected behavior

The .NET process should not crash with AccessViolationException.

Actual behavior

A crash with a callstack that looks similar to one of the following. It's not consistent, and I have seen callstacks that differ from the ones below, but these are the most common:

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Net.Sockets.SafeSocketHandle.SetHandleAndValid(IntPtr)
   at Microsoft.Win32.SafeHandles.SafePipeHandle.CreatePipeSocket(Boolean)
   at System.IO.Pipes.PipeStream+<ReadAsyncCore>d__82.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.IO.Pipes.PipeStream+<ReadAsyncCore>d__82, System.IO.Pipes, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]](<ReadAsyncCore>d__82 ByRef)
   at System.IO.Pipes.PipeStream.ReadAsyncCore(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.IO.Pipes.PipeStream.ReadAsync(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16, System.Diagnostics.Process, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]](<ReadBufferAsync>d__16 ByRef)
   at System.Diagnostics.AsyncStreamReader.ReadBufferAsync()
   at System.Threading.Tasks.Task`1[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder`1[[System.Int32, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].get_Task()
   at System.IO.Pipes.PipeStream.ReadAsyncCore(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.IO.Pipes.PipeStream.ReadAsync(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16, System.Diagnostics.Process, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]](<ReadBufferAsync>d__16 ByRef)
   at System.Diagnostics.AsyncStreamReader.ReadBufferAsync()
   at System.Threading.Tasks.Task`1[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Net.Sockets.SocketAsyncEventArgs.SetBuffer(System.Memory`1<Byte>)
   at System.Net.Sockets.Socket.ReceiveAsync(System.Memory`1<Byte>, System.Net.Sockets.SocketFlags, Boolean, System.Threading.CancellationToken)
   at System.IO.Pipes.PipeStream+<ReadAsyncCore>d__82.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.IO.Pipes.PipeStream+<ReadAsyncCore>d__82, System.IO.Pipes, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]](<ReadAsyncCore>d__82 ByRef)
   at System.IO.Pipes.PipeStream.ReadAsyncCore(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.IO.Pipes.PipeStream.ReadAsync(System.Memory`1<Byte>, System.Threading.CancellationToken)
   at System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.Diagnostics.AsyncStreamReader+<ReadBufferAsync>d__16, System.Diagnostics.Process, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]](<ReadBufferAsync>d__16 ByRef)
   at System.Diagnostics.AsyncStreamReader.ReadBufferAsync()
   at System.Threading.Tasks.Task`1[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()

Regression?

No response

Known Workarounds

Turn off PublishReadyToRun in the project file when targeting macOS.
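
One possible way to apply this workaround while keeping ReadyToRun on other platforms is to condition the property on the runtime identifier. This is a minimal sketch, not verified against the repro above. It assumes the RID is supplied at publish time (for example via `-r osx.11.0-arm64`, as in the build step) and that every affected macOS RID starts with `osx`:

```xml
<PropertyGroup>
  <!-- Keep ReadyToRun enabled by default. -->
  <PublishReadyToRun>true</PublishReadyToRun>
  <!-- Assumption: RuntimeIdentifier is set (e.g. via -r) and macOS RIDs start with "osx". -->
  <PublishReadyToRun Condition="'$(RuntimeIdentifier)' != '' and $(RuntimeIdentifier.StartsWith('osx'))">false</PublishReadyToRun>
</PropertyGroup>
```

The second `PublishReadyToRun` element overrides the first only when the condition holds, so builds for other RIDs are unaffected.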

Configuration

No response

Other information

No response

hach-que commented 1 year ago

I have also been able to confirm that this bug is still present in .NET 8 Preview 5.

wfurt commented 5 months ago

Do you have any leads, @janvorli? It seems like https://github.com/dotnet/runtime/issues/102313 is similar or the same. I can possibly run a binary search on commits if I get a local repro. Let me know.

wfurt commented 4 months ago

It seems like https://github.com/dotnet/runtime/issues/102313 was fixed in preview 5. Can you please give it a shot, @hach-que?

dotnet-policy-service[bot] commented 2 months ago

This issue has been marked needs-author-action and may be missing some important information.

dotnet-policy-service[bot] commented 2 months ago

This issue has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

dotnet-policy-service[bot] commented 1 month ago

This issue will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the issue, but please note that the issue will be locked if it remains inactive for another 30 days.