dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.46k stars 4.76k forks source link

Potentially missing await of requestBodyTask in Http2Connection #107082

Open arekpalinski opened 3 months ago

arekpalinski commented 3 months ago

Description

We have a customized implementation of HttpStream which under the covers reads data from unmanaged memory and writes it as JSON to the request body:

https://github.com/ravendb/ravendb/blob/v5.4/src/Raven.Client/Json/BlittableJsonContent.cs

More specifically we have SingleNodeBatchCommand which uses BlittableJsonContent:

https://github.com/ravendb/ravendb/blob/801a228dbc33eeaee83e288e733d50bdad18f3a5/src/Raven.Client/Documents/Commands/Batches/BatchCommand.cs#L96-L114

This command is used in Raven ETL process to ELT data from one RavenDB database to another. It's used by Raven ETL process which a single, dedicated thread:

https://github.com/ravendb/ravendb/blob/801a228dbc33eeaee83e288e733d50bdad18f3a5/src/Raven.Server/Documents/ETL/Providers/Raven/RavenEtl.cs#L181

The problem is that we're getting randomly crashes, AVE or exceptions when ETL process is stopping while the HTTP request is being processed.

When reproducing it locally we're mostly getting NRE exception due to some internals being disposed (which deals with unmanaged memory hence often resulting in AVE). The stacktrace is always the same:

System.NullReferenceException: 'Object reference not set to an instance of an object.'

    Sparrow.dll!Sparrow.Compression.VariableSizeEncoding.Read<int>(byte* input, out int offset, out bool success) Line 181  C#
    Sparrow.dll!Sparrow.Compression.VariableSizeEncoding.Read<int>(byte* input, out int offset) Line 256    C#
    Sparrow.dll!Sparrow.Json.BlittableJsonReaderBase.ReadVariableSizeInt(int pos, out int offset) Line 179  C#
    Sparrow.dll!Sparrow.Json.BlittableJsonReaderObject.BlittableJsonReaderObject(int pos, Sparrow.Json.BlittableJsonReaderObject parent, Sparrow.Json.BlittableJsonToken type) Line 163 C#
    Sparrow.dll!Sparrow.Json.BlittableJsonReaderObject.GetObject(Sparrow.Json.BlittableJsonToken type, int position, out bool isBlittableJsonReader) Line 967   C#
    Sparrow.dll!Sparrow.Json.BlittableJsonReaderObject.GetPropertyByIndex(int index, ref Sparrow.Json.BlittableJsonReaderObject.PropertyDetails prop, bool addObjectToCache) Line 775   C#
    Sparrow.dll!Sparrow.Json.AbstractBlittableJsonTextWriter.WriteObject(Sparrow.Json.BlittableJsonReaderObject obj) Line 116   C#
    Sparrow.dll!Sparrow.Json.AbstractBlittableJsonTextWriter.WriteValue(Sparrow.Json.BlittableJsonToken token, object val) Line 166 C#
    Sparrow.dll!Sparrow.Json.AbstractBlittableJsonTextWriter.WriteObject(Sparrow.Json.BlittableJsonReaderObject obj) Line 119   C#
    Sparrow.dll!Sparrow.Json.BlittableJsonTextWriterExtensions.WriteArray(Sparrow.Json.AbstractBlittableJsonTextWriter writer, string name, System.Collections.Generic.IEnumerable<Sparrow.Json.BlittableJsonReaderObject> items) Line 119  C#
    Raven.Client.dll!Raven.Client.Documents.Commands.Batches.SingleNodeBatchCommand.CreateRequest.AnonymousMethod__0(System.IO.Stream stream) Line 104  C#
    Raven.Client.dll!Raven.Client.Json.BlittableJsonContent.SerializeToStreamAsync(System.IO.Stream stream, System.Net.TransportContext context) Line 33    C#
    [Resuming Async Method] 
    System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)   Unknown
    System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<Raven.Client.Json.BlittableJsonContent.<SerializeToStreamAsync>d__2>.MoveNext(System.Threading.Thread threadPoolThread)   Unknown
    System.Private.CoreLib.dll!System.Runtime.CompilerServices.TaskAwaiter.OutputWaitEtwEvents.AnonymousMethod__12_0(System.Action innerContinuation, System.Threading.Tasks.Task innerTask)    Unknown
    System.Private.CoreLib.dll!System.Threading.Tasks.AwaitTaskContinuation.System.Threading.IThreadPoolWorkItem.Execute()  Unknown
    System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch()  Unknown
    System.Private.CoreLib.dll!System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() Unknown
    [Async Call Stack]  
    [Async] System.Net.Http.dll!System.Net.Http.Http2Connection.Http2Stream.SendRequestBodyAsync(System.Threading.CancellationToken cancellationToken)  Unknown
    [Async] System.Net.Http.dll!System.Net.Http.HttpConnectionBase.LogExceptions.AnonymousMethod__23_0(System.Threading.Tasks.Task t, object state) Unknown

We have confirmed that while this is still being processed the ETL thread is already being stopped. So it didn't wait for the request to be completed / aborted.

Based on the above stacktrace it looks here is the relevant piece of code:

https://github.com/dotnet/runtime/blob/18b2ef18fd58146cc1c50acd8bfcd85a3a663255/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/Http2Connection.cs#L2039-L2047

More specifically HttpConnectionBase.LogExceptions() has comment: https://github.com/dotnet/runtime/blob/18b2ef18fd58146cc1c50acd8bfcd85a3a663255/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionBase.cs#L189

but it doesn't look that the task is awaited:

https://github.com/dotnet/runtime/blob/18b2ef18fd58146cc1c50acd8bfcd85a3a663255/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionBase.cs#L199-L203

Here it what we saw when we got the exception with debugger attached:

image

Is it intentional that requestBodyTask task isn't awaited in that scenario?

Reproduction Steps

We managed to reproduce it locally with the usage of two RavenDB servers and ETL between them. It was reproduced only on HTTP2.

Expected behavior

The requestBodyTask should be awaited.

Actual behavior

Not awaited task causes the continuation of the main thread resulting in crashes.

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response

dotnet-policy-service[bot] commented 3 months ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

MihaZupan commented 3 months ago

Is it intentional that requestBodyTask task isn't awaited in that scenario?

Yes, there are cases where the task returned by HttpClient.SendAsync may be completed before the call to the request content's SerializeToStreamAsync completes.

For example with HTTP/2, this is done to allow for duplex communication. See the comment right above the LogExceptions(requestBodyTask); line you highlighted:

We received the response headers but the request body hasn't yet finished; this most commonly happens when the protocol is being used to enable duplex communication

This means that you can read the response content stream while still sending the request content.

As another example, with HTTP/1.1, this may happen if you're using Expect: 100-continue and the request fails as you are sending content. We won't wait for the task to complete before returning, but we will call LogExceptions to ensure we aren't leaking unobserved task exceptions.


You may have to introduce synchronization to ensure you've finished sending the content. For example that could mean adding logic along the lines of

internal class BlittableJsonContent : HttpContent
{
+   private static readonly TaskCompletionSource s_sentinel = new();
+   private TaskCompletionSource? _tcs;

+   public Task EnsureCompletedAsync() =>
+       Interlocked.CompareExchange(ref _tcs, s_sentinel, null) is null
+           ? Task.CompletedTask // SerializeToStreamAsync was never called
+           : _tcs.Task;

    protected override async Task SerializeToStreamAsync(Stream stream, TransportContext context)
    {
+       if (Interlocked.CompareExchange(ref _tcs, new(TaskCreationOptions.RunContinuationsAsynchronously), null) is not null)
+           throw new Exception("Already called previously, or called after EnsureCompletedAsync");

+       try
+       {
            await stream.FlushAsync().ConfigureAwait(false);

            await using (var gzipStream = new GZipStream(stream, CompressionMode.Compress, leaveOpen: true))
            {
                await _asyncTaskWriter(gzipStream).ConfigureAwait(false);
            }

+           _tcs.SetResult();
+       }
+       catch (Exception ex)
+       {
+           _tcs.SetException(ex);
+           throw;
+       }
    }

that you can then await before disposing resources needed by the request content.

arekpalinski commented 3 months ago

Thanks @MihaZupan. Can you elaborate a bit more on the following please?

Yes, there are cases where the task returned by HttpClient.SendAsync may be completed before the call to the request content's SerializeToStreamAsync completes.

Can we assume that those cases are possible only if a request is erorred / aborted / cancelled? Are we guaranteed that if HttpClient.SendAsync() completes successfully then we're good?

In other words, can we do the following:

try
{
    return await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead, token).ConfigureAwait(false);
}
catch
{
    if (request.Content is BlittableJsonContent bjc)
    {
        var requestBodyTask = bjc.EnsureCompletedAsync();

        if (requestBodyTask.IsCompleted == false)
            await requestBodyTask.ConfigureAwait(false);
    }

    throw;
}

or should we rather have finally instead of catch?

MihaZupan commented 3 months ago

Can we assume that those cases are possible only if a request is erorred / aborted / cancelled?

No. The HTTP/2 duplex streaming scenario is one such case where the request is successful (SendAsync does not throw), but both the request and response content streams may be used at the same time afterwards.

P.S. depending on whether you care about exceptions thrown inside the SerializeToStreamAsync (likely not), you can just use a finally in EnsureCompletedAsync:

-           _tcs.SetResult();
-       }
-       catch (Exception ex)
-       {
-           _tcs.SetException(ex);
-           throw;
-       }
+       }
+       finally
+       {
+           _tcs.SetResult();
+       }
arekpalinski commented 3 months ago

Thanks @MihaZupan. While this behavior make sense after you explained, it was quite unexpected for us.

It's the second time we acquired some non trivial, internal knowledge about customized HttpContent (we also stumbled upon the following in the past - https://github.com/dotnet/runtime/issues/96223#issuecomment-1865009861, https://github.com/dotnet/runtime/issues/30295#issuecomment-516210081).

Perhaps this knowledge could be mentioned in the official docs (https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpcontent)

MihaZupan commented 2 months ago

Reopening to track improving docs here. There's enough non-obvious issues you may run into with a custom HttpContent implementation that doesn't (know to) account for things like:

arekpalinski commented 2 months ago

@MihaZupan

SerializeToStreamAsync being called multiple times during retries

Does it mean the following code isn't correct because it will be throwing exceptions on retries?

protected override async Task SerializeToStreamAsync(Stream stream, TransportContext context)
 {
       if (Interlocked.CompareExchange(ref _tcs, new(TaskCreationOptions.RunContinuationsAsynchronously), null) is not null)
           throw new Exception("Already called previously, or called after EnsureCompletedAsync");
MihaZupan commented 2 months ago

It won't support retires, yes ("Already called previously").

If being called multiple times is something _asyncTaskWriter can handle in your case, you could tweak the code to account for that. But custom HttpContents often can't support that, for example, because they already consumed some data the first time around, and can't be rewound.

arekpalinski commented 2 months ago

Is it guaranteed that the retries are handled inside HttpClient.SendAsync()? In other words, does it mean that if I await it and this task is completed then we're done with retries?

If that's the case then I can modify the code to always create a new BlittableJsonContent._tcs, so the last one will be effectively EnsureCompletedAsync().

MihaZupan commented 2 months ago

You'd still have to account for the possibility that the content will never be sent. E.g. you could have a List<TCS> and a bool flag to indicate that you're done, or something logically equivalent.

Is it guaranteed that the retries are handled inside

There are a few edge cases where that might not hold (this may change in the future). While retries are handled internally during the SendAsync call, there are currently cases where we may not wait for SerializeToStreamAsync to complete. This also means that while we may call SerializeToStreamAsync as part of SendAsync, SendAsync may already complete before your implementation gets to execute its first line of code.