Open arekpalinski opened 3 months ago
Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.
Is it intentional that
requestBodyTask
task isn't awaited in that scenario?
Yes, there are cases where the task returned by HttpClient.SendAsync
may be completed before the call to the request content's SerializeToStreamAsync
completes.
For example with HTTP/2, this is done to allow for duplex communication.
See the comment right above the LogExceptions(requestBodyTask);
line you highlighted:
We received the response headers but the request body hasn't yet finished; this most commonly happens when the protocol is being used to enable duplex communication
This means that you can read the response content stream while still sending the request content.
As another example, with HTTP/1.1, this may happen if you're using Expect: 100-continue
and the request fails as you are sending content. We won't wait for the task to complete before returning, but we will call LogExceptions
to ensure we aren't leaking unobserved task exceptions.
You may have to introduce synchronization to ensure you've finished sending the content. For example that could mean adding logic along the lines of
internal class BlittableJsonContent : HttpContent
{
+ private static readonly TaskCompletionSource s_sentinel = new();
+ private TaskCompletionSource? _tcs;
+ public Task EnsureCompletedAsync() =>
+ Interlocked.CompareExchange(ref _tcs, s_sentinel, null) is null
+ ? Task.CompletedTask // SerializeToStreamAsync was never called
+ : _tcs.Task;
protected override async Task SerializeToStreamAsync(Stream stream, TransportContext context)
{
+ if (Interlocked.CompareExchange(ref _tcs, new(TaskCreationOptions.RunContinuationsAsynchronously), null) is not null)
+ throw new Exception("Already called previously, or called after EnsureCompletedAsync");
+ try
+ {
await stream.FlushAsync().ConfigureAwait(false);
await using (var gzipStream = new GZipStream(stream, CompressionMode.Compress, leaveOpen: true))
{
await _asyncTaskWriter(gzipStream).ConfigureAwait(false);
}
+ _tcs.SetResult();
+ }
+ catch (Exception ex)
+ {
+ _tcs.SetException(ex);
+ throw;
+ }
}
that you can then await before disposing resources needed by the request content.
Thanks @MihaZupan. Can you elaborate a bit more on the following please?
Yes, there are cases where the task returned by HttpClient.SendAsync may be completed before the call to the request content's SerializeToStreamAsync completes.
Can we assume that those cases are possible only if a request is erorred / aborted / cancelled? Are we guaranteed that if HttpClient.SendAsync()
completes successfully then we're good?
In other words, can we do the following:
try
{
return await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead, token).ConfigureAwait(false);
}
catch
{
if (request.Content is BlittableJsonContent bjc)
{
var requestBodyTask = bjc.EnsureCompletedAsync();
if (requestBodyTask.IsCompleted == false)
await requestBodyTask.ConfigureAwait(false);
}
throw;
}
or should we rather have finally
instead of catch
?
Can we assume that those cases are possible only if a request is erorred / aborted / cancelled?
No. The HTTP/2 duplex streaming scenario is one such case where the request is successful (SendAsync
does not throw), but both the request and response content streams may be used at the same time afterwards.
P.S. depending on whether you care about exceptions thrown inside the SerializeToStreamAsync
(likely not), you can just use a finally
in EnsureCompletedAsync
:
- _tcs.SetResult();
- }
- catch (Exception ex)
- {
- _tcs.SetException(ex);
- throw;
- }
+ }
+ finally
+ {
+ _tcs.SetResult();
+ }
Thanks @MihaZupan. While this behavior make sense after you explained, it was quite unexpected for us.
It's the second time we acquired some non trivial, internal knowledge about customized HttpContent (we also stumbled upon the following in the past - https://github.com/dotnet/runtime/issues/96223#issuecomment-1865009861, https://github.com/dotnet/runtime/issues/30295#issuecomment-516210081).
Perhaps this knowledge could be mentioned in the official docs (https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpcontent)
Reopening to track improving docs here. There's enough non-obvious issues you may run into with a custom HttpContent
implementation that doesn't (know to) account for things like:
SerializeToStreamAsync
being called multiple times during retriesSerializeToStreamAsync
running beyond the await SendAsync
SerializeToStreamAsync
never being called@MihaZupan
SerializeToStreamAsync being called multiple times during retries
Does it mean the following code isn't correct because it will be throwing exceptions on retries?
protected override async Task SerializeToStreamAsync(Stream stream, TransportContext context)
{
if (Interlocked.CompareExchange(ref _tcs, new(TaskCreationOptions.RunContinuationsAsynchronously), null) is not null)
throw new Exception("Already called previously, or called after EnsureCompletedAsync");
It won't support retires, yes ("Already called previously").
If being called multiple times is something _asyncTaskWriter
can handle in your case, you could tweak the code to account for that.
But custom HttpContent
s often can't support that, for example, because they already consumed some data the first time around, and can't be rewound.
Is it guaranteed that the retries are handled inside HttpClient.SendAsync()
? In other words, does it mean that if I await it and this task is completed then we're done with retries?
If that's the case then I can modify the code to always create a new BlittableJsonContent._tcs
, so the last one will be effectively EnsureCompletedAsync()
.
You'd still have to account for the possibility that the content will never be sent.
E.g. you could have a List<TCS>
and a bool flag to indicate that you're done, or something logically equivalent.
Is it guaranteed that the retries are handled inside
There are a few edge cases where that might not hold (this may change in the future).
While retries are handled internally during the SendAsync
call, there are currently cases where we may not wait for SerializeToStreamAsync
to complete. This also means that while we may call SerializeToStreamAsync
as part of SendAsync
, SendAsync
may already complete before your implementation gets to execute its first line of code.
Description
We have a customized implementation of HttpStream which under the covers reads data from unmanaged memory and writes it as JSON to the request body:
https://github.com/ravendb/ravendb/blob/v5.4/src/Raven.Client/Json/BlittableJsonContent.cs
More specifically we have
SingleNodeBatchCommand
which usesBlittableJsonContent
:https://github.com/ravendb/ravendb/blob/801a228dbc33eeaee83e288e733d50bdad18f3a5/src/Raven.Client/Documents/Commands/Batches/BatchCommand.cs#L96-L114
This command is used in Raven ETL process to ELT data from one RavenDB database to another. It's used by Raven ETL process which a single, dedicated thread:
https://github.com/ravendb/ravendb/blob/801a228dbc33eeaee83e288e733d50bdad18f3a5/src/Raven.Server/Documents/ETL/Providers/Raven/RavenEtl.cs#L181
The problem is that we're getting randomly crashes, AVE or exceptions when ETL process is stopping while the HTTP request is being processed.
When reproducing it locally we're mostly getting NRE exception due to some internals being disposed (which deals with unmanaged memory hence often resulting in AVE). The stacktrace is always the same:
We have confirmed that while this is still being processed the ETL thread is already being stopped. So it didn't wait for the request to be completed / aborted.
Based on the above stacktrace it looks here is the relevant piece of code:
https://github.com/dotnet/runtime/blob/18b2ef18fd58146cc1c50acd8bfcd85a3a663255/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/Http2Connection.cs#L2039-L2047
More specifically
HttpConnectionBase.LogExceptions()
has comment: https://github.com/dotnet/runtime/blob/18b2ef18fd58146cc1c50acd8bfcd85a3a663255/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionBase.cs#L189but it doesn't look that the task is awaited:
https://github.com/dotnet/runtime/blob/18b2ef18fd58146cc1c50acd8bfcd85a3a663255/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionBase.cs#L199-L203
Here it what we saw when we got the exception with debugger attached:
Is it intentional that
requestBodyTask
task isn't awaited in that scenario?Reproduction Steps
We managed to reproduce it locally with the usage of two RavenDB servers and ETL between them. It was reproduced only on HTTP2.
Expected behavior
The
requestBodyTask
should be awaited.Actual behavior
Not awaited task causes the continuation of the main thread resulting in crashes.
Regression?
No response
Known Workarounds
No response
Configuration
No response
Other information
No response