dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.4k stars 10k forks source link

Cannot implement TextOutputFormatter for IAsyncEnumerable<T> #23203

Closed manandre closed 3 years ago

manandre commented 4 years ago

Issue description

I have created a custom implementation of the TextOutputFormatter to stream IAsyncEnumerable<T> outputs in the response body in my custom (TSV) format. But, at runtime the CanWriteType method only receives a List<T> type, what is not expected and prevents any streaming operation :/

To Reproduce

The repro project (https://github.com/manandre/asyncenumerable-outputformatter-issue) is based on a swagger Pet sample, modified to return the list of Pets as an IAsyncEnumerable<Pet> collection. A custom TsvOutputFormatter is added to illustrate the issue (as a debug assert).

public class TsvOutputFormatter : TextOutputFormatter
{
    public TsvOutputFormatter()
    {
        // Add the supported media type.
        SupportedMediaTypes.Add("text/tsv");
        SupportedEncodings.Add(Encoding.UTF8);
    }

    protected override bool CanWriteType(System.Type type)
    {
        var expected = type == typeof(IAsyncEnumerable<Pet>);
        Debug.Assert(expected); // Only this type is used as output format in this project
        return expected;
    }

    public override async Task WriteResponseBodyAsync(OutputFormatterWriteContext context, Encoding selectedEncoding)
    {
        // Never called
    }
}

Further technical details

SDK .NET Core (reflétant tous les global.json) :
 Version:   3.1.202
 Commit:    6ea70c8dca

Environnement d'exécution :
 OS Name:     Windows
 OS Version:  10.0.18363
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\3.1.202\

Host (useful for support):
  Version: 3.1.4
  Commit:  0c2e69caa6

.NET Core SDKs installed:
  2.1.602 [C:\Program Files\dotnet\sdk]
  2.2.401 [C:\Program Files\dotnet\sdk]
  3.0.101 [C:\Program Files\dotnet\sdk]
  3.1.202 [C:\Program Files\dotnet\sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.1.9 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.All 2.1.18 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.All 2.2.6 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.1.9 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 2.1.18 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 2.2.6 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.0.1 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.1.4 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.1.9 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.18 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.2.6 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.0.1 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.1.4 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.WindowsDesktop.App 3.0.1 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 3.1.4 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

javiercn commented 4 years ago

@manandre thanks for contacting us.

I believe we don't support streaming the result. We buffer AsyncEnumerables. See here for the associated issue.

I believe the issue here is that STJ would need to handle the IAsyncEnumerable. Am I wrong here?

/cc: @pranavkm

javiercn commented 4 years ago

@manandre In your case, I think it is best if you create your own action result to handle streaming the response.

I've dug a bit more into this and I've confirmed that the issue is that we don't have any knowledge about whether a formatter supports IAsyncEnumerable or not, so we are forced to buffer the results ahead of time to make sure it works with all formatters.

I'm not sure if STJ will ever support this, but it will be very unlikely XML will support this at all. If STJ decides to support something like this, we might consider improving the support here, by adding a check/config that allows indicating a given formatter "supports" streaming and in that case avoid the buffering. We might need to adapt some things here, like the content length and so on too, since we won't know by the time we start writing the response.

Buffering the asyncenumerable would be the default behavior unless we detect that the formatter we selected supports streaming. It is unlikely that we build something like this for 5.0, so I'm moving this issue to the backlog for future consideration if we hear more feedback about it.

manandre commented 4 years ago

@javiercn Thanks for your quick, clear and complete answer about my issue. Meanwhile, could we imagine a setting (e.g. MvcOption) to disable the buffering of IAsyncEnumerable (e.g. SuppressAsyncEnumerableBuffering) ? When disabled, it would then be up to the user to provide IAsyncEnumerable-compatible formatters for his supported serialization format(s).

pranavkm commented 4 years ago

@manandre our initial plan was based on @javiercn's suggestion here:

we might consider improving the support here, by adding a check/config that allows indicating a given formatter "supports" streaming and in that case avoid the buffering.

https://github.com/dotnet/runtime/issues/1570 is still scheduled for 5.0, so we might add support for this before this release. We'd also be open to a PR if you'd like to send one.

public interface ISupportAsyncEnumerableOutputFormatter {}

which ObjectResultExecutor checks for prior to buffering.

logiclrd commented 4 years ago

In response to https://github.com/dotnet/runtime/issues/1570, I made API proposal https://github.com/dotnet/runtime/issues/38055. This proposes making System.Text.Json support IAsyncEnumerable directly, which would in turn permit ASP.NET Core to remove its buffering of the type and simply pass these enumerables to the underlying serializer.

There is in fact an implementation of this proposal in a PR but I committed a gaffe because I didn't know about the API review process and submitted my PR with no review. If the review of the proposal goes favourably, then the implementation should itself be immediately ready for review. Crossing fingers :-)

pranavkm commented 4 years ago

which would in turn permit ASP.NET Core to remove its buffering of the type and simply pass these enumerables to the underlying serializer.

This would specifically benefit S.T.J. MVC will continue buffering for all other formatters, which is why MVC needs a gesture that indicates when a formatter suppports serializing IAsyncEnumerable.

logiclrd commented 4 years ago

Hmm, I wasn't aware that there was non-System.Text.Json buffering going on. The buffering that I knew about is very System.Text.Json-specific:

https://github.com/dotnet/aspnetcore/blob/master/src/Mvc/Mvc.Core/src/Infrastructure/SystemTextJsonResultExecutor.cs#L70-L75

logiclrd commented 4 years ago

Ahh, I see there's this too:

https://github.com/dotnet/aspnetcore/blob/master/src/Mvc/Mvc.Core/src/Infrastructure/ObjectResultExecutor.cs#L121-L124

Hmm

logiclrd commented 4 years ago

Question, what is the reason that ASP.NET Core handles IAsyncEnumerable<T> by buffering rather than adapting? Couldn't it just make an IEnumerable<T> whose implementation simply called an IAsyncEnumerable<T>'s corresponding methods and returned the .Result? Obviously it wouldn't be async at that point, but to my mind, this has two significant benefits over the buffering approach: data is pumped as it is produced, and only one element needs to be dealt with/in memory at once. Large result sets with the current IAsyncEnumerable<T> mean large memory usage while the response is being serialized.

There's also a third advantage: If a route returns an IAsyncEnumerable<T> that does not end, with the intention that results be streamed to the client for as long as the client chooses to stay connected, then with buffering, no results are ever sent and memory usage grows indefinitely because it's trying to find the end of an enumeration that does not end. There's a separate issue, where the response writer doesn't return errors and the serialization ignores the cancellation token on the response, so if you do this, then when the client disconnects, the data pump doesn't actually notice that it has disconnected and keeps pulling data from the enumerable and writing it to oblivion, but that's its own bug. :-P

pranavkm commented 4 years ago

The IAsyncEnumerable support in MVC specifically exists to avoid sync-over-async. We did not set out to implement full featured support for this since we expect some formatters to eventually support it.

no results are ever sent and memory usage grows indefinitely because it's trying to find the end of an enumeration that does not end.

There is an upper bound to the number of buffered elements: https://docs.microsoft.com/en-us/dotnet/api/microsoft.aspnetcore.mvc.mvcoptions.maxiasyncenumerablebufferlimit?view=aspnetcore-3.1#Microsoft_AspNetCore_Mvc_MvcOptions_MaxIAsyncEnumerableBufferLimit

logiclrd commented 4 years ago

Hmm, but by explicitly buffering the contents, isn't it even more sync-over-async than if it did a micro sync-over-async per element?

logiclrd commented 4 years ago

I have become aware of the following scenario:

Thus, a deadlock arises.

For the benefit of others reading this, so that they understand the context more fully:

When ASP.NET Core is buffering the IAsyncEnumerable, it is doing it in an async context, so the top-level thread doing the operation isn't at any point blocked.

System.Text.Json is async end-to-end, so after it has finished buffering the IAsyncEnumerable, it is then doing an async operation to serialize it. This means that it should be very possible for System.Text.Json to properly support IAsyncEnumerable without sync-over-async. But, if you are using a different formatter, such as XmlSerializer or Newtonsoft.Json, then the actual serialization is a synchronous call, and once it goes down that route, the above deadlock can arise if it ever makes an async call. Thus, it buffers the IAsyncEnumerable using async code first, e.g.:

https://github.com/dotnet/aspnetcore/blob/master/src/Mvc/Mvc.NewtonsoftJson/src/NewtonsoftJsonResultExecutor.cs#L135-L141

@pranavkm I have done further reading which brings me greater understanding of your earlier comments. Specifically, I see the distinction between result executors and output formatters, and notice that the result executors -- all of them, independently -- are where the IAsyncEnumerable buffering is taking place. And, as you said, ObjectResultExecutor, which passes the result off to an output formatter, could omit its buffering if it knew that the formatter could handle IAsyncEnumerable.

This change, however, would only make sense after System.Text.Json supports IAsyncEnumerable natively upstream.

pranavkm commented 3 years ago

@manandre starting in 6.0-preview4, MVC makes supporting up to individual formatters to handle IAsyncEnumerable a concern of the formatter. ObjectResultExecutor no longer buffers by default, and you will receive the passed in IAsyncEnumerable<> instance if you register a custom formatter.

manandre commented 3 years ago

@pranavkm Thanks for such a quick implementation on aspnetcore side. I will be able to replace my horrible workarounds in such a perf-critical place 👍