aaubry / YamlDotNet

YamlDotNet is a .NET library for YAML
MIT License
2.54k stars 473 forks source link

Add async support for all IO operations #430

Open aaubry opened 5 years ago

aaubry commented 5 years ago

Add async overloads of all methods that perform IO. I am opening this ticket because I feel that this should eventually be implemented. If this is a feature that would be valuable to you, please say so in the comments.

samsmithnz commented 4 years ago

I wonder what the performance benefit would actually be here, since in my experience this is already pretty quick!

Would you create a second set of functionality with the Async suffix? Or would this be an upgrade and replace of all non async functionality to be async?

aaubry commented 4 years ago

The intention is more to be able to support workflows that only allow async operations. For example, the aspnet request or response streams, which by default require async. I don't have a clear design in mind yet, but I think that the synchronous API would still be necessary, therefore there would need to be a second set of APIs with the async suffix. Forcing async usage in all cases would have a performance cost for all memory-only use cases, which I think are relatively common. What are your thoughts on this ?

samsmithnz commented 4 years ago

I like your plan, it makes sense. If you want to identify particular functions that you'd like to convert, I'm happy to contribute and work on this. I have some experience writing async functions...

aaubry commented 4 years ago

Sure. At the moment I can't dedicate too much time to this feature, so any help is welcome!

samsmithnz commented 4 years ago

I've created an initial commit and would value your input. Would you mind having a peak to make sure it meets your standards before I continue?

Note that this is only for "serialize" so far, and I've only written one async test (I tried to write an alternative for a (relatively) long running unit test, which you can see, appears to have a significant performance boost):

image

Pull Request: #457 (build currently failing due to some performance test issue I will also investigate)

willson556 commented 4 years ago

I took a look at picking up #457 -- I agree with the comment there that async should start at the Emitter and propagate up the stack. I think due to the number of callsites to Write() (which would need to be async and then the callsites would themselves need to be within async methods) that doing this while retaining the synchronous API would result in a lot of code duplication. I think there are 3 possible paths:

  1. Drop the synchronous API
  2. Make a thin-synchronous wrapper around the otherwise async implementation (generally frowned on)
  3. Invent some new architecture that somehow reduces how far async would propagate (I don't have any ideas here)

Personally, those paths are in order of preference and it's likely I could devote some significant development time towards 1 or 2.

aaubry commented 4 years ago

Regarding the options that you suggest, I think that only the first one is viable. Option 2 would add very little value, I think, and if there was an easy way to implement option 3, it would probably already be built into the C# language.

Not keeping the synchronous API is a problem because we target platforms that do not support async. I'm reluctant to discontinue their support and would rather not have async APIs if this proved too difficult to maintain.

I have another concern which is that the emitter currently performs a lot of small writes to the underlying TextWriter. Most of them will probably end up just writing to a buffer and return synchronously. This means that we'll end up having a lot of overhead and maybe the end result will be worse than using the synchronous APIs... We would need to perform some tests before fully committing to using async. I believe that using System.IO.Pipelines could help here but would require a complete reimplementation of the parser and emitter.

Do you have ideas to address these issues ?

willson556 commented 4 years ago

I agree with everything stated above. I started to look at how other serializers had handled this. It looks like Newtonsoft.Json hasn't fully implemented async support -- they seem to have support in the parser but not in the serializer. I then took a look at System.Text.Json and they took an interesting approach to leave async out of most of the codebase: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Write.Stream.cs#L76

They seem to be calling the normal synchronous code and then just copying the output to the stream asynchronously. This sort of approach would seem to be a lot easier to implement in YamlDotNet but I'm not sure how much it really improves performance? I may try and throw together a test of that approach in the next couple of days to see.

willson556 commented 4 years ago

Regardless of performance with more conventional streams, the JsonSerializer approach would meet the stated goal of supporting async-only streams.

willson556 commented 4 years ago

Just came across https://devblogs.microsoft.com/dotnet/system-io-pipelines-high-performance-io-in-net/ which seems highly applicable here (and I now realize you pointed out in your last comment).

aaubry commented 4 years ago

Yes, this would probably be the way forward, although we would have issues with some of the targetted platforms. I don't think that it is very important to keep supporting .NET 2.0, but Unity uses .NET 3.5 and I believe that there are many users of the library on that platform.

willson556 commented 4 years ago

I think with that approach it might be possible to avoid excessive code duplication and use conditional compiles and package references to maintain compatibility. I haven't yet had time to prototype anything concrete to see if it pans out.

lesair commented 4 years ago

Antoine, do you need help on this?

aaubry commented 4 years ago

I'm not currently working on this, but maybe @willson556 does. I'm currently working on a different feature, schemas, and can't spend time on this feature at the moment.

willson556 commented 4 years ago

I haven't started work on this either. I was sort of waiting to see the outcome of #482 and #486 before proceeding with further contributions though I do have a long-run interest in building this.

It is my current belief based on some work we've done since April that it would not be too difficult to do this using the producer/consumer pattern utilizing System.IO.Pipelines or System.Threading.Channels to bridge the gap between calls to Write() in the emitter and async IO. I haven't looked at the read side but I imagine a similar pattern would be possible there as well.

lesair commented 4 years ago

Sounds like fun. How can I help?

goncalo-oliveira commented 1 year ago

Any news on this topic since 2020? Some .NET parts do block synchronous IO operations, so having an async API here would definitely make things smoother.

goldsam commented 10 months ago

+1 to a high-speed implementation based on System.IO.Pipelines.