aloneguid / parquet-dotnet

Fully managed Apache Parquet implementation
https://aloneguid.github.io/parquet-dotnet/
MIT License
542 stars 140 forks source link

[BUG]: Risk for deadlock when ParquetWriter is disposed? #479

Open andagr opened 4 months ago

andagr commented 4 months ago

Library Version

4.23.4

OS

Windows

OS Architecture

64 bit

How to reproduce?

This is not easy to reproduce because it depends on where the library is used, behaviour is different depending on if it's running e.g in a GUI app or ASP.NET. In my case it is used in an Azure Function app, and out of several thousand serializations per day the process hangs a few times.

ParquetWriter.Dispose() contains a synchronous wait on a task without .ConfigureAwait(false). This is known to cause deadlocks and I suspect this is causing our function app to hang. More info on ConfigureAwait: https://devblogs.microsoft.com/dotnet/configureawait-faq/#why-would-i-want-to-use-configureawaitfalse On the other hand it seems Task.Run() is also possible to use for avoiding deadlocks since it puts the work in a background thread: https://devblogs.microsoft.com/dotnet/configureawait-faq/#can-i-use-task-run-to-avoid-using-configureawaitfalse https://stackoverflow.com/a/28307965/114174

Here is when the synchronous wait was added: https://github.com/aloneguid/parquet-dotnet/commit/1d1ced29dca58397846d399c087f38475fc7f433#diff-67e525645019c5bdbfa30dd0862bf24bccea0d8d5bd4e95a37c06975d1d41e5eR145

Either way I'm not sure about this, I will change this to IAsyncDisposable in a local build instead and see if it resolves the hanging process on our end.

Failing test

No response

andagr commented 4 months ago

Here are the changes I've done and will try out: https://github.com/aloneguid/parquet-dotnet/compare/master...andagr:parquet-dotnet:master

aloneguid commented 3 months ago

Good point. It was created when IAsyncDisposable did not exist. If I'm not wrong await using is suppoted by most C# version, so this can be added, unfortunately as a breaking change.

andagr commented 3 months ago

@aloneguid Unfortunately this hasn't resolved our function timeouts completely, so at the very least this is not the only reason for them.

Still, it would be nice to support IAsyncDisposable now that it does exist, would it be ok if I add a PR for this? If yes, how would you prefer this? Some alternatives:

  1. Replace usage of IDisposable with IAsyncDisposable and accept a breaking change/new major version
  2. Keep IDisposable and add IAsyncDisposable support to ParquetWriter, avoid breaking change by: a. Add a new argument to ParquetSerializer.SerializeAsync to be able to choose between IDisposable or IAsyncDisposable, default to IDisposable to keep backwards compatibility b. Add a new method similar to ParquetSerializer.SerializeAsync, but which uses IAsyncDisposable (what should it be called?)