aloneguid / parquet-dotnet

Fully managed Apache Parquet implementation
https://aloneguid.github.io/parquet-dotnet/
MIT License
542 stars, 141 forks

Is ParquetSerializer.SerializeAsync thread-safe? #427

Closed: pantonis closed this issue 7 months ago

pantonis commented 8 months ago

Issue description

I want to use ParquetSerializer.SerializeAsync to write hundreds of thousands of files using Parallel.ForEach. Is ParquetSerializer.SerializeAsync thread-safe?

aloneguid commented 8 months ago

Yes, it should be thread-safe. However, the better approach (with better performance) would be to use the asynchronous API.

pantonis commented 8 months ago

I'm using ParquetSerializer.SerializeAsync.

aloneguid commented 7 months ago

It is then ;)

pantonis commented 7 months ago

Async and parallel are 2 different things :)

aloneguid commented 7 months ago

Only 2? Then you are using 1 thread, which is the same as async :) But seriously, it's thread-safe, so you can mix and match.
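To make the "mix and match" point concrete: async and parallel can be combined by running many asynchronous writes at once while bounding how many are in flight. This is a minimal sketch in Python rather than C#, so it stays dependency-free. `write_batch` is a hypothetical stand-in for a call such as `ParquetSerializer.SerializeAsync`, and it writes JSON instead of Parquet. It assumes, as stated above, that the serializer holds no shared mutable state, and it gives every task its own file so no stream is ever shared between concurrent writes.

```python
import asyncio
import json
import os
import tempfile


def write_batch(path: str, records: list[dict]) -> None:
    # Hypothetical stand-in for a serializer call such as
    # ParquetSerializer.SerializeAsync; writes JSON to stay dependency-free.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)


async def write_all(batches: list[list[dict]], out_dir: str, max_parallel: int = 8) -> list[str]:
    # Cap the number of writes in flight so hundreds of thousands of batches
    # do not all open files at once.
    gate = asyncio.Semaphore(max_parallel)

    async def write_one(i: int, batch: list[dict]) -> str:
        path = os.path.join(out_dir, f"part-{i:06d}.json")  # one file per task: no shared stream
        async with gate:
            # to_thread runs the blocking write on a worker thread (parallel),
            # while the event loop awaits it without blocking (async).
            await asyncio.to_thread(write_batch, path, batch)
        return path

    # gather returns results in the same order as the input batches.
    return await asyncio.gather(*(write_one(i, b) for i, b in enumerate(batches)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        batches = [[{"id": i, "value": i * i}] for i in range(100)]
        paths = asyncio.run(write_all(batches, d))
        print(len(paths))  # 100
```

In .NET the same shape would likely be `Parallel.ForEachAsync` with `ParallelOptions.MaxDegreeOfParallelism`, or `Task.WhenAll` gated by a `SemaphoreSlim`; treat that as a pointer rather than a tested recipe.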

pantonis commented 7 months ago

1 thread with parallel? Only if the serializer is locking.

aloneguid commented 7 months ago

The latest release fixed some occasional locking issues in the class serializer, so it's perfectly fine now.

pantonis commented 7 months ago

Trying to understand this "Then you are using 1 thread, which is the same as async"

aloneguid commented 7 months ago

Trying to understand this "Then you are using 1 thread, which is the same as async"

Silly joke, doesn't make sense, apologies ;)

HeathHopkins commented 1 month ago

I created a repo with a couple of different strategies for writing to a single parquet file from multiple threads. https://github.com/HeathHopkins/ParquetMultithread
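For the single-file case, one common strategy (not necessarily one of those in the linked repo) is fan-in to a single writer: worker threads build batches in parallel and push them onto a bounded queue, while exactly one thread owns the output file and appends the batches in turn. This minimal Python sketch writes CSV lines as a hypothetical stand-in for Parquet; in a Parquet file each batch would plausibly become one row group written by that single writer. The function names are illustrative only.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel: tells the writer that every producer has finished


def produce(worker_id: int, n_rows: int, out: queue.Queue) -> None:
    # Hypothetical producer: builds one batch (think "row group") in parallel
    # with the other workers, then hands it to the single writer.
    rows = [(worker_id, i) for i in range(n_rows)]
    out.put(rows)


def single_writer(path: str, q: queue.Queue) -> int:
    # Only this thread ever touches the file, so batches can never interleave.
    groups = 0
    with open(path, "w", encoding="utf-8") as f:
        while True:
            batch = q.get()
            if batch is _DONE:
                break
            for worker_id, i in batch:
                f.write(f"{worker_id},{i}\n")
            groups += 1
    return groups


def write_from_many_threads(path: str, n_workers: int = 4, n_rows: int = 10) -> int:
    # Bounded queue: producers block if the writer falls behind, capping memory use.
    q: queue.Queue = queue.Queue(maxsize=n_workers)
    # One extra thread so the writer runs alongside all producers.
    with ThreadPoolExecutor(max_workers=n_workers + 1) as pool:
        writer = pool.submit(single_writer, path, q)
        producers = [pool.submit(produce, w, n_rows, q) for w in range(n_workers)]
        try:
            for p in producers:
                p.result()  # wait for every producer and re-raise its errors
        finally:
            q.put(_DONE)  # always release the writer, even if a producer failed
        return writer.result()  # number of batches written
```

The design choice is that parallelism lives only in producing batches; the file itself is written sequentially, so the writer needs no locking at all.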