Open pkese opened 5 months ago
I am also trying to use this library with F#, but it doesn't write out correct Parquet files with either manual or automatic serialisation (as far as I can tell, no footer is written).
If you have a moment: how do you go about it, and what would you recommend or avoid?
Or, even better, a quick gist, just to show me that it is possible.
OK, I finally got it working with a manual Dispose call. I am not a .NET / F# person, so maybe this is only confusing to me.
```fsharp
async {
    use fileStream = File.Create(filePath)
    let! parquetWriter = ParquetWriter.CreateAsync(schema, fileStream) |> Async.AwaitTask
    parquetWriter.CompressionMethod <- CompressionMethod.Gzip
    parquetWriter.CompressionLevel <- System.IO.Compression.CompressionLevel.Optimal
    try
        use rowGroupWriter = parquetWriter.CreateRowGroup()
        for dataColumn in dataColumns do
            printfn "debug: %s" dataColumn.Field.Name
            do! rowGroupWriter.WriteColumnAsync(dataColumn) |> Async.AwaitTask
        printfn "Successfully wrote to Parquet file: %s" filePath
    finally
        // parquetWriter is bound with let!, not use, so it has to be disposed explicitly;
        // without this call the file ended up without a footer
        parquetWriter.Dispose()
}
```
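Rather than async {...}, you can use task {...}, which awaits .NET tasks directly (no Async.AwaitTask needed), and bind the writer with use! so that it is disposed automatically when the block exits. Your code then becomes: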
```fsharp
task {
    use fileStream = File.Create(filePath)
    use! parquetWriter = ParquetWriter.CreateAsync(schema, fileStream)
    parquetWriter.CompressionMethod <- CompressionMethod.Gzip
    parquetWriter.CompressionLevel <- System.IO.Compression.CompressionLevel.Optimal
    use rowGroupWriter = parquetWriter.CreateRowGroup()
    for dataColumn in dataColumns do
        printfn "debug: %s" dataColumn.Field.Name
        do! rowGroupWriter.WriteColumnAsync(dataColumn)
    printfn "Successfully wrote to Parquet file: %s" filePath
    return ()
}
```
I myself ended up sticking with JSON (I wanted to improve my data pipeline with Parquet, but then didn't find time for it).
Issue description
Hi,
I'm using F#, which has a few extra built-in container types, like immutable records, tuples, linked lists, optionals etc., for which Parquet.Net's class serialization doesn't work at all.
It would be nice if Parquet.Net would be able to support those extra container types.
I'm not expecting Parquet.Net to add support for these extra types - it's probably beyond the scope of this library - however it would be nice if Parquet.Net were flexible enough to allow people to implement such things themselves.
It could be done in a similar fashion to how System.Text.Json allows registering additional domain types and lets people provide their own extensions, e.g. FSharp.SystemTextJson.
So this ticket is a humble request to provide functionality for extending Parquet.Net in a similar (but not necessarily the same) manner as JsonConverterFactory allows for extending System.Text.Json with custom container types.
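For reference, here is a minimal sketch of the System.Text.Json extension model referred to above. It assumes the FSharp.SystemTextJson NuGet package is referenced; the Reading record is a hypothetical example type, not something from Parquet.Net.

```fsharp
// Assumes the FSharp.SystemTextJson NuGet package is referenced.
open System.Text.Json
open System.Text.Json.Serialization

// Hypothetical domain type using an F# record and an option field
type Reading = { Sensor: string; Value: float option }

let options = JsonSerializerOptions()
// JsonFSharpConverter is a JsonConverterFactory provided by FSharp.SystemTextJson;
// registering it teaches the serializer about F# records, options, unions etc.
options.Converters.Add(JsonFSharpConverter())

let json = JsonSerializer.Serialize({ Sensor = "a"; Value = Some 1.5 }, options)
let roundTripped = JsonSerializer.Deserialize<Reading>(json, options)
```

The point is that the core library only exposes the registration hook (options.Converters), while the F#-specific knowledge lives in a separate package. Something comparable in Parquet.Net would be enough for this request.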
I'm sure there will be other users of such APIs, not just F# folks.
Thanks.