dfdx closed this issue 4 years ago.
No, the writer is not there yet.
bump.
Is it possible to write a Parquet file using the write methods in Thrift.jl? It can read Thrift, but I'm not sure whether it can write the Thrift metadata. Any tips would be helpful.
Yes, mostly. The metadata in a Parquet file are Thrift structures, so they can all be written using Thrift.jl.
The column data may be encoded and compressed in different ways, so that is something the writer has to do using other methods or packages. The writer may take inputs specifying which scheme to use, or it may choose some of them automatically based on the data.
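To make the encoding point concrete, here is a small sketch of what "encoding" means for column values, written in Python rather than Julia and not using any Parquet.jl or Thrift.jl API. It assumes the PLAIN encoding from the Parquet spec (INT32 as 4 little-endian bytes, DOUBLE as 8-byte little-endian IEEE 754, BYTE_ARRAY as a 4-byte little-endian length followed by the bytes). The function names and the `compress_page` helper are hypothetical, and only two codecs are sketched.

```python
import gzip
import struct


def plain_encode_int32(values):
    """PLAIN-encode INT32 values: each value as 4 bytes, little-endian."""
    return b"".join(struct.pack("<i", v) for v in values)


def plain_encode_double(values):
    """PLAIN-encode DOUBLE values: each value as an 8-byte little-endian IEEE 754 float."""
    return b"".join(struct.pack("<d", v) for v in values)


def plain_encode_byte_array(values):
    """PLAIN-encode BYTE_ARRAY values: a 4-byte little-endian length, then the raw bytes."""
    out = bytearray()
    for v in values:
        data = v.encode("utf-8") if isinstance(v, str) else bytes(v)
        out += struct.pack("<I", len(data))
        out += data
    return bytes(out)


def compress_page(data, codec="UNCOMPRESSED"):
    """Hypothetical helper: optionally compress an encoded page body.

    Only UNCOMPRESSED and GZIP are covered; Python's gzip module emits
    RFC 1952 gzip data.
    """
    if codec == "UNCOMPRESSED":
        return data
    if codec == "GZIP":
        return gzip.compress(data)
    raise ValueError(f"codec not covered by this sketch: {codec}")
```

For example, `plain_encode_int32([1, 2])` gives `b"\x01\x00\x00\x00\x02\x00\x00\x00"`. A real writer would also pick other encodings (dictionary, RLE/bit-packing for levels) where they pay off, which is the kind of choice the writer can take as input or make automatically.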
The writer may need further inputs about how to partition the data into row groups and column chunks.
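As an illustration of the partitioning input, the sketch below splits a column-oriented table into row groups of at most `rows_per_group` rows. Each row group holds one slice per column, and each slice is what would become that column's chunk. It is written in Python for brevity; `split_into_row_groups` and `rows_per_group` are hypothetical names, not part of Parquet.jl.

```python
def split_into_row_groups(columns, rows_per_group):
    """Split a column-oriented table into row groups of at most rows_per_group rows.

    `columns` maps column name -> list of values (all of the same length).
    Each returned row group maps column name -> the slice of values that
    would become that column's chunk within the row group.
    """
    if rows_per_group < 1:
        raise ValueError("rows_per_group must be at least 1")
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    num_rows = lengths.pop() if lengths else 0

    row_groups = []
    for start in range(0, num_rows, rows_per_group):
        stop = min(start + rows_per_group, num_rows)
        row_groups.append({name: values[start:stop] for name, values in columns.items()})
    return row_groups
```

With five rows and `rows_per_group=2`, this yields three row groups of 2, 2 and 1 rows. A fixed row count is only one possible policy; a writer could instead target a byte size per row group.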
Are you able to give me an example of how to write FileMetaData for a simple DataFrame?
There are multiple structs, as you can see in the spec here: https://parquet.apache.org/documentation/latest/. The corresponding Julia Thrift structs are generated here: https://github.com/JuliaIO/Parquet.jl/tree/master/src/PAR2.
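For a rough picture of what FileMetaData holds for a simple flat table, here is a Python sketch that builds the fields as plain dicts, using the field names from the spec (`version`, `schema`, `num_rows`, `row_groups`, `created_by`; `type`, `repetition_type`, `name`, `num_children`, `converted_type` for SchemaElement). It does not use the generated Julia structs or Thrift serialization. Assumptions: enum values are shown as strings rather than Thrift integer codes, every column is REQUIRED (no missing values), `version` is 1, the root element's name is arbitrary, and the Python-type-to-Parquet-type mapping is a hypothetical choice. `row_groups` is left empty because it can only be filled in once column chunks have been written.

```python
def infer_physical_type(values):
    """Hypothetical mapping from Python values to (physical type, converted type)."""
    if not values:
        raise ValueError("cannot infer a type from an empty column")
    sample = values[0]
    if isinstance(sample, bool):  # check bool before int: bool is a subclass of int
        return "BOOLEAN", None
    if isinstance(sample, int):
        return "INT64", None
    if isinstance(sample, float):
        return "DOUBLE", None
    if isinstance(sample, str):
        return "BYTE_ARRAY", "UTF8"
    raise TypeError(f"unsupported value type: {type(sample).__name__}")


def file_metadata_skeleton(columns, created_by="hypothetical-writer"):
    """Build a dict mirroring FileMetaData for a flat table (column name -> values)."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    num_rows = lengths.pop() if lengths else 0

    # A flat schema is a root element followed by one leaf element per column.
    schema = [{"name": "schema", "num_children": len(columns)}]
    for name, values in columns.items():
        physical_type, converted_type = infer_physical_type(values)
        element = {"type": physical_type, "repetition_type": "REQUIRED", "name": name}
        if converted_type is not None:
            element["converted_type"] = converted_type
        schema.append(element)

    return {
        "version": 1,
        "schema": schema,
        "num_rows": num_rows,
        "row_groups": [],  # filled in by the writer after the column chunks are written
        "created_by": created_by,
    }
```

The same shape maps onto the generated structs: one SchemaElement per entry in `schema`, with the root's `num_children` telling a reader how many leaves follow.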
Thrift structs can simply be written using the write method. You can find some examples of that in the Thrift.jl package, e.g.: https://github.com/tanmaykm/Thrift.jl/blob/master/test/memtransport_tests.jl.
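Once the FileMetaData struct has been serialized, it still has to be placed correctly in the file. Per the Parquet spec, a file starts with the 4-byte magic `PAR1`, followed by the column chunks, then the serialized FileMetaData, a 4-byte little-endian footer length, and `PAR1` again. The Python sketch below does only that framing. It treats the serialized metadata as opaque bytes (in Julia they would come from writing the struct with Thrift.jl, using the compact protocol Parquet expects). `frame_parquet_file` and `read_footer` are hypothetical names.

```python
import struct

MAGIC = b"PAR1"


def frame_parquet_file(column_chunk_bytes, serialized_file_metadata):
    """Assemble the outer layout of a Parquet file.

    column_chunk_bytes: the already encoded/compressed column chunks, concatenated.
    serialized_file_metadata: FileMetaData already serialized with Thrift.
    """
    return (
        MAGIC
        + column_chunk_bytes
        + serialized_file_metadata
        + struct.pack("<i", len(serialized_file_metadata))  # 4-byte little-endian length
        + MAGIC
    )


def read_footer(data):
    """Recover the serialized FileMetaData bytes from the end of a framed file."""
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic")
    (footer_len,) = struct.unpack("<i", data[-8:-4])
    return data[-8 - footer_len:-8]
```

One design consequence: the offsets recorded in the column chunk metadata are absolute positions in the file, so the first chunk starts at byte 4, right after the leading magic, and the metadata can only be serialized after the chunks have been laid out.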
There is finally a working Parquet writer! See https://github.com/xiaodaigh/Diban.jl
I will start work on a PR to Parquet.jl, but if you can't wait, please help test out Diban.jl.
See #66 for the PR.
Closed by #66
In this package, is there any API to write Parquet files?