Closed by mahiki 3 years ago
As opposed to the Python implementation, the Arrow.jl Julia package doesn't require arrow-specific integration with other data formats. So to read a parquet file, you can use the Parquet.jl package (which should support partitioned datasets like this), like `tbl = read_parquet(filename)`. You can then convert the parquet data to the arrow format by doing `Arrow.write("data.arrow", tbl)`. This data can then be read back in via `tbl2 = Arrow.Table("data.arrow")`.
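The steps above can be sketched end to end (a sketch assuming Parquet.jl and Arrow.jl are installed and that `filename` points at an existing parquet file):

```julia
using Parquet, Arrow

# Read the parquet file into a Tables.jl-compatible table
# (`read_parquet` is exported by Parquet.jl).
tbl = read_parquet(filename)    # `filename` is assumed to exist

# Serialize the table in the arrow format.
Arrow.write("data.arrow", tbl)

# Read the arrow data back. `Arrow.Table` is itself a Tables.jl
# source, so it can be handed to DataFrames.jl and friends.
tbl2 = Arrow.Table("data.arrow")
```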
We can perhaps look into providing a way to convert non-arrow data tables directly to an `Arrow.Table`, but as I mentioned, the value isn't as great as in other language implementations, where conversions have to be done one by one.
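In the meantime, because `Arrow.write` accepts any Tables.jl-compatible source, a roundtrip through an in-memory buffer already gets close to such a direct conversion (a sketch; the example table here is made up):

```julia
using Arrow

# Any Tables.jl-compatible table works, e.g. a NamedTuple of vectors.
tbl = (a = [1, 2, 3], b = ["x", "y", "z"])

# Write the arrow-formatted bytes to an in-memory buffer...
io = IOBuffer()
Arrow.write(io, tbl)
seekstart(io)

# ...and materialize them as an Arrow.Table without touching disk.
atbl = Arrow.Table(io)
```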
Thanks, that could be helpful.
Unfortunately Parquet.jl is failing to recognize partitions and it doesn't support Date data type. (Issues are open for both).
I was looking to Arrow.jl as a potential workaround, but it seems like this is not possible, from what you say.
Correct; Arrow.jl is for arrow data, not parquet. If you open an issue at the Parquet.jl package, @tanmaykm has been very responsive in the past for fixing things.
Sorry, I was away for a while and missed the issues on Parquet.jl. Thanks for the ping, @quinnj!
> As opposed to the python implementation, the Arrow.jl julia package doesn't require arrow-specific integration with other data formats
Same for R. But I think Julia rightly has a more modular mentality. Since packages in Julia are more composable, there is no need for a "big bang" approach that stuffs all the functionality into one big package.
Sorry if this is a ridiculous question; I am very much a noob and not good at reading the API.
My use-case is reading partitioned parquet files, I know this is supported from the Apache PyArrow docs with something like:
I tried the very naive: