apache / arrow-julia

Official Julia implementation of Apache Arrow
https://arrow.apache.org/julia/

copy does not copy to standard Julia Types #495

Open schlichtanders opened 5 months ago

schlichtanders commented 5 months ago

The documentation says that calling copy gives you standard Julia types:

df = copy(DataFrame(Arrow.Table(file))): Build a DataFrame, where the columns are regular in-memory vectors (specifically, Base.Vectors and/or PooledVectors). This requires that you have enough memory to load the entire DataFrame into memory.

However, this is not the case:

[screenshot not preserved]

Moelf commented 5 months ago

Maybe that's because result_old[!, "workers"] is not a DataFrame?

schlichtanders commented 5 months ago

Interesting, that could be it. But then why is copy only supported for DataFrames and not for regular Arrays?

Moelf commented 5 months ago

Because you extracted just a column, which is not a DataFrame, so the copy method specialized for DataFrame doesn't apply.

ericphanson commented 5 months ago

I think the other answer is that (if I understand correctly) it's a DataFrames.jl feature, not an Arrow.jl feature. It's just documented here because it's a common ask.

ericphanson commented 5 months ago

Maybe collect is what's desired here, for materializing a column into a Vector? Naively, collect should mean "iterate this collection into an Array", though I haven't tried it in this case.
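A minimal sketch of the collect suggestion, assuming a small table with flat columns. The column names id and name are made up for illustration, and this has not been checked against the "workers" column from the issue, so whether elements of a nested column are also converted is left open here.

```julia
using Arrow

# Build a tiny Arrow table in memory so the sketch needs no file on disk.
# The column names `id` and `name` are hypothetical.
io = IOBuffer()
Arrow.write(io, (id = [1, 2, 3], name = ["a", "b", "c"]))
tbl = Arrow.Table(take!(io))

col = tbl.id          # Arrow-backed column, not a Base.Vector
ids = collect(col)    # iterate the column into a plain Vector

println(typeof(col))
println(typeof(ids))
```

Comparing the two printed types shows whether the column was materialized. For a whole table, the documented route remains copy(DataFrame(Arrow.Table(file))).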