JuliaData / Feather.jl

Read and write feather files in pure Julia
https://juliadata.github.io/Feather.jl/stable

filter does not work on loaded df #111

Open mariok90 opened 5 years ago

mariok90 commented 5 years ago

When I load a Feather file that has a column of type Arrow.BitPrimitive, calling filter on the resulting DataFrame throws an error: DimensionMismatch("column length 17 for column(s) A, and is incompatible with column length 2 for column(s) B")

Here is a minimal example:

using Feather
using DataFrames

df = DataFrame(A=rand(100), B=rand(Bool, 100))

Feather.write("test.feather", df)

loaded_df = Feather.read("test.feather")

filter(x-> x[:A] < 0.2, loaded_df)

How can I avoid this behaviour? I can work around this by collecting the BitPrimitive column:

loaded_df[:B] = loaded_df[:B] |> collect

filter(x-> x[:A] < 0.2, loaded_df)

However, this is not very satisfying.

ExpandingMan commented 5 years ago

This is definitely a bug. It seems that boolean indexing of BitPrimitive is broken.

I've created an issue in Arrow.jl and should be able to fix it within the next few days.

In the meantime, an alternative workaround would be to store the boolean column as non-boolean integers when writing the table and convert it back to booleans after reading.
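
A minimal sketch of that integer workaround, reusing the example from the report. The choice of UInt8 as the storage type and the variable names to_write and filtered are assumptions made for illustration, and the column access syntax may need adjusting for your DataFrames version:

```julia
using Feather
using DataFrames

df = DataFrame(A=rand(100), B=rand(Bool, 100))

# Store the Bool column as UInt8 (assumed storage type) so that it is
# written as an ordinary integer column rather than a bit-packed one.
to_write = DataFrame(A=df.A, B=UInt8.(df.B))
Feather.write("test.feather", to_write)

loaded = Feather.read("test.feather")

# Convert the integer column back to Bool after loading.
loaded_df = DataFrame(A=loaded.A, B=Bool.(loaded.B))

filtered = filter(x -> x[:A] < 0.2, loaded_df)
```

This keeps the file free of bit-packed columns, at the cost of one byte per value on disk instead of one bit.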