JuliaData / Feather.jl

Read and write feather files in pure Julia
https://juliadata.github.io/Feather.jl/stable

Error with reading string column in a feather file #109

Closed rsrock closed 5 years ago

rsrock commented 5 years ago

I have a feather file that is giving me the following error in a fresh Julia session:

julia> using Feather

julia> Feather.read("test/fourspots.feather")
ERROR: MethodError: Cannot `convert` an object of type Nothing to an object of type String
Closest candidates are:
  convert(::Type{String}, ::Union{CategoricalString{R}, CategoricalValue{T,R} where T} where R) at /Users/rrock/.julia/packages/CategoricalArrays/ucKV2/src/value.jl:82
  convert(::Type{String}, ::WeakRefStrings.WeakRefString) at /Users/rrock/.julia/packages/WeakRefStrings/lXDgV/src/WeakRefStrings.jl:72
  convert(::Type{T<:AbstractString}, ::T<:AbstractString) where T<:AbstractString at strings/basic.jl:207
  ...
Stacktrace:
 [1] Feather.Metadata.CTable(::Nothing, ::Int64, ::Array{Feather.Metadata.Column,1}, ::Int32, ::Nothing) at /Users/rrock/.julia/packages/Feather/tppUH/src/metadata.jl:61
 [2] read(::FlatBuffers.Table{Feather.Metadata.CTable}, ::Type{Feather.Metadata.CTable}) at /Users/rrock/.julia/packages/FlatBuffers/YAnlP/src/FlatBuffers.jl:320
 [3] read at /Users/rrock/.julia/packages/FlatBuffers/YAnlP/src/FlatBuffers.jl:300 [inlined]
 [4] read at /Users/rrock/.julia/packages/FlatBuffers/YAnlP/src/FlatBuffers.jl:323 [inlined]
 [5] getctable(::Array{UInt8,1}) at /Users/rrock/.julia/packages/Feather/tppUH/src/loadfile.jl:38
 [6] #Source#4(::Bool, ::Type, ::String) at /Users/rrock/.julia/packages/Feather/tppUH/src/source.jl:18
 [7] Type at ./none:0 [inlined]
 [8] #read#7(::Bool, ::Function, ::String) at /Users/rrock/.julia/packages/Feather/tppUH/src/source.jl:68
 [9] read(::String) at /Users/rrock/.julia/packages/Feather/tppUH/src/source.jl:68
 [10] top-level scope at none:0

The feather file opens without any problems in R. I assume the issue is with a single column containing string UUIDs, because all other columns are doubles. I don't see anything obviously wrong with the file (there are no NA or Nothing entries in the UUID column, for example).

This is with Feather v0.5.1 and Arrow v0.2.3.

ExpandingMan commented 5 years ago

I'm pretty sure this is a FlatBuffers issue, since the error comes from a call to FlatBuffers.read. Could you make sure that you have the most recent version of FlatBuffers?

@quinnj, any thoughts on the FlatBuffers end of this? It seems that all of the string arguments passed to the CTable constructor are nothing rather than strings.

quinnj commented 5 years ago

@rjkat has done some overhauling recently of FlatBuffers.jl; perhaps he has an idea what's going on.

rjkat commented 5 years ago

This is related to the behaviour of default values for string fields. In FlatBuffers 0.5 the default changed from "" to nothing. It seems Feather.jl was relying on the old behaviour (an empty-string default), but there were no explicit tests for it. I've tagged FlatBuffers 0.5.2, which reverts to the old behaviour. Hopefully that addresses this problem once the release makes it into METADATA.
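To illustrate the failure mode described above, here is a minimal sketch that does not use FlatBuffers or Feather at all. `DemoTable` and `string_or_empty` are hypothetical names made up for this example; the struct only imitates the shape seen in the stack trace (string fields receiving `nothing`) and is not the real `Feather.Metadata.CTable` definition.

```julia
# Hypothetical stand-in for a metadata struct with String fields.
# This is NOT the actual Feather.Metadata.CTable definition.
struct DemoTable
    description::String
    num_rows::Int64
    metadata::String
end

# Julia's default constructor calls convert(String, x) for each String field,
# so passing `nothing` raises a MethodError like the one in the report.
failed = try
    DemoTable(nothing, 4, nothing)
    false
catch err
    err isa MethodError
end

# Hypothetical helper: fall back to an empty string when a decoder
# yields `nothing` for an absent string field.
string_or_empty(x::Union{Nothing,AbstractString}) = something(x, "")

# With the fallback applied, construction succeeds.
ok = DemoTable(string_or_empty(nothing), 4, string_or_empty("meta"))
```

Here `failed` records that the constructor rejects `nothing`, while `ok` is built normally once absent strings are mapped to `""`. Reverting the default on the decoding side, as FlatBuffers 0.5.2 does, has the same effect without requiring any change in the calling package.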

ExpandingMan commented 5 years ago

Great, thanks. Of course, please let us know if anything ultimately needs to change on the Feather.jl side.

rsrock commented 5 years ago

That may have fixed it. I'm hitting another error, but I think it's unrelated. I'll investigate a bit before closing this issue. Thanks.

rsrock commented 5 years ago

Confirmed, this is now fixed. Thanks!