JuliaData / Feather.jl

Read and write feather files in pure Julia
https://juliadata.github.io/Feather.jl/stable

Segfault with integer categorical data #76

Closed cstjean closed 6 years ago

cstjean commented 6 years ago

Two issues. First, writing a CategoricalArray{Int64,1,UInt32} is unsupported:

julia> df = DataFrame(X=CategoricalArrays.CategoricalArray([1,1,1]))
3×1 DataFrames.DataFrame
│ Row │ X │
├─────┼───┤
│ 1   │ 1 │
│ 2   │ 1 │
│ 3   │ 1 │

julia> Feather.write("/home/cedric/test2.feather", df)
ERROR: MethodError: no method matching write(::Base.AbstractIOBuffer{Array{UInt8,1}}, ::CategoricalArrays.CategoricalValue{Int64,UInt32})
Closest candidates are:
  write(::IO, ::Any) at io.jl:284
  write(::IO, ::Any...) at io.jl:286
  write(::IO, ::Complex) at complex.jl:175

Second, reading an integer categorical array written by pandas segfaults on Julia 0.6.2 + Ubuntu + Feather 0.3.1:

import pandas as pd
dd = pd.DataFrame(pd.Series([1,1,1], name="X", dtype="category"))
dd.to_feather("test.feather")

julia> import Feather

julia> df = Feather.read("test.feather")

signal (11): Segmentation fault
while loading no file, in expression starting on line 0
unknown function (ip: 0x7fddc0e28d9c)
jl_pchar_to_string at /buildworker/worker/package_linux64/build/src/array.c:415
#4 at /home/cedric/.julia/v0.6/Feather/src/Feather.jl:78
_collect at ./array.jl:488
unknown function (ip: 0x7fdd9f04f356)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
addlevels! at /home/cedric/.julia/v0.6/Feather/src/Feather.jl:78
unknown function (ip: 0x7fdd9f04daab)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
#Source#7 at /home/cedric/.julia/v0.6/Feather/src/Feather.jl:135
unknown function (ip: 0x7fdd9f042a0b)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1424 [inlined]
jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:51
Type at ./<missing>:0
#read#27 at /home/cedric/.julia/v0.6/Feather/src/Feather.jl:289 [inlined]
read at /home/cedric/.julia/v0.6/Feather/src/Feather.jl:289
read at /home/cedric/.julia/v0.6/Feather/src/Feather.jl:284
unknown function (ip: 0x7fdd9f040952)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:75
eval at /buildworker/worker/package_linux64/build/src/interpreter.c:242
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:543
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:692
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:592
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/builtins.c:496
eval at ./boot.jl:235
unknown function (ip: 0x7fddbb775d2f)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
eval_user_input at ./REPL.jl:66
unknown function (ip: 0x7fddbb7f718f)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
macro expansion at ./REPL.jl:97 [inlined]
#1 at ./event.jl:73
unknown function (ip: 0x7fdd9f02f34f)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1424 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:267
unknown function (ip: 0xffffffffffffffff)
Allocations: 5696111 (Pool: 5694739; Big: 1372); GC: 9
Segmentation fault (core dumped)

Pandas reads it just fine.
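
For context, a categorical column is stored as integer codes plus a separate list of levels, and the levels need not be strings. Below is a minimal, stdlib-only Python sketch of that layout. The `dictionary_encode` helper is hypothetical and is not part of pandas or Feather.jl. It only illustrates how a reader that assumes string levels could misread integer levels, which is one plausible reading of the `jl_pchar_to_string` frame under `addlevels!` in the trace above, not a confirmed diagnosis.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def dictionary_encode(values: Sequence[Hashable]) -> Tuple[List[int], List[Hashable]]:
    """Split values into (codes, levels), the layout behind categorical columns.

    Hypothetical helper for illustration only. Levels are kept in order of
    first appearance, which may differ from the ordering a real library uses.
    """
    levels: List[Hashable] = []
    index: Dict[Hashable, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            # First time this value is seen: register it as a new level.
            index[v] = len(levels)
            levels.append(v)
        codes.append(index[v])
    return codes, levels


# Same data as the repro: three rows, one integer level.
codes, levels = dictionary_encode([1, 1, 1])
print(codes, levels)  # prints: [0, 0, 0] [1]
```

Here the single level is the integer `1`, not the string `"1"`, so code that treats the level storage as string data would be working on the wrong kind of bytes.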

ExpandingMan commented 6 years ago

Just so you are aware, I'm in the process of rewriting this package so that it is completely safe. I'm still figuring out whether this is possible with equivalent performance, but so far it's looking fairly close.

ExpandingMan commented 6 years ago

The package has been completely overhauled on master. Whatever originally caused this issue is now gone. Please re-open if you still experience it on master.