JuliaData / Feather.jl

Read and write feather files in pure Julia
https://juliadata.github.io/Feather.jl/stable

Error reading file #46

Closed: davidanthoff closed this issue 7 years ago.

davidanthoff commented 7 years ago

I'm on Julia 0.6 RC2 on Windows. Feather is version 0.2.5 and DataStreams is 0.1.3.

I'm trying to read this file (test-output.feather) with

Feather.read("test-output.feather")

I get the following error message:

ERROR: type is immutable
Stacktrace:
 [1] setrows!(::Feather.Source, ::Int64) at C:\Users\anthoff\.julia\v0.6\DataStreams\src\DataStreams.jl:56
 [2] stream!(::Feather.Source, ::Type{DataStreams.Data.Column}, ::DataFrames.DataFrame, ::DataStreams.Data.Schema{true}, ::DataStreams.Data.Schema{true}, ::Array{Function,1}) at C:\Users\anthoff\.julia\v0.6\DataStreams\src\DataStreams.jl:239
 [3] #stream!#5(::Array{Any,1}, ::Function, ::Feather.Source, ::Type{DataFrames.DataFrame}, ::Bool, ::Dict{Int64,Function}) at C:\Users\anthoff\.julia\v0.6\DataStreams\src\DataStreams.jl:151
 [4] #read#19 at C:\Users\anthoff\.julia\v0.6\Feather\src\Feather.jl:249 [inlined]
 [5] read(::String, ::Type{DataFrames.DataFrame}) at C:\Users\anthoff\.julia\v0.6\Feather\src\Feather.jl:249
 [6] read(::String) at C:\Users\anthoff\.julia\v0.6\Feather\src\Feather.jl:243

Reading the same file in R seems to work.

ExpandingMan commented 7 years ago

I think things are a bit in limbo right now, as @quinnj is overhauling DataStreams. Once that is done, I expect the updates for 0.6 to be merged fairly quickly, which may or may not fix this issue.

davidanthoff commented 7 years ago

@quinnj Any chance you could take a look at this? This is a blocking error on Julia 0.6, and it would be great to have a version of Feather that works on it.

Having said that, I just looked into this, and the bug is a real mystery to me; it almost looks like a regression in Julia Base. I'm on RC3. The error always occurs on this line of setrows!; specifically, the statement source.schema.rows = rows throws the error. But the type of source.schema is not immutable, so I'm at a loss as to why that assignment would throw. If this really is a Julia 0.6 regression, it would be good to report a reproducible case to Base, so that it could perhaps still be fixed before the release.
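For context, Julia raises this kind of error when code assigns to a field of an immutable struct; assigning to a field of a mutable struct is allowed. The sketch below uses hypothetical FrozenSchema and MutableSchema types (not the actual DataStreams definitions) to show the normal behaviour, which is why seeing the error for a mutable type suggested a compiler problem rather than a bug in the package code. The exact message text differs by version: 0.6 prints "type is immutable", while later releases print a longer setfield! message.

```julia
# Hypothetical stand-ins for a schema-like type; NOT DataStreams' real types.
struct FrozenSchema          # immutable: fields cannot be reassigned
    rows::Int
end

mutable struct MutableSchema # mutable: fields can be reassigned
    rows::Int
end

# Attempt `schema.rows = rows` and report whether Julia allowed it.
function try_setrows!(schema, rows::Int)
    try
        schema.rows = rows
        return true
    catch err
        println("assignment failed: ", sprint(showerror, err))
        return false
    end
end

m = MutableSchema(0)
f = FrozenSchema(0)

ok_mutable = try_setrows!(m, 3)  # succeeds: MutableSchema is mutable
ok_frozen  = try_setrows!(f, 3)  # fails: FrozenSchema is immutable
```

Only the second call prints an error; on a correctly working Julia the mutable case never raises.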

KristofferC commented 7 years ago

Works for me with Feather 0.2.5, DataStreams 0.1.3, and Julia 0.6 RC2 on Linux.

julia> Feather.read("/home/kristoffer/Documents/test-output.feather")
3×3 DataFrames.DataFrame
│ Row │ name    │ age  │ children │
├─────┼─────────┼──────┼──────────┤
│ 1   │ "John"  │ 23.0 │ 3        │
│ 2   │ "Sally" │ 42.0 │ 5        │
│ 3   │ "Kirk"  │ 59.0 │ 2        │

davidanthoff commented 7 years ago

The tests actually segfault for me on Ubuntu with RC3:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.0-rc3.0 (2017-06-07 11:53 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-linux-gnu

julia> Pkg.status()
5 required packages:
 - DataFrames                    0.10.0
 - Feather                       0.2.5
 - Gadfly                        0.6.2
 - PyPlot                        2.3.2
 - Query                         0.5.0
68 additional packages:
 - AxisAlgorithms                0.1.6
 - BinDeps                       0.6.0
 - Calculus                      0.2.2
 - CategoricalArrays             0.1.3
 - ColorTypes                    0.5.1
 - Colors                        0.7.3
 - CommonSubexpressions          0.0.1
 - Compat                        0.26.0
 - Compose                       0.5.2
 - Conda                         0.5.3
 - Contour                       0.3.0
 - CoupledFields                 0.0.1
 - DataArrays                    0.5.3
 - DataStreams                   0.1.3
 - DataStructures                0.5.3
 - DataValues                    0.1.0
 - DiffBase                      0.2.0
 - Distances                     0.4.1
 - Distributions                 0.13.0
 - DocStringExtensions           0.3.3
 - Documenter                    0.11.1
 - DualNumbers                   0.3.0
 - FileIO                        0.3.1
 - FixedPointNumbers             0.3.8
 - FlatBuffers                   0.2.0
 - ForwardDiff                   0.5.0
 - GZip                          0.3.0
 - Hexagons                      0.1.0
 - Hiccup                        0.1.1
 - Interpolations                0.6.2
 - IterableTables                0.2.0
 - Iterators                     0.3.1
 - JSON                          0.12.0
 - Juno                          0.2.7
 - KernelDensity                 0.3.2
 - LaTeXStrings                  0.2.1
 - LineSearches                  2.1.1
 - Loess                         0.2.0
 - MacroTools                    0.3.6
 - Measures                      0.1.0
 - Media                         0.2.7
 - NLSolversBase                 2.1.3
 - NaNMath                       0.2.5
 - NamedTuples                   3.0.2
 - NodeJS                        0.0.1              master
 - NullableArrays                0.1.1
 - Optim                         0.9.1
 - PDMats                        0.6.0
 - Parameters                    0.7.2
 - PositiveFactorizations        0.0.4
 - PyCall                        1.12.0
 - QuadGK                        0.1.2
 - Ratios                        0.1.0
 - RealInterface                 0.0.2
 - Reexport                      0.0.3
 - Requires                      0.4.3
 - Rmath                         0.1.6
 - SHA                           0.3.3
 - ShowItLikeYouBuildIt          0.0.1
 - Showoff                       0.1.1
 - SortingAlgorithms             0.1.1
 - SpecialFunctions              0.1.1
 - StaticArrays                  0.5.1
 - StatsBase                     0.16.0
 - StatsFuns                     0.5.0
 - URIParser                     0.1.8
 - WeakRefStrings                0.2.0
 - WoodburyMatrices              0.2.2

julia> Pkg.test("Feather")
INFO: Computing test dependencies for Feather...
INFO: Installing DataStreamsIntegrationTests v0.0.2
INFO: Testing Feather
WARNING: Method definition ==(Base.Nullable{S}, Base.Nullable{T}) in module Base at nullable.jl:238 overwritten in module NullableArrays at /home/davidanthoff/.julia/v0.6/NullableArrays/src/operators.jl:128.
WARNING: This Feather file is old and will not be readable beyond the 0.3.0 release

signal (11): Segmentation fault
while loading /home/davidanthoff/.julia/v0.6/Feather/test/runtests.jl, in expression starting on line 17
julia_type_to_llvm at /home/davidanthoff/source/julia-0.6/src/cgutils.cpp:382
typed_store at /home/davidanthoff/source/julia-0.6/src/cgutils.cpp:1226
emit_setfield at /home/davidanthoff/source/julia-0.6/src/cgutils.cpp:2254
emit_builtin_call at /home/davidanthoff/source/julia-0.6/src/codegen.cpp:3073
emit_call at /home/davidanthoff/source/julia-0.6/src/codegen.cpp:3441
emit_expr at /home/davidanthoff/source/julia-0.6/src/codegen.cpp:4139
emit_stmtpos at /home/davidanthoff/source/julia-0.6/src/codegen.cpp:4058 [inlined]
emit_function at /home/davidanthoff/source/julia-0.6/src/codegen.cpp:6242
jl_compile_linfo at /home/davidanthoff/source/julia-0.6/src/codegen.cpp:1256
jl_compile_for_dispatch at /home/davidanthoff/source/julia-0.6/src/gf.c:1672
jl_compile_method_internal at /home/davidanthoff/source/julia-0.6/src/julia_internal.h:307 [inlined]
jl_call_method_internal at /home/davidanthoff/source/julia-0.6/src/julia_internal.h:354 [inlined]
jl_apply_generic at /home/davidanthoff/source/julia-0.6/src/gf.c:1930
stream! at /home/davidanthoff/.julia/v0.6/DataStreams/src/DataStreams.jl:239
unknown function (ip: 0x7f0ffb32784a)
jl_call_fptr_internal at /home/davidanthoff/source/julia-0.6/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /home/davidanthoff/source/julia-0.6/src/julia_internal.h:358 [inlined]
jl_apply_generic at /home/davidanthoff/source/julia-0.6/src/gf.c:1930
#stream!#5 at /home/davidanthoff/.julia/v0.6/DataStreams/src/DataStreams.jl:151
jl_call_fptr_internal at /home/davidanthoff/source/julia-0.6/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /home/davidanthoff/source/julia-0.6/src/julia_internal.h:358 [inlined]
jl_apply_generic at /home/davidanthoff/source/julia-0.6/src/gf.c:1930
jl_apply at /home/davidanthoff/source/julia-0.6/src/julia.h:1423 [inlined]
jl_invoke at /home/davidanthoff/source/julia-0.6/src/gf.c:51
macro expansion at /home/davidanthoff/.julia/v0.6/Feather/test/runtests.jl:19 [inlined]
anonymous at ./<missing> (unknown line)
jl_call_fptr_internal at /home/davidanthoff/source/julia-0.6/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /home/davidanthoff/source/julia-0.6/src/julia_internal.h:358 [inlined]
jl_toplevel_eval_flex at /home/davidanthoff/source/julia-0.6/src/toplevel.c:589
jl_parse_eval_all at /home/davidanthoff/source/julia-0.6/src/ast.c:873
jl_load at /home/davidanthoff/source/julia-0.6/src/toplevel.c:616
include_from_node1 at ./loading.jl:569
unknown function (ip: 0x7f100d49fb0b)
jl_call_fptr_internal at /home/davidanthoff/source/julia-0.6/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /home/davidanthoff/source/julia-0.6/src/julia_internal.h:358 [inlined]
jl_apply_generic at /home/davidanthoff/source/julia-0.6/src/gf.c:1930
include at ./sysimg.jl:14
unknown function (ip: 0x7f100d34316b)
jl_call_fptr_internal at /home/davidanthoff/source/julia-0.6/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /home/davidanthoff/source/julia-0.6/src/julia_internal.h:358 [inlined]
jl_apply_generic at /home/davidanthoff/source/julia-0.6/src/gf.c:1930
process_options at ./client.jl:305
_start at ./client.jl:371
unknown function (ip: 0x7f100d4ab658)
jl_call_fptr_internal at /home/davidanthoff/source/julia-0.6/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /home/davidanthoff/source/julia-0.6/src/julia_internal.h:358 [inlined]
jl_apply_generic at /home/davidanthoff/source/julia-0.6/src/gf.c:1930
jl_apply at /home/davidanthoff/source/julia-0.6/ui/../src/julia.h:1423 [inlined]
true_main at /home/davidanthoff/source/julia-0.6/ui/repl.c:127
main at /home/davidanthoff/source/julia-0.6/ui/repl.c:264
__libc_start_main at /build/glibc-9tT8Do/glibc-2.23/csu/../csu/libc-start.c:291
unknown function (ip: 0x401668)
Allocations: 6771535 (Pool: 6770143; Big: 1392); GC: 8
=============================================================[ ERROR: Feather ]=============================================================

failed process: Process(`/home/davidanthoff/source/julia-0.6/usr/bin/julia -Cnative -J/home/davidanthoff/source/julia-0.6/usr/lib/julia/sys.so --compile=yes --depwarn=yes --check-bounds=yes --code-coverage=none --color=yes --compilecache=yes /home/davidanthoff/.julia/v0.6/Feather/test/runtests.jl`, ProcessSignaled(11)) [0]

============================================================================================================================================
INFO: Removing DataStreamsIntegrationTests v0.0.2
ERROR: Feather had test errors

KristofferC commented 7 years ago

Yeah, Pkg.test() also segfaults for me on RC2, but in my case it seems to be in FlatBuffers:

signal (11): Segmentation fault
while loading /home/kristoffer/.julia/v0.6/Feather/test/runtests.jl, in expression starting on line 17
jl_gc_pool_alloc at /home/centos/buildbot/slave/package_tarball64/build/src/gc.c:927
get at /home/kristoffer/.julia/v0.6/FlatBuffers/src/internals.jl:8
offset at /home/kristoffer/.julia/v0.6/FlatBuffers/src/internals.jl:18
read at /home/kristoffer/.julia/v0.6/FlatBuffers/src/FlatBuffers.jl:225
getvalue at /home/kristoffer/.julia/v0.6/FlatBuffers/src/FlatBuffers.jl:208
getarray at /home/kristoffer/.julia/v0.6/FlatBuffers/src/FlatBuffers.jl:178
getvalue at /home/kristoffer/.julia/v0.6/FlatBuffers/src/FlatBuffers.jl:188
unknown function (ip: 0x7f636d24f45d)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:337 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:356 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1930
quinnj commented 7 years ago

I wonder if it's https://github.com/JuliaLang/julia/issues/22256#issuecomment-307164135

davidanthoff commented 7 years ago

When I run the test suite on Ubuntu with the Julia branch from JuliaLang/julia#22282, it no longer segfaults but produces this error:

ERROR: LoadError: ArgumentError: unsafe_wrap: pointer 0x7fe8160d94fc is not properly aligned to 8 bytes
Stacktrace:
 [1] unwrap(::Feather.Source, ::Type{Float64}, ::Int64, ::Int64, ::Int64) at /home/davidanthoff/.julia/v0.7/Feather/src/Feather.jl:142
 [2] unwrap(::Feather.Source, ::Type{Float64}, ::Int64, ::Int64) at /home/davidanthoff/.julia/v0.7/Feather/src/Feather.jl:141
 [3] streamfrom(::Feather.Source, ::Type{DataStreams.Data.Column}, ::Type{NullableArrays.NullableArray{Float64,1}}, ::Int64) at /home/davidanthoff/.julia/v0.7/Feather/src/Feather.jl:161
 [4] streamto!(::DataFrames.DataFrame, ::Type{DataStreams.Data.Column}, ::Feather.Source, ::Type{NullableArrays.NullableArray{Float64,1}}, ::Type{NullableArrays.NullableArray{Float64,1}}, ::Int64, ::Int64, ::DataStreams.Data.Schema{true}, ::Base.#identity) at /home/davidanthoff/.julia/v0.7/DataStreams/src/DataStreams.jl:218
 [5] stream!(::Feather.Source, ::Type{DataStreams.Data.Column}, ::DataFrames.DataFrame, ::DataStreams.Data.Schema{true}, ::DataStreams.Data.Schema{true}, ::Array{Function,1}) at /home/davidanthoff/.julia/v0.7/DataStreams/src/DataStreams.jl:231
 [6] #stream!#5(::Array{Any,1}, ::Function, ::Feather.Source, ::Type{DataFrames.DataFrame}, ::Bool, ::Dict{Int64,Function}) at /home/davidanthoff/.julia/v0.7/DataStreams/src/DataStreams.jl:151
 [7] macro expansion at /home/davidanthoff/.julia/v0.7/Feather/test/runtests.jl:19 [inlined]
 [8] anonymous at ./<missing>:?
 [9] include_from_node1(::Module, ::String) at ./loading.jl:549
 [10] include(::Module, ::String) at ./sysimg.jl:14
 [11] process_options(::Base.JLOptions) at ./client.jl:307
 [12] _start() at ./client.jl:375
while loading /home/davidanthoff/.julia/v0.7/Feather/test/runtests.jl, in expression starting on line 17

But in general, these segfaults seem to happen in very different parts of the code than the bug I originally reported in this issue, which I can only reproduce on Windows and which seems to be happening in the setrows! function.

So maybe we should keep this issue focused on the setrows! bug and separately figure out what causes the segfaults in the general test suite.

vtjnash commented 7 years ago

That's not a segmentation fault; it's an ArgumentError. Fixes for it are described in https://github.com/JuliaLang/julia/pull/21831
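The alignment complaint can be checked by hand: the address in the error, 0x7fe8160d94fc, ends in 0xfc = 252, and 252 mod 8 = 4, so the Float64 column starts 4 bytes past an 8-byte boundary, which unsafe_wrap rejects. A general workaround, and not necessarily what the linked pull request does, is to copy the bytes into a freshly allocated Vector{Float64}, whose storage Julia aligns. The sketch below assumes Julia 1.x (unsafe_copyto! and GC.@preserve; 0.6 used unsafe_copy! and had no GC.@preserve), and is_aligned and read_floats are hypothetical helpers, not Feather.jl API.

```julia
# Sketch for Julia 1.x. `is_aligned` and `read_floats` are hypothetical
# helpers for illustration; they are not part of Feather.jl.

# True when `addr` is a multiple of `align` bytes.
is_aligned(addr::UInt, align::Int) = addr % align == 0

# Address from the ArgumentError: low byte 0xfc == 252, and 252 % 8 == 4.
bad_addr = UInt(0x7fe8160d94fc)
offset = Int(bad_addr % 8)

# Copy `n` Float64 values from a possibly unaligned byte pointer into a
# newly allocated Vector{Float64}, whose data Julia allocates aligned.
function read_floats(src::Ptr{UInt8}, n::Int)
    out = Vector{Float64}(undef, n)
    GC.@preserve out begin
        unsafe_copyto!(Ptr{UInt8}(pointer(out)), src, n * sizeof(Float64))
    end
    return out
end

# Demo: store three Float64 values 4 bytes into a byte buffer, mimicking
# a column that does not start on an 8-byte boundary.
vals = [1.0, 2.0, 3.0]
buf = zeros(UInt8, 4 + sizeof(vals))
GC.@preserve buf vals begin
    unsafe_copyto!(pointer(buf) + 4, Ptr{UInt8}(pointer(vals)), sizeof(vals))
end
copied = GC.@preserve buf read_floats(pointer(buf) + 4, length(vals))
```

Copying trades one extra allocation per column for never handing a misaligned pointer to unsafe_wrap.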

galenlynch commented 7 years ago

Ah ok! Thanks for the link.

davidanthoff commented 7 years ago

This seems to be fixed in Julia 0.6 final.