JuliaData / Feather.jl

Read and write feather files in pure Julia
https://juliadata.github.io/Feather.jl/stable

Failed to read a feather file saved by Python `feather.write_dataframe` #140

Closed BoPeng closed 4 years ago

BoPeng commented 4 years ago

https://github.com/vatlab/sos-julia/issues/20

To reproduce the problem:

  1. Save a pandas DataFrame in Python as follows:
import pandas
import feather
df = pandas.DataFrame([[11, 22], [22, 33], [33, 44]])
feather.write_dataframe(df, 'test.feather')
feather.read_dataframe('test.feather')

As you can see, the file can be loaded correctly in Python.

  2. From Julia on macOS, reading the file works:
julia> Feather.read("test.feather")
3×2 DataFrames.DataFrame
│ Row │ 0     │ 1     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 11    │ 22    │
│ 2   │ 22    │ 33    │
│ 3   │ 33    │ 44    │

However, on CentOS 7 (Julia 1.4.1), the same steps make Feather.read fail with the following error message:

ArgumentError: Data is not in feather format: header = UInt8[0x41, 0x52, 0x52, 0x4f], footer = UInt8[0x52, 0x4f, 0x57, 0x31].

Stacktrace:
 [1] validatedata(::Array{UInt8,1}) at /home/bpeng1/.julia/packages/Feather/pbm3o/src/loaddata.jl:11
 [2] #loaddata#3 at /home/bpeng1/.julia/packages/Feather/pbm3o/src/loaddata.jl:17 [inlined]
 [3] loaddata at /home/bpeng1/.julia/packages/Feather/pbm3o/src/loaddata.jl:17 [inlined]
 [4] #loaddata#6 at /home/bpeng1/.julia/packages/Feather/pbm3o/src/loaddata.jl:23 [inlined]
 [5] Feather.Source(::String; use_mmap::Bool) at /home/bpeng1/.julia/packages/Feather/pbm3o/src/source.jl:17
 [6] read(::String; use_mmap::Bool) at /home/bpeng1/.julia/packages/Feather/pbm3o/src/source.jl:69
 [7] read(::String) at /home/bpeng1/.julia/packages/Feather/pbm3o/src/source.jl:69
 [8] top-level scope at In[10]:2

Edit: It seems that the files saved by pandas on the two systems are different:

test.txt

test_from_centos.feather.txt

On both systems, I am using a conda environment with pandas 1.0.3 and feather-format 0.4.1.

BoPeng commented 4 years ago

duplicate of #139

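The reason is that on CentOS the file was written as Feather v2, i.e. the Arrow IPC file format. The header bytes 0x41 0x52 0x52 0x4f in the error are ASCII "ARRO" and the footer bytes 0x52 0x4f 0x57 0x31 are "ROW1", which are the two ends of the "ARROW1" magic string rather than the "FEA1" magic of Feather v1.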
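
For anyone hitting the same error, here is a minimal sketch for checking which format a file is in before handing it to Feather.jl. It relies only on the magic strings at the start and end of the file ("FEA1" for Feather v1, "ARROW1" for the Arrow IPC file format); the function names `detect_format` and `detect_file_format` are made up for this example and are not part of any library.

```python
# Magic strings that delimit each file format.
FEATHER_V1_MAGIC = b"FEA1"
ARROW_MAGIC = b"ARROW1"


def detect_format(data: bytes) -> str:
    """Classify raw file contents by their leading and trailing magic bytes."""
    if data.startswith(FEATHER_V1_MAGIC) and data.endswith(FEATHER_V1_MAGIC):
        return "feather-v1"
    if data.startswith(ARROW_MAGIC) and data.endswith(ARROW_MAGIC):
        return "arrow-ipc (feather v2)"
    return "unknown"


def detect_file_format(path: str) -> str:
    """Hypothetical convenience wrapper: read a file and classify it."""
    with open(path, "rb") as f:
        return detect_format(f.read())


# Decode the bytes reported in the ArgumentError above.
header = bytes([0x41, 0x52, 0x52, 0x4F])
footer = bytes([0x52, 0x4F, 0x57, 0x31])
print(header, footer)
```

Calling `detect_file_format("test.feather")` on each machine should show whether that environment produced a v1 or a v2 file.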