JuliaData / Feather.jl

Read and write feather files in pure Julia
https://juliadata.github.io/Feather.jl/stable

Receive "ArgumentError: Data is not in feather format" when reading dataframe written from Python #139

Open def-mycroft opened 4 years ago

def-mycroft commented 4 years ago

Hello, apologies in advance if I'm missing something simple here.

I want to write a dataframe to feather using Python and then load it into Julia. When I attempt to do this, I receive the error "ArgumentError: Data is not in feather format".

So, to provide a reproducible example, when I write out a dataframe in Python like this:

import pandas as pd
import feather
df = pd.read_json('{"open":{"0":443.9,"1":443.9,"2":443.97,"3":443.5,"4":443.8},"high":{"0":443.9,"1":443.9,"2":443.97,"3":443.5,"4":443.98},"low":{"0":443.9,"1":443.9,"2":443.6,"3":443.5,"4":443.8},"close":{"0":443.9,"1":443.9,"2":443.6,"3":443.5,"4":443.98},"volume":{"0":436,"1":264,"2":1122,"3":202,"4":3202}}')
feather.write_dataframe(df, 'from-py.feather')

...and then try to load it into Julia:

using Feather
df = Feather.read("from-py.feather")

...I receive:

ERROR: ArgumentError: Data is not in feather format: header = UInt8[0x41, 0x52, 0x52, 0x4f], footer = UInt8[0x52, 0x4f, 0x57, 0x31].
Stacktrace:
 [1] validatedata(::Array{UInt8,1}) at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/loaddata.jl:11
 [2] #loaddata#6 at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/loaddata.jl:17 [inlined]
 [3] #loaddata at ./none:0 [inlined]
 [4] #Source#7(::Bool, ::Type, ::String) at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/source.jl:17
 [5] Type at ./none:0 [inlined]
 [6] #read#10(::Bool, ::Function, ::String) at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/source.jl:69
 [7] read(::String) at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/source.jl:69
 [8] top-level scope at none:0

Package versions etc:

ExpandingMan commented 4 years ago

This is because pyarrow now writes Feather V2 by default, which is just the Arrow IPC format written to disk (i.e. the metadata is completely different from that of Feather V1).
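
The header and footer bytes in the error message above decode to the ASCII strings "ARRO" and "ROW1", i.e. the start and end of the "ARROW1" magic string that marks an Arrow IPC file, whereas a Feather V1 file begins and ends with "FEA1". As a rough sketch of how to tell the two apart before handing a file to a reader, something like the following should work. The function name detect_feather_version and the constant names are hypothetical, chosen only for illustration; they are not part of Feather.jl, pyarrow, or the feather package.

```python
import os

# Magic strings: Feather V1 files start and end with b"FEA1";
# Arrow IPC files (Feather V2) start and end with b"ARROW1".
FEATHER_V1_MAGIC = b"FEA1"
ARROW_IPC_MAGIC = b"ARROW1"


def detect_feather_version(path):
    """Return 1 for Feather V1, 2 for Feather V2 (Arrow IPC), else None.

    Hypothetical helper: it inspects only the magic bytes and does not
    validate the rest of the file.
    """
    if os.path.getsize(path) < 8:
        return None  # too small to hold a header and a footer
    with open(path, "rb") as f:
        head = f.read(6)
        f.seek(-6, os.SEEK_END)
        tail = f.read(6)
    if head[:4] == FEATHER_V1_MAGIC and tail[-4:] == FEATHER_V1_MAGIC:
        return 1
    if head == ARROW_IPC_MAGIC and tail == ARROW_IPC_MAGIC:
        return 2
    return None
```

Calling detect_feather_version("from-py.feather") on the file from the original report would be expected to return 2, given the bytes shown in the error.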

I am now deep into a complete rewrite of the Arrow.jl package, which will support reading and writing Feather V2. This package will likely be moved into legacy mode and support only reading and writing Feather V1.
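
Until then, one possible workaround on the Python side is to ask pyarrow for the legacy format explicitly. The sketch below assumes a pyarrow version whose pyarrow.feather.write_feather accepts the version keyword (0.17 or later); the wrapper name write_feather_v1 is hypothetical. Feather V1 is more limited than V2 (no compression, fewer supported types), so this may not suit every dataframe.

```python
def write_feather_v1(df, path):
    """Write a pandas DataFrame as a Feather V1 file (hypothetical helper).

    Assumes pyarrow >= 0.17, where write_feather takes a `version`
    keyword. The import is deferred so that this module still loads
    on systems without pyarrow installed.
    """
    import pyarrow.feather as pf

    # version=1 selects the legacy Feather V1 layout instead of the
    # default V2 (Arrow IPC) layout.
    pf.write_feather(df, path, version=1)
```

With the dataframe from the original report, write_feather_v1(df, "from-py.feather") would replace the feather.write_dataframe call.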

I have added a note to the README regarding this. I will change this when Arrow.jl is complete. I'll also make a post on the Julia discourse. It'll probably be another few weeks before I have unit tests and all and am ready for a release, but keep an eye out if you're still interested. I won't support everything in the arrow standard right out of the gate (it's quite extensive by now), but certainly simple dataframes like you show here will be supported initially.

def-mycroft commented 4 years ago

Thanks for the note and the work, @ExpandingMan.

I'll leave this issue open for now so it is visible to others while the rewrite of Arrow.jl is in progress.

chrizMM commented 2 years ago

Any news on this issue?