JuliaData / Feather.jl

Read and write feather files in pure Julia
https://juliadata.github.io/Feather.jl/stable

Strange behavior on Windows #112

Open Rudi79 opened 5 years ago

Rudi79 commented 5 years ago

Hi, executing the following code on Windows works:

 using DataFrames, Feather
 dt = DataFrame(A = rand(10), B = rand(10))
 Feather.write("test.feather",dt)

The resulting file can be loaded in the Windows version of Julia, but when I try to load it in R, R crashes. The same code run under Linux produces a file that the same R reads just fine. I am using Windows 10 and Julia 1.1.0, and I have tried Feather 0.4.0 and 0.5.1. I am not sure this is the right place to report it, because Python reads the Feather file generated by the Windows version correctly.

ExpandingMan commented 5 years ago

I recall that there were some bizarre quirks in Windows, but I don't recall anything relevant to this, and to be honest, my own ability to test on Windows is basically non-existent, so I need to leave Windows issues to other maintainers.

The fact that R can read the file created on Linux but not the one created on Windows indicates that Feather.jl is somehow producing different files depending on the operating system, which it seems to me should not be the case. So this does seem to be a Feather.jl bug. However, the fact that Python can read the file just fine might suggest that the bug is actually producing valid files, but in a form that the R feather package does not support.

Rudi79 commented 5 years ago

Thanks for the quick reply. I understand that testing is a nightmare: different implementations across different operating systems. I thought that documenting the issue here might still be worthwhile.
And you are right that Linux and Windows are producing different files. At least they differ in file size.
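
For anyone who wants to narrow down where the two files diverge, here is a minimal sketch that reports both file sizes and the first byte offset at which they differ. The function name `compare_files` and the file names in the usage note are hypothetical, and the comparison is only meaningful if the same deterministic table (not `rand`) was written on both systems.

```julia
# Compare two files byte by byte (hypothetical helper, not part of Feather.jl).
function compare_files(path_a::AbstractString, path_b::AbstractString)
    bytes_a = read(path_a)   # whole file as Vector{UInt8}
    bytes_b = read(path_b)
    n = min(length(bytes_a), length(bytes_b))
    # 1-based index of the first differing byte within the common prefix,
    # or `nothing` if the common prefix is identical
    first_diff = findfirst(i -> bytes_a[i] != bytes_b[i], 1:n)
    return (size_a = length(bytes_a), size_b = length(bytes_b), first_diff = first_diff)
end
```

Usage would look like `compare_files("test_linux.feather", "test_windows.feather")`, where both file names are placeholders for copies of the file written on each operating system.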

magerton commented 5 years ago

+1 for this -- I've been having the same issue: reading a Linux-generated .feather file using R on Windows causes R to fail.

magerton commented 5 years ago

FWIW, here are a couple of workarounds:

# Workaround: save the data frame to an .rda file through R (RCall), then read it back with RData
using DataFrames
using RCall
using RData
using Test

df = DataFrame(
    a = collect(1:3),
    b = [ "a", "b", "c",]
)

FILEPATH = "jnk.rda"

@rput df FILEPATH
R"""
save(df, file=FILEPATH)
"""

df2 = load(FILEPATH)

@test df == df2["df"]

# Workaround: round-trip the data frame through a gzipped CSV file
using DataFrames
using CSV
using GZip
using Test

df = DataFrame(
    a = collect(1:3),
    b = [ "a", "b", "c",]
)

GZFILEPATH = "jnk.csv.gz"

fh = GZip.gzopen(GZFILEPATH, "w")
CSV.write(fh, df)
GZip.close(fh)

fh2 = GZip.gzopen(GZFILEPATH, "r")
df2 = CSV.File(fh2) |> DataFrame
GZip.close(fh2)

@testset "are results the same?" begin
    @test all(names(df) .== names(df2))
    @test df == df2
end

rm(GZFILEPATH)

Rudi79 commented 5 years ago

The problem seems related to https://github.com/JuliaData/FlatBuffers.jl/issues/38. At least, pinning FlatBuffers at 0.4.0 solves the problem.
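
A minimal sketch of how that version could be held in place with Julia's package manager, assuming the active environment is the one where Feather is installed. Only the package name and version come from this thread; whether the resolver accepts that version alongside a given Feather release is not verified here.

```julia
using Pkg

# Install FlatBuffers at the version reported to work above...
Pkg.add(Pkg.PackageSpec(name = "FlatBuffers", version = "0.4.0"))
# ...and pin it so later `Pkg.update()` calls do not move it.
Pkg.pin("FlatBuffers")
```

Running `Pkg.free("FlatBuffers")` later removes the pin once a fixed release is available.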