JuliaData / Feather.jl

Read and write feather files in pure Julia
https://juliadata.github.io/Feather.jl/stable

Explicitly setting offset types: how to support `Int64` offsets within the Feather format #81

Closed: ExpandingMan closed this issue 6 years ago

ExpandingMan commented 6 years ago

This is for the Arrow PR.

Ok, finally more or less figured this out.

I'm pretty sure this happened because I'm writing strings with the wrong offset type (Arrow currently defaults to `Int64`). I'm starting to question whether `Int64` is a reasonable default for the offsets. Anyway, it should be fairly straightforward to fix, but for the moment it's creating scary corrupted files. I'm not sure why the tests didn't catch this; we need to fix that.
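To make the failure concrete, here is a minimal sketch in plain Julia (not Feather.jl or Arrow.jl code) of the offsets buffer for a variable-length string column. `string_offsets` is a hypothetical helper: entry `i+1` minus entry `i` is the byte length of string `i`. If a writer emits `Int64` offsets but a reader interprets the same bytes as `Int32`, the offsets come out wrong, which is one plausible way to end up with corrupted files like the ones described above.

```julia
# Hypothetical helper, NOT part of Feather.jl or Arrow.jl.
# Builds an Arrow-style offsets buffer with element type T:
# offsets[1] == 0 and offsets[i+1] - offsets[i] == byte length of strs[i].
function string_offsets(::Type{T}, strs::AbstractVector{<:AbstractString}) where {T<:Integer}
    offsets = Vector{T}(undef, length(strs) + 1)
    offsets[1] = zero(T)
    for (i, s) in enumerate(strs)
        offsets[i + 1] = offsets[i] + T(ncodeunits(s))  # UTF-8 byte count
    end
    return offsets
end

strs = ["a", "bc", "def"]
off32 = string_offsets(Int32, strs)  # Int32[0, 1, 3, 6]
off64 = string_offsets(Int64, strs)  # [0, 1, 3, 6], but twice as many bytes

# A reader that assumes Int32 offsets but is handed the Int64 buffer.
# On a little-endian machine this is Int32[0, 0, 1, 0, 3, 0, 6, 0];
# reading the first length(strs) + 1 = 4 entries gives [0, 0, 1, 0],
# i.e. string lengths 0, 1 and -1, which is garbage.
misread = reinterpret(Int32, off64)
```

Passing the offset element type explicitly, as above, keeps the choice visible at the call site; this is only an illustration of the layout, not how either package is implemented.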

ExpandingMan commented 6 years ago

Ok, in Arrow I've now defined `DefaultOffset`, which makes it easy to change the default offset type. I've also reverted it to `Int32` for now, which should bring some sanity to Feather, since Feather doesn't seem to support anything else.

I'm not sure what the long-term solution is; we'll probably just have to define explicit `Data.streamto!` methods for columns containing strings (i.e. rather than just using `arrowformat`). The ugliness of that offends me for some reason, but really it's no big deal. I may decide to overhaul `Locate` into a format specifier that works for both reading and writing.

ExpandingMan commented 6 years ago

According to @wesm in #75, we really should only be using `Int32`. Therefore I am inclined to just leave that as the default and take no further action. I should probably make sure there are enough default constructors in Arrow.jl, even though the last thing that package needs is more constructors 😨.