Closed bkamins closed 3 years ago
And a general question - why does Feather.jl depend on DataFrames.jl? I think it could use the same strategy like other packages, e.g. CSV.jl and be sink table agnostic?
Thanks, but note that with the new Arrow.jl this package is basically deprecated. Yes, it's needed for reading the legacy feather format, but it's hard for me to imagine there being much will for maintaining this package.
As I recall the DataFrames dependencies dates back to before Tables.jl, or at least before it reached maturity. It wouldn't be hard to remove it.
Sure - just decide what you prefer here. In the worst case we can keep this PR open as a reminder for the future.
I'd be happy to merge this when tests pass, but frankly I'm no longer interested in following this package, so in the future I'm not sure I'll even notice the PR's. I think currently both @quinnj and I are the only maintainers. I'm sure we'd be happy to give you guys maintainer status if you have interest in keeping it going. I think only @quinnj can do that.
I do not use this package either. I am just making sure packages that depend on DataFrames.jl are properly notified about version changes.
Maybe it should just get deprecated and unmaintained officially?
That would be my preference. There may be some interest from someone in it getting maintained, because I suppose some people may have old feather files sitting around. However, even in the worst case scenario, people can use pyarrow to convert their old files to the newer format, so it's hard for me to imagine any reason for anyone to maintain this package.
@bkamins thanks for fixing the dependency. We will need the package for a bit while transitioning to a new file format.
CI fails because the package is making a wrong use of CategoricalArrays.jl (which @nalimilan already noted some time ago)
Actually the failure is in Arrow.jl (version 0.2). Should we copy the code from that version to the Feather.jl repo since Arrow.jl now tracks the newer version of the spec? Or should we create an Arrow0.jl package or something? Fixing CategoricalArrays should be easy.
Can't it just put an Arrow upper bounds as 0.2?
Well that would mean that anybody who want to read a Feather file in the old format cannot read files in the new format. And just installing one of the packages would block the other. Even if the format is deprecated I think we should support it for a long time as people can have stored data in this format or get it from elsewhere.
Again, pyarrow
supports it, in the worst case scenario. Supporting it "for a long time" would of course be dependent on someone willing to do that.
Yeah, maybe in the README we recommend that people transition from Feather.jl => Arrow.jl, which will allow them to upgrade to Arrow 1.0. I guess the bad scenario then is where someone may want Feather.jl and Arrow.jl 1.0 in the same project for some reason; possibly some kind of "ingestion engine" or something. Yeah, it'd probably be ideal if someone just copy-pasted the old Arrow code into this repo as a subdirectory and then remove the Arrow.jl dep.
I've included the changes in this PR in https://github.com/JuliaData/Feather.jl/pull/145
hopefully everything will go cleanly, as we have not changed anything that should break things.