JuliaData / Feather.jl

Read and write feather files in pure Julia
https://juliadata.github.io/Feather.jl/stable
Other
109 stars 27 forks source link

up DataFrames.jl dependency #143

Closed bkamins closed 3 years ago

bkamins commented 3 years ago

hopefully everything will go cleanly, as we have not changed anything that should break things.

bkamins commented 3 years ago

And a general question - why does Feather.jl depend on DataFrames.jl? I think it could use the same strategy like other packages, e.g. CSV.jl and be sink table agnostic?

ExpandingMan commented 3 years ago

Thanks, but note that with the new Arrow.jl this package is basically deprecated. Yes, it's needed for reading the legacy feather format, but it's hard for me to imagine there being much will for maintaining this package.

As I recall the DataFrames dependencies dates back to before Tables.jl, or at least before it reached maturity. It wouldn't be hard to remove it.

bkamins commented 3 years ago

Sure - just decide what you prefer here. In the worst case we can keep this PR open as a reminder for the future.

ExpandingMan commented 3 years ago

I'd be happy to merge this when tests pass, but frankly I'm no longer interested in following this package, so in the future I'm not sure I'll even notice the PR's. I think currently both @quinnj and I are the only maintainers. I'm sure we'd be happy to give you guys maintainer status if you have interest in keeping it going. I think only @quinnj can do that.

bkamins commented 3 years ago

I do not use this package either. I am just making sure packages that depend on DataFrames.jl are properly notified about version changes.

bkamins commented 3 years ago

Maybe it should just get deprecated and unmaintained officially?

ExpandingMan commented 3 years ago

That would be my preference. There may be some interest from someone in it getting maintained, because I suppose some people may have old feather files sitting around. However, even in the worst case scenario, people can use pyarrow to convert their old files to the newer format, so it's hard for me to imagine any reason for anyone to maintain this package.

jd-lara commented 3 years ago

@bkamins thanks for fixing the dependency. We will need the package for a bit while transitioning to a new file format.

bkamins commented 3 years ago

CI fails because the package is making a wrong use of CategoricalArrays.jl (which @nalimilan already noted some time ago)

nalimilan commented 3 years ago

Actually the failure is in Arrow.jl (version 0.2). Should we copy the code from that version to the Feather.jl repo since Arrow.jl now tracks the newer version of the spec? Or should we create an Arrow0.jl package or something? Fixing CategoricalArrays should be easy.

quinnj commented 3 years ago

Can't it just put an Arrow upper bounds as 0.2?

nalimilan commented 3 years ago

Well that would mean that anybody who want to read a Feather file in the old format cannot read files in the new format. And just installing one of the packages would block the other. Even if the format is deprecated I think we should support it for a long time as people can have stored data in this format or get it from elsewhere.

ExpandingMan commented 3 years ago

Again, pyarrow supports it, in the worst case scenario. Supporting it "for a long time" would of course be dependent on someone willing to do that.

quinnj commented 3 years ago

Yeah, maybe in the README we recommend that people transition from Feather.jl => Arrow.jl, which will allow them to upgrade to Arrow 1.0. I guess the bad scenario then is where someone may want Feather.jl and Arrow.jl 1.0 in the same project for some reason; possibly some kind of "ingestion engine" or something. Yeah, it'd probably be ideal if someone just copy-pasted the old Arrow code into this repo as a subdirectory and then remove the Arrow.jl dep.

quinnj commented 3 years ago

I've included the changes in this PR in https://github.com/JuliaData/Feather.jl/pull/145