JuliaData / DataFrames.jl

In-memory tabular data in Julia
https://dataframes.juliadata.org/stable/

Can we remove src/RDA.jl? #955

Closed dmbates closed 8 years ago

dmbates commented 8 years ago

This code, which reads a subset of the .rda (R data archive) format, was used to import saved R data.frame objects. The RCall package is much better at this. I suggest deprecating read_rda while doing the conversion for Julia v0.5. I believe this will also remove the dependency on the GZip package.

nalimilan commented 8 years ago

I was wondering about this too. +1

garborg commented 8 years ago

I'm for this, too.

The only downside I see is that in RDatasets.jl, DataFrames with legitimate categorical columns (not just because strings were often pooled in R) are stored as .rda files.

Maybe move the file into RDatasets.jl in the short term and move those files to feather or .jld longer term? c.c. @johnmyleswhite

nalimilan commented 8 years ago

At the very least, we could move read_rda to RDatasets while waiting for a larger refactoring.

johnmyleswhite commented 8 years ago

+1

nalimilan commented 8 years ago

See also https://github.com/JuliaStats/DataFrames.jl/issues/958, which moving to RCall.jl would likely fix.

ararslan commented 8 years ago

@alyst What's the state of your RData package? Is it such that we could remove the RDA stuff in here entirely and direct users to your package?

For the others, see https://github.com/alyst/RData.jl

alyst commented 8 years ago

@ararslan I'm using it regularly and it works for me. It's an extended version of DataFrames.read_rda() that supports all R data types (so it does not crash on an arbitrary RData file, although not every R type is converted to a Julia counterpart). So, yes, I think you can recommend it to users. I suspect the first users might run into issues, because RData has received limited testing so far, but I will try to fix bugs as they are discovered.

nalimilan commented 8 years ago

Great. So let's deprecate RDA.jl ASAP in favor of RData.jl. A good stress test for it is porting RDatasets.jl and checking that all data sets can be loaded correctly.

ararslan commented 8 years ago

First RData needs to be registered in METADATA. Should also set up Travis and coverage.

ararslan commented 8 years ago

@alyst If you want you could transfer ownership of RData.jl to the JuliaStats organization for more rapid development and maintenance, but that's totally up to you.

alyst commented 8 years ago

@ararslan If you think that making RData a part of JuliaStats would be beneficial for the community, I am all for it. Is there anything required from my side?

ararslan commented 8 years ago

It would make sense to me to have it under the JuliaStats umbrella but I'd be interested to hear what others think. On your end you'd have to go to the repo settings and transfer ownership.

ararslan commented 8 years ago

Bump. @alyst, are you still considering transferring ownership of RData.jl to JuliaStats?

alyst commented 8 years ago

@ararslan Yes I am, whenever JuliaStats "stakeholders" decide it's worth it.

ararslan commented 8 years ago

Sounds like myself (though I'm hardly what you could call a "stakeholder") and @nalimilan (based on his thumbs up) think it's worth it. What do the other folks think? @johnmyleswhite?

quinnj commented 8 years ago

+1

ararslan commented 8 years ago

@alyst By show of thumbs (and Jacob), it looks like people are thinking it's worth it. 😄

alyst commented 8 years ago

@ararslan Great! I've tried to transfer RData.jl to JuliaStats, but got an error saying that I need to be an admin of JuliaStats to make the transfer. I've never transferred repos before, so I don't know whether the JuliaStats admins have received a notification about the pending transfer and can complete it, or whether there is no other way than to make me an admin temporarily.

johnmyleswhite commented 8 years ago

@alyst: You have the necessary permissions now.

alyst commented 8 years ago

I've just transferred the repo. Have fun!

ararslan commented 8 years ago

Thanks @alyst!

ararslan commented 8 years ago

Can an owner set the RData permissions so that JuliaStats members have our usual permissions? When doing transfers on GitHub the permissions are copied over too, so regular members don't currently have commit access.

alyst commented 8 years ago

@ararslan I've just added the same permissions as DataFrames.jl

ararslan commented 8 years ago

For some reason I'm not able to enable Travis CI on it. I've been able to do it with other repos... not sure what the issue is.

alyst commented 8 years ago

@ararslan Travis CI needs to be enabled in the settings (currently it's not), but since it requires some configuration, it's better that the more experienced admins set it up.

alyst commented 8 years ago

The RData package is now registered.