JuliaData / DataTables.jl

(DEPRECATED) A rewrite of DataFrames.jl based on Nullable

Backporting to DataFrames #81

Closed: cjprybol closed this issue 6 years ago

cjprybol commented 7 years ago

Is there any plan for the best way to accomplish this or any recommended strategies to try first? Would it be preferable to get Nulls working here before the backport, or to get started now, move all the open PRs here (https://github.com/JuliaData/DataTables.jl/pull/66 included) to DataFrames, and finish them there?

ararslan commented 7 years ago

> Is there any plan for the best way to accomplish this or any recommended strategies to try first?

I think just cherry-picking commits, then editing things by hand. Kind of sucks, but I'm not sure what other option there is.
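For readers unfamiliar with the workflow, here is a rough sketch of what cherry-picking a commit from one repository into another looks like. It is only an illustration under stated assumptions: the two toy repositories (`source`, `target`), the file `shared.txt`, and the helper `g` are all hypothetical stand-ins, not the actual DataTables or DataFrames history.

```shell
#!/bin/sh
# Toy demonstration of porting one commit between repositories with
# git cherry-pick. Repository names, file names, and contents are
# hypothetical; nothing here touches the real packages.
set -eu

work=$(mktemp -d)
cd "$work"

# Wrapper so commits succeed even without a global git identity.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# "source" stands in for the repository that holds the commit to port.
g init -q source
cd source
echo 'base' > shared.txt
g add shared.txt
g commit -q -m 'Initial commit'
cd "$work"

# "target" stands in for the repository receiving the backport.
# It is cloned before the fix exists, so the two histories diverge.
g clone -q "$work/source" target

# Make the commit we want to port.
cd "$work/source"
echo 'fix' >> shared.txt
g commit -q -am 'Fix to port'
fix_sha=$(g rev-parse HEAD)

# Fetch the new commit into the target and cherry-pick it.
# -x records the original commit hash in the new commit message.
cd "$work/target"
g fetch -q origin
g cherry-pick -x "$fix_sha" >/dev/null

# If a cherry-pick stops on conflicts, this is where the hand editing
# happens: fix the files, `git add` them, then `git cherry-pick --continue`.
g log --oneline
```

In a real backport the commits would presumably come from a remote pointing at the other repository rather than a local clone, and renamed identifiers would likely produce conflicts that need the by-hand editing mentioned above.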

> Would it be preferable to get Nulls working here before the backport

Probably not, IMO: the DataArrays representation of missingness is much closer to that of Nulls, so the porting should probably be easier over there than here.

quinnj commented 7 years ago

I actually got almost all the way w/ my PR: https://github.com/JuliaData/DataTables.jl/pull/66 (dropping DataArrays/NullableArrays entirely). We could revive that and merge it, then backport to DataFrames.

nalimilan commented 7 years ago

I've spent quite some time rebasing the DataTables git history to remove any trace of the DataFrame->DataTable rename. It's available as a branch in DataFrames. So once we have merged #66 here, it should be easy to backport everything to DataFrames.

ararslan commented 7 years ago

@nalimilan You're a saint!

davidanthoff commented 7 years ago

Are there any plans for what will happen with DataTables? I've been working on a branch of DataTables that uses DataValue instead of Nullable, and once that is ready I'll want to make that work available in some form as a package. One option would be to just release it as DataValueTables. On the flip side, if the current plan is for DataTables to go away once things are merged back into DataFrames, we could repurpose DataTables to continue to exist as a table that is based on DataValue, i.e. it would continue to be a table implementation that uses a container-based approach for missing data, but it would be (hopefully) much more usable than the Nullable-based approach we have right now.

I'm not entirely sure where I'm going with that work. In the short term (i.e. the Julia 0.6 timeframe) I just want to have a table type that is fast and has good usability. Medium term, I mostly see it as a hedge: if the whole Union{T,Null} thing works out for Julia 1.0, it will probably just go away. But if not, we would still have something fast and easy to use for Julia 1.0.

nalimilan commented 7 years ago

@davidanthoff Let's discuss this in another issue. One problem for the future of DataTables is that CategoricalArrays are going to switch to Union{T, Null} too, so DataTables will be stuck with the current CategoricalArrays version.

cjprybol commented 6 years ago

https://github.com/JuliaData/DataFrames.jl/pull/1220