JuliaData / DataTables.jl

(DEPRECATED) A rewrite of DataFrames.jl based on Nullable
Other
29 stars 11 forks source link

Overwrites DataFrames describe function #33

Open davidanthoff opened 7 years ago

davidanthoff commented 7 years ago

I have a lot of situations where I need both DataFrames and DataTables loaded at the same time, e.g. I start out with:

using DataFrames, DataTables

Right now I always get a warning that DataTables overwrites describe from DataFrames, which is not ideal.

I guess the solution for this is to move the function definition in some common base package, and then both DataFrames and DataTables will add a method? Would that be AbstractTables? If so, could we maybe start with a really bare bones AbstractTables now, that only holds that one definition, and then later more stuff can be added?

ararslan commented 7 years ago

Ideally describe would be removed from one or both packages, as it's more of a statistical function than a tabular data function. Maybe that could live in StatsModels at some point?

kleinschmidt commented 7 years ago

It's from StatsBase, right? https://github.com/JuliaData/DataTables.jl/blob/20c71d6d40b3b238e902189b8262ba2b2e679b31/src/abstractdatatable/abstractdatatable.jl#L373

ararslan commented 7 years ago

Yes, but unless we want a dependency on AbstractTables in StatsBase (which I don't think we should do), we'd still have to define the generic describe method on tables elsewhere. That's why I suggested StatsModels.

kleinschmidt commented 7 years ago

I'm confused: why does using DataFrames and DataTables result in one's describe overwriting the other if they're both extending the method from StatsBase?

ararslan commented 7 years ago

Ohhhhhhhhhhhhhhhhhhhhhhhh heh, DataFrames and DataTables both @reexport StatsBase. I bet that's it.

davidanthoff commented 7 years ago

Both have this:

StatsBase.describe(nv::AbstractArray) = describe(STDOUT, nv)

That is the first of three overwriting messages I'm getting.

davidanthoff commented 7 years ago

And then there is:

function StatsBase.describe{T<:Number}(io, dv::AbstractArray{T})
function StatsBase.describe{T}(io, dv::AbstractArray{T})

in both. I guess those three methods should just move to StatsBase, right?

ararslan commented 7 years ago

Assuming they don't contain code specific to Nullables and/or NA, yes, those methods should live in StatsBase. Good catch!

davidanthoff commented 7 years ago

Well, they actually contain code that is Nullable and DataArray specific :) So I guess they really should dispatch on fewer types?

kleinschmidt commented 7 years ago

Maybe replace those abstract array methods with an non-exported method for single columns?

davidanthoff commented 7 years ago

I think StatsBase.describe(nv::AbstractArray) = describe(STDOUT, nv) should just move to StatsBase as is.

A version of function StatsBase.describe{T<:Number}(io, nv::AbstractArray{T}) that doesn't handle missing values should also move to StatsBase. In DataTables there should be function StatsBase.describe{T<:Number}(io, nv::NullableArray{T}), and in DataFrames function StatsBase.describe{T<:Number}(io, nv::DataArray{T}).

For function StatsBase.describe{T}(io, nv::AbstractArray{T}) similar story.