Open davidanthoff opened 7 years ago
Ideally describe would be removed from one or both packages, as it's more of a statistical function than a tabular data function. Maybe that could live in StatsModels at some point?
Yes, but unless we want a dependency on AbstractTables in StatsBase (which I don't think we should do), we'd still have to define the generic describe method on tables elsewhere. That's why I suggested StatsModels.
I'm confused: why does using DataFrames and DataTables result in one's describe overwriting the other if they're both extending the method from StatsBase?
Ohhhhhhhhhhhhhhhhhhhhhhhh heh, DataFrames and DataTables both @reexport StatsBase. I bet that's it.
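For anyone following along, here is a minimal sketch of how this kind of overwrite can arise. The module names StatsCore, PkgA, and PkgB are hypothetical stand-ins for StatsBase, DataFrames, and DataTables, and the method bodies are placeholders. Both packages extend the same generic function, so a method with an identical signature on a type neither package owns replaces the earlier one.

```julia
# Hypothetical stand-in for StatsBase: it owns the generic function.
module StatsCore
    export describe
    function describe end   # generic function with no methods yet
end

# Hypothetical stand-in for DataFrames.
module PkgA
    import ..StatsCore
    # Method on AbstractArray, a type this module does not own.
    StatsCore.describe(v::AbstractArray) = "PkgA summary"
end

# Hypothetical stand-in for DataTables.
module PkgB
    import ..StatsCore
    # Identical signature: this replaces the method defined in PkgA.
    # Julia may print a "Method definition ... overwritten" warning here.
    StatsCore.describe(v::AbstractArray) = "PkgB summary"
end

using .StatsCore

result = describe([1, 2, 3])    # "PkgB summary": the last definition wins
n_methods = length(methods(describe))   # 1: only one method survives
```

Extending the same function from two packages is fine in itself; the clash only appears when the signatures are identical.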
Both have this:
StatsBase.describe(nv::AbstractArray) = describe(STDOUT, nv)
That is the first of three overwriting messages I'm getting.
And then there is:
function StatsBase.describe{T<:Number}(io, dv::AbstractArray{T})
function StatsBase.describe{T}(io, dv::AbstractArray{T})
in both. I guess those three methods should just move to StatsBase, right?
Assuming they don't contain code specific to Nullables and/or NA, yes, those methods should live in StatsBase. Good catch!
Well, they actually contain code that is Nullable and DataArray specific :) So I guess they really should dispatch on fewer types?
Maybe replace those abstract array methods with a non-exported method for single columns?
I think StatsBase.describe(nv::AbstractArray) = describe(STDOUT, nv) should just move to StatsBase as is.
A version of function StatsBase.describe{T<:Number}(io, nv::AbstractArray{T}) that doesn't handle missing values should also move to StatsBase. In DataTables there should be function StatsBase.describe{T<:Number}(io, nv::NullableArray{T}), and in DataFrames function StatsBase.describe{T<:Number}(io, nv::DataArray{T}).
For function StatsBase.describe{T}(io, nv::AbstractArray{T}), similar story.
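A rough sketch of that split, under some loud assumptions: it uses current Julia syntax (where clauses and stdout rather than the describe{T} form and STDOUT quoted above), MaybeVector is a hypothetical container standing in for NullableArray or DataArray, and the printed fields are invented for illustration rather than taken from either package.

```julia
using Statistics  # standard library, provides mean

# Hypothetical container with missing-value support, playing the role
# that NullableArray or DataArray would play in the real packages.
struct MaybeVector{T} <: AbstractVector{Union{T,Missing}}
    data::Vector{Union{T,Missing}}
end
Base.size(v::MaybeVector) = size(v.data)
Base.getindex(v::MaybeVector, i::Int) = v.data[i]

# Generic methods: the part that could live in the base package.
describe(v::AbstractArray) = describe(stdout, v)
function describe(io::IO, v::AbstractArray{T}) where {T<:Number}
    println(io, "Length: ", length(v))
    println(io, "Mean: ", mean(v))
end

# Package-specific method: dispatches on the narrower container type
# and handles missing values itself.
function describe(io::IO, v::MaybeVector{T}) where {T<:Number}
    present = collect(skipmissing(v))
    println(io, "Length: ", length(v))
    println(io, "Missing: ", length(v) - length(present))
    println(io, "Mean: ", mean(present))
end

# sprint(f, x) calls f(io, x) and returns what was written as a String.
plain_summary = sprint(describe, [1, 2, 3])
maybe_summary = sprint(describe, MaybeVector{Int}([1, missing, 3]))
```

Because each package's method has a signature of its own, loading both packages would add methods rather than replace one.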
I have a lot of situations where I need both DataFrames and DataTables loaded at the same time, e.g. I start out with:
Right now I always get a warning that DataTables overwrites describe from DataFrames, which is not ideal.
I guess the solution for this is to move the function definition into some common base package, and then both DataFrames and DataTables will add a method? Would that be AbstractTables? If so, could we maybe start with a really bare-bones AbstractTables now that only holds that one definition, and then later more stuff can be added?
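To make the idea concrete, here is a sketch of what such a bare-bones base package could look like. All names are hypothetical: TableCore plays the role of the proposed AbstractTables, FramePkg and TablePkg stand in for DataFrames and DataTables, and the Frame and Table types and their summaries are invented for illustration.

```julia
# Hypothetical bare-bones base package: it owns the function name only.
module TableCore
    export describe
    function describe end   # no methods; packages add their own
end

# Hypothetical stand-in for DataFrames.
module FramePkg
    import ..TableCore: describe
    export Frame
    struct Frame
        columns::Dict{Symbol,Vector{Float64}}
    end
    # Method on a type this package owns, so nothing is overwritten.
    describe(f::Frame) = "Frame with $(length(f.columns)) column(s)"
end

# Hypothetical stand-in for DataTables.
module TablePkg
    import ..TableCore: describe
    export Table
    struct Table
        columns::Dict{Symbol,Vector{Float64}}
    end
    describe(t::Table) = "Table with $(length(t.columns)) column(s)"
end

using .TableCore, .FramePkg, .TablePkg

frame_summary = describe(Frame(Dict(:a => [1.0, 2.0])))   # "Frame with 1 column(s)"
table_summary = describe(Table(Dict(:a => [1.0], :b => [2.0])))   # "Table with 2 column(s)"
```

Both packages can then be loaded together without a warning, since each one only adds a method for its own type.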