JuliaStats / StatsModels.jl

Specifying, fitting, and evaluating statistical models in Julia

Why is the ContrastsMatrix matrix field a Matrix{Float64}? #251

Open PharmCat opened 2 years ago

PharmCat commented 2 years ago

Why is the matrix field of the ContrastsMatrix struct a Matrix{Float64}? In many cases, for DummyCoding() or FullDummyCoding(), it could be a BitMatrix or a SparseMatrixCSC{Bool, Int64}. For big datasets I am trying to do something like this:

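using LinearAlgebra, SparseArrays, StatsModels  # I, sparse, AbstractContrasts

# Dummy contrasts backed by a sparse Bool matrix
mutable struct OwnDummyCoding <: AbstractContrasts
end

# Sparse n-by-n identity with the base level's column dropped
function StatsModels.contrasts_matrix(C::OwnDummyCoding, baseind, n)
    sparse(I, n, n)[:, [1:(baseind-1); (baseind+1):n]]
end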

But I run out of memory because ContrastsMatrix tries to convert this to a Matrix{Float64}.
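For a sense of scale, the following minimal sketch compares the sparse Bool matrix from the snippet above with its dense Float64 copy. The values of `n` and `baseind` are assumptions chosen only so the dense copy fits comfortably in memory; nothing here calls StatsModels itself.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical sizes, picked only for illustration
n, baseind = 1_000, 1
keep = [1:(baseind-1); (baseind+1):n]   # every column except the base level

S = sparse(I, n, n)[:, keep]            # SparseMatrixCSC{Bool, Int} with n-1 stored entries
D = Matrix{Float64}(S)                  # dense copy, similar to what the conversion produces

dense_bytes = sizeof(D)                 # 8 bytes per element * n * (n-1)
sparse_bytes = Base.summarysize(S)      # storage for pointers, indices and values

println((dense = dense_bytes, sparse = sparse_bytes))
```

The dense size grows as 8 * n * (n - 1) bytes, so at 10^5 levels (the figure mentioned later in this thread) the dense matrix alone would need roughly 80 GB, while the sparse form grows only linearly in n.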

PharmCat commented 2 years ago

Would it be possible to define it like this:

struct ContrastsMatrix{C <: AbstractContrasts, T, U, M}
    matrix::M
    termnames::Vector{U}
    levels::Vector{T}
    contrasts::C
    invindex::Dict{T,Int}
    function ContrastsMatrix(matrix::M,
                             termnames::Vector{U},
                             levels::Vector{T},
                             contrasts::C) where {M <: AbstractMatrix, U, T, C <: AbstractContrasts}
        allunique(levels) || throw(ArgumentError("levels must be all unique, got $(levels)"))
        invindex = Dict{T,Int}(x=>i for (i,x) in enumerate(levels))
        new{C,T,U,M}(matrix, termnames, levels, contrasts, invindex)
    end
end

palday commented 2 years ago

@PharmCat how many contrast levels do you have? If this is for the grouping variable in MixedModels.jl, then there is the Grouping() pseudocontrast, which avoids creating an actual matrix.

PharmCat commented 2 years ago

> @PharmCat how many contrast levels do you have? If this is for the grouping variable in MixedModels.jl, then there is the Grouping() pseudocontrast which avoids creating an actual matrix

@palday

Hello! It can be more than 10^5. Actually, I'm working on Metida.jl, which helps me with some tasks where MixedModels.jl can't be used. I know that this problem is solved in MixedModels, and Metida has a "workaround" too. I have seen Grouping in MixedModels.jl; maybe the Grouping code should be moved to StatsModels.jl and documented there (maybe along with some other code from MixedModels, such as the use of "/" in terms). Also, I don't know why the matrix field of ContrastsMatrix is fixed as Matrix{Float64} and why it can't be more flexible.

Also, I can't find any roadmap for StatsModels. I think StatsModels is a core package of the JuliaStats ecosystem, but there is no information about its development plan toward version 1.0.

palday commented 2 years ago

The nesting syntax / is implemented in RegressionFormulae.jl.

palday commented 2 years ago

The implementation of Grouping() is quite simple: https://github.com/JuliaStats/MixedModels.jl/blob/621f88b1f594ea0827d9ac7e8628113dd2121bef/src/grouping.jl#L2-L34

Depending on the exact structure of your model, you might be able to skip using the full formula infrastructure and instead call a custom modelcols method directly -- this is how random effects and associated sparse matrices are constructed in MixedModels.
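As a rough illustration of bypassing the formula infrastructure, the sketch below builds a sparse indicator matrix directly from a vector of group labels, using only the SparseArrays standard library. The helper name `sparse_indicators` and the example labels are hypothetical; this is not the MixedModels or StatsModels implementation, only one way such a matrix could be assembled by hand.

```julia
using SparseArrays

# Hypothetical helper (not part of StatsModels or MixedModels):
# one row per observation, one column per distinct level, stored as sparse Bool.
function sparse_indicators(x::AbstractVector)
    levs = sort!(unique(x))                              # distinct levels, sorted
    index = Dict(l => j for (j, l) in enumerate(levs))   # level => column number
    rows = collect(1:length(x))                          # observation i goes in row i
    cols = [index[v] for v in x]                         # column of each observation's level
    X = sparse(rows, cols, fill(true, length(x)), length(x), length(levs))
    return X, levs
end

# Made-up labels for illustration
X, levs = sparse_indicators(["a", "c", "b", "a"])
```

Each row holds exactly one stored entry, so memory grows with the number of observations rather than with observations times levels. A custom modelcols method could return a matrix built this way, though how that plugs into a particular model is left open here.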

PharmCat commented 2 years ago

> The implementation of Grouping() is quite simple:

Yep, but this means that I would have to copy this code or include MixedModels as a dependency. Maybe this functionality could be placed in StatsModels?

palday commented 2 years ago

There's nothing wrong with copying this code, but maybe @kleinschmidt has thoughts on whether it makes more general sense to include this in StatsModels?