JuliaStats / GLMNet.jl

Julia wrapper for fitting Lasso/ElasticNet GLM models using glmnet
Other
96 stars 35 forks source link

Fix null deviance and use `Ref` instead of `Ptr` #57

Closed devmotion closed 3 years ago

devmotion commented 3 years ago

Fixes the computation of the null deviation for the Gaussian family and test it with the output of the R package. Fixes https://github.com/JuliaStats/GLMNet.jl/issues/39.

The implementation is basically copied from the implementation in the R package in the file R/elnet.R:

### compute the null deviance
  ybar=if(intr)weighted.mean(y,weights)else 0
  nulldev=sum(weights* (y-ybar)^2)

It has to be computed before the ccall since the call modifies the weights in-place and additionally y if standardize = true. Maybe it would be good to always copy the weights and copy y if standardize = true - at least for me this behaviour was quite surprising and confusing.

Additionally, I changed the types in the ccalls according to the Julia documentation:

In Julia code wrapping calls to external Fortran routines, all input arguments should be declared as of type Ref{T}, as Fortran passes all variables by pointers to memory locations. The return type should either be Cvoid for Fortran subroutines, or a T for Fortran functions returning the type T.

The vectors of fixed length 1 in the ccall are replaced with Ref.

JackDunnNZ commented 3 years ago

Thanks for this, it looks good to me.

Maybe it would be good to always copy the weights and copy y if standardize = true - at least for me this behaviour was quite surprising and confusing.

Yes agreed this is probably not the expected behavior. Can we check if the R package takes a copy or mutates in place? There is some argument for mirroring whatever they do as closely as possible and hopefully their behavior is sensible.

devmotion commented 3 years ago

Can we check if the R package takes a copy or mutates in place?

I just ran

> fit <- glmnet(x=X, y=y, weights=weights, family="gaussian")

and X, y, and weights still contain the same values afterwards.

JackDunnNZ commented 3 years ago

X, y, and weights still contain the same values afterwards.

Great, thanks for checking that. It sounds like we should be making copies then. Feel free to either work that in here or we can merge this now and make a followup PR whenever suits, whichever you prefer

devmotion commented 3 years ago

I think it would be better to address this in a separate PR, mainly because it will also require more changes and additional tests which are not related to the original purpose of this PR.

JackDunnNZ commented 3 years ago

Sounds good, thanks!