Closed: rdiaz02 closed this issue 10 months ago
I just use option 1. You could try to make something that looks at the 'call' object returned by glmnet and parses out the formula, but that does not seem easy to do and would likely cause other problems.
Thanks a lot for the advice! Closing this.
(Not a bug, but a question)
I am using a set of models that include, among others, ranger, xgboost, earth, and glmnet, and there are subject-matter reasons to include interactions. ranger/random forest account for interactions; xgboost implicitly accounts for interactions with trees of depth >= 2; earth does too if we use degree >= 2.
For glmnet I've simply copied the original `SL.glmnet` to, say, `SL.glmnet3`, and changed the line that builds the model matrix so that it generates the 3-way (and 2-way) interactions.
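For concreteness, a sketch of what such a copy might look like. This is modeled on the structure of `SL.glmnet` in the SuperLearner package (the `cv.glmnet` arguments are abbreviated here; check the installed source for the exact body):

```r
## Hypothetical SL.glmnet3: a copy of SL.glmnet with the model-matrix
## formula changed to expand 2- and 3-way interactions.
SL.glmnet3 <- function(Y, X, newX, family, obsWeights, id,
                       alpha = 1, nfolds = 10, useMin = TRUE, ...) {
  if (!is.matrix(X)) {
    # The only substantive change from SL.glmnet: .^3 instead of .
    X <- model.matrix(~ -1 + .^3, X)
    newX <- model.matrix(~ -1 + .^3, newX)
  }
  fitCV <- glmnet::cv.glmnet(x = X, y = Y, weights = obsWeights,
                             family = family$family, alpha = alpha,
                             nfolds = nfolds, ...)
  pred <- predict(fitCV, newx = newX, type = "response",
                  s = ifelse(useMin, "lambda.min", "lambda.1se"))
  fit <- list(object = fitCV, useMin = useMin)
  # A plain copy keeps class "SL.glmnet", which is exactly why
  # predict.SL.glmnet is the one that gets dispatched later.
  class(fit) <- "SL.glmnet"
  list(pred = pred, fit = fit)
}
```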
This works fine with CV.SuperLearner. However, it does not work properly when I train on one data set and then predict on a different one, because of the behavior of `predict.SL.glmnet` (easily seen in the code itself and explained in https://github.com/ecpolley/SuperLearner/pull/65). The result is that columns like, say, X1:X2 do not contain X1:X2 but 0. Passing to the training an X that already contains all the interactions is not the way to go (I do not want to do that to ranger, xgboost, or earth, for example). I can quickly think of two ways of trying to work around this issue:
1. Copy `SL.glmnet` to `SL.glmnet2`, `SL.glmnet3`, ..., and have them return objects of class `SL.glmnet2`, `SL.glmnet3`, .... Then create a bunch of `predict.SL.glmnet2`, `predict.SL.glmnet3`, ..., and in these functions, where the model matrix is built from the new data, generate the 2-way interactions (or 3, or whatever, as appropriate).
2. Modify `predict.SL.glmnet` itself and expand as appropriate.

Option 1 is an ugly kludge, but it is easy to do. Option 2 seems much more elegant, but I think there are multiple places where I can make mistakes, some of which I might not even see or anticipate.
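To illustrate option 1, a sketch of the matching predict method. It is modeled on the current `predict.SL.glmnet` (check the installed source for the exact body); the method name and the `.^3` degree are the hypothetical choices from above:

```r
## Hypothetical predict.SL.glmnet3: dispatched only if SL.glmnet3
## sets class(fit) <- "SL.glmnet3" instead of "SL.glmnet".
predict.SL.glmnet3 <- function(object, newdata, ...) {
  if (!is.matrix(newdata)) {
    # Expand newdata with the same ~ -1 + .^3 formula used at training
    # time, so the X1:X2, X1:X2:X3, ... columns actually exist.
    newdata <- model.matrix(~ -1 + .^3, newdata)
  }
  predict(object$object, newx = newdata, type = "response",
          s = ifelse(object$useMin, "lambda.min", "lambda.1se"))
}
```

Note that this only works if the corresponding `SL.glmnet3` assigns class `"SL.glmnet3"` to its returned fit, so that S3 dispatch picks this method rather than `predict.SL.glmnet`.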
Has anyone dealt with this before? Any comments? Thanks in advance.