Specifying intercept models is not documented & a PITA

boost-R / gamboostLSS

Boosting models for fitting generalized additive models for location, shape and scale (GAMLSS) to potentially high dimensional data. The current release version can be found on CRAN (https://cran.r-project.org/package=gamboostLSS).


Specifying intercept models is not documented & a PITA #13

Closed fabian-s closed 8 years ago

fabian-s commented 8 years ago

I think having to do

data$ones <- rep(1, n)
gamboostLSS(list( .... ~ ..., bla = response ~ bols(ones, intercept = FALSE)),
  data=data, families=SomethingCraycray())

instead of

 bla = response ~ 1

is really bad in terms of usability. At the very least, it should be documented somewhere that this is the way we want users to specify an intercept model / formula.
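For concreteness, here is a minimal sketch of that workaround on a real data set. The choice of `cars`, `GaussianLSS()` and the linear base learner for `mu` are assumptions made only for illustration; the point is the constant column plus `bols(ones, intercept = FALSE)` for the parameter that should not depend on covariates.

```r
library(gamboostLSS)

data(cars)
# Constant column that stands in for the intercept (the name "ones" is arbitrary).
cars$ones <- rep(1, nrow(cars))

mod <- gamboostLSS(
  list(
    mu    = dist ~ bols(speed),                    # mu depends on speed
    sigma = dist ~ bols(ones, intercept = FALSE)   # sigma is intercept-only
  ),
  data = cars,
  families = GaussianLSS()
)

coef(mod)
```

With this specification only `mu` can pick up an effect of `speed`; `sigma` is updated through the constant base learner alone.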

ja-thomas commented 8 years ago

This is already a problem in mboost, not just in gamboostLSS:

library(gamboostLSS)

data(cars)
gamboost(dist ~ 1, data = cars, dfbase = 4)

gamboostLSS(dist ~ 1, data = cars)

We should put it in the description (ideally in both gamboost and gamboostLSS) and check whether there is an easy fix for it in mboost; the actual fix has to happen there.

fabian-s commented 8 years ago

not sure I agree.

That issue never comes up for (non-pathological specifications of) mboost models, because why would you ever want to boost an intercept model? However, intercept models do make sense for GAMLSS-type models, because you may want to restrict the flexibility of the additive predictors for higher-order moments / nuisance parameters (to cut computation times, remain interpretable, etc.).

hofnerb commented 8 years ago

Well, the point probably is that it usually doesn't make sense in mboost but it should be implemented there anyway as all the interfaces for model fitting are provided from mboost. Perhaps one should try to interpret 1 generally as intercept. Thus instead of

cars$int <- 1
gamboost(dist ~ bols(int, intercept = FALSE) + bols(..., intercept = FALSE), data = cars)

one could then write

gamboost(dist ~ bols(1, intercept = FALSE) + bols(..., intercept = FALSE), data = cars)
## or even better
gamboost(dist ~ 1 + bols(..., intercept = FALSE), data = cars)

1 should then always be defined as bols(rep(1, nrow(data)), intercept = FALSE).

fabian-s commented 8 years ago

Do we agree that there is no realistic use case for a pure intercept base learner in mboost, but that there is one for additive predictors in gamboostLSS?

If so, I think it's a user interface / formula parsing issue for gamboostLSS (i.e., a ~ 1 formula should just add the missing column ones to the data and treat ~ 1 as ~ bols(ones, intercept = FALSE)), not a missing feature in mboost.
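A rough sketch of what such a rewriting step could look like, using base R only. `add_intercept_terms()` is a hypothetical helper written for this comment, not a function in gamboostLSS or mboost, and it only handles formulas whose right-hand side is exactly `1`.

```r
# Hypothetical helper (not part of gamboostLSS or mboost):
# adds a constant column to the data and replaces every intercept-only
# formula `y ~ 1` by `y ~ bols(<name>, intercept = FALSE)`.
add_intercept_terms <- function(formulas, data, name = "ones") {
  # The right-hand side is the last element of a formula object.
  is_intercept_only <- function(f) identical(f[[length(f)]], 1)

  rewrite <- function(f) {
    if (is_intercept_only(f)) {
      f[[length(f)]] <- bquote(bols(.(as.name(name)), intercept = FALSE))
    }
    f
  }

  data[[name]] <- rep(1, nrow(data))
  list(formulas = lapply(formulas, rewrite), data = data)
}

data(cars)
res <- add_intercept_terms(
  list(mu = dist ~ bols(speed), sigma = dist ~ 1),
  cars
)
res$formulas
```

The rewritten list and the augmented data could then be passed on as `gamboostLSS(res$formulas, data = res$data, families = ...)`; formulas that already contain base learners are left untouched.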

If not, when/why would I ever want to specify a naked intercept in mboost and, if we make it easy to do so, how would we preempt user error & misunderstandings about the fact that every base learner updates its own intercept by default anyways?

fabian-s commented 8 years ago

Just to be clear:

I don't think we need to / should enable formulas ~ 1 + bols(bla) + bbbs(blub).

I do think having a shorthand for "this parameter is not affected by any covariates" via nuisance_param = response ~1 would be useful, as the default of recycling the first formula for all parameters of the distribution means that models get insanely complicated very quickly and specifying simplifications is a huge pain ATM (and not documented anywhere!)

fabian-s commented 8 years ago

@hofnerb @ja-thomas just re-read your comments, you're right of course, I was being a Gscheithaferl :smirk: Closing this and migrating it to mboost.