ailinweili / FDboost

Boosting Functional Regression Models. The current release version can be found on CRAN (http://cran.r-project.org/package=FDboost).

check whether Multinomial works for bfpco/FDboost #2

Open fabian-s opened 7 years ago

fabian-s commented 7 years ago

... so we can use the Phoneme dataset as a benchmark

fabian-s commented 7 years ago

(how) does it work? can you post some code below?

ailinweili commented 7 years ago

Hi, I did not realize I had closed this issue. I think it was by accident, sorry.

ailinweili commented 7 years ago

Hi, I wrote an example of how to use the bfpco base learner in an FDboost model with a multinomial response. I could not find much material on how to use the "%O%" term correctly with the Multinomial family, so I tried bols() on a dummy response variable, and it worked. The prediction accuracy is not bad.

# example of multinomial regression using the fuelSubset dataset
library(FDboost)
data("fuelSubset")

## modelling data
myfuel <- fuelSubset
myfuel$heatanclass <- cut(fuelSubset$heatan, breaks = c(5,15,25,35), labels = c("a","b","c"))
### define a dummy vector with one factor level less than the outcome,
### which is used as reference category.
myfuel$heatandummy <- factor(levels(myfuel$heatanclass)[-nlevels(myfuel$heatanclass)])

## fit a multinomial FDboost model
mlm1 <-  FDboost(heatanclass ~ 
                   bfpco(UVVIS, s = uvvis.lambda, df = 4) %O% 
                      bols(heatandummy, df = 4, contrasts.arg = "contr.dummy") +
                   bfpco(NIR, s = nir.lambda, df = 4) %O%
                      bols(heatandummy, df = 4, contrasts.arg = "contr.dummy"),
                 timeformula = ~bols(1), data = myfuel, family = Multinomial(), 
                 control = boost_control(mstop = 200))

## model performance
### contingency table
tab1 <- table(data = myfuel$heatanclass, fitted = predict(mlm1, type = "class"))
print(tab1)
### compute prediction accuracy
print(sum(diag(tab1))/sum(tab1))

## prediction on newdata
### prepare new data
set.seed(201)
index <- sample(seq_along(myfuel$heatan), size = 50)
newdata <- list()
newdata$NIR <- myfuel$NIR[index, ]
newdata$UVVIS <- myfuel$UVVIS[index, ]
newdata$nir.lambda <- myfuel$nir.lambda
newdata$uvvis.lambda <- myfuel$uvvis.lambda
newdata$heatandummy <- myfuel$heatandummy

### prediction performance on the new data
tab2 <- table(myfuel$heatanclass[index], predict(mlm1, newdata = newdata, type = "class"))
print(tab2)
print(sum(diag(tab2))/sum(tab2))

fabian-s commented 7 years ago

Hi,

After updating to the latest version from GitHub, I get:

mlm1 <-  FDboost(heatanclass ~ 
     bfpco(UVVIS, s = uvvis.lambda, df = 4) %O% bols(heatandummy, df = 4, contrasts.arg = "contr.dummy") + 
     bfpco(NIR, s = nir.lambda, df = 4) %O% bols(heatandummy, df = 4, contrasts.arg = "contr.dummy") , 
   timeformula = ~bols(1), data = myfuel, family = Multinomial(), 
   control = boost_control(mstop = 200))

# Error in dist(Y.tilde, method = distType, ...) : invalid distance method

traceback()
#13: stop("invalid distance method")
#12: dist(Y.tilde, method = distType, ...) at baselearners.R#2168
#11: (function (Y = NULL, Y.pred = NULL, center = FALSE, random.int = FALSE, 
#       nbasis = 10, argvals = NULL, distType = NULL, npc = NULL, 
#        npc.max = NULL, pve = 0.99, ...) 
#   {
#        if (is.null(Y.pred)) 
#     ...
#10: do.call(fpco.sc, decomppars) at baselearners.R#1884
#9: X_fpco(mf, vary, args = hyper_fpco(mf, vary, df = df, lambda = lambda, 
#      pve = pve, npc = npc, npc.max = npc.max, s = s, distType = distType, 
#       ...)) at baselearners.R#2419
#8: bfpco(UVVIS, s = uvvis.lambda, df = 4)
#7: bfpco(UVVIS, s = uvvis.lambda, df = 4) %O% bols(heatandummy, 
#      df = 4, contrasts.arg = "contr.dummy")
#6: inherits(a, "blg")
#5: bfpco(UVVIS, s = uvvis.lambda, df = 4) %O% bols(heatandummy, 
#      df = 4, contrasts.arg = "contr.dummy") + bfpco(NIR, s = nir.lambda, 
#       df = 4) %O% bols(heatandummy, df = 4, contrasts.arg = "contr.dummy")
#4: eval(expr, envir, enclos)
#3: eval(as.expression(formula[[3]]), envir = c(as.list(data), list(`+` = get("+"))), 
#       enclos = environment(formula))
#2: mboost(fm, data = data, weights = w, offset = offset, ...) at FDboost.R#1115
#1: FDboost(heatanclass ~ bfpco(UVVIS, s = uvvis.lambda, df = 4) %O% 
#       bols(heatandummy, df = 4, contrasts.arg = "contr.dummy") + 
#       bfpco(NIR, s = nir.lambda, df = 4) %O% bols(heatandummy, 
#           df = 4, contrasts.arg = "contr.dummy"), timeformula = ~bols(1), 
#       data = myfuel, family = Multinomial(), control = boost_control(mstop = 200))

If I do

mlm1 <-  FDboost(heatanclass ~ 
    bfpco(UVVIS, s = uvvis.lambda, df = 4, distType = "DTW") %O% bols(heatandummy, df = 4, contrasts.arg = "contr.dummy") + 
    bfpco(NIR, s = nir.lambda, df = 4, distType = "DTW") %O% bols(heatandummy, df = 4, contrasts.arg = "contr.dummy") , 
  timeformula = ~bols(1), data = myfuel, family = Multinomial(), 
  control = boost_control(mstop = 200))

instead, it works. It seems that bfpco doesn't pass on the default value of distType correctly if it's not given explicitly?

ailinweili commented 7 years ago

Hi Fabian, this is mainly because library(dtw) was not called at the very beginning.
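
One plausible reading of the error, sketched with base R only: if the unqualified dist() call resolves to stats::dist() (for example because neither proxy nor dtw is attached), a method name such as "DTW" is not among its built-in methods and the call stops with the message seen in the traceback. The toy matrix Y below is made up for illustration, and whether this is exactly what happened inside fpco.sc is an assumption.

```r
# Small toy matrix: 4 "curves" observed at 5 points (made-up data).
set.seed(1)
Y <- matrix(rnorm(20), nrow = 4)

# stats::dist() only knows its built-in methods, so "DTW" is rejected.
msg <- tryCatch(
  {
    stats::dist(Y, method = "DTW")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# A built-in method name works as usual and returns a "dist" object.
ok <- stats::dist(Y, method = "euclidean")
print(class(ok))
```

The dtw package registers "DTW" with the proxy package when it is loaded, which would explain why calling library(dtw) first makes the problem disappear.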

fabian-s commented 7 years ago

I see... Still, that's a bug -- make sure that you import all the functions from other packages that you use in your code. The same goes for slanczos from mgcv (and probably more); e.g., devtools::check() gives me:

* checking R code for possible problems ... NOTE
X_fpco: no visible global function definition for ‘dist’
cmdscale_lanczos_new: no visible global function definition for
  ‘slanczos’
fpco.sc: no visible global function definition for ‘gamm4’
fpco.sc: no visible global function definition for ‘dist’
Undefined global functions or variables:
  dist gamm4 slanczos
Consider adding
  importFrom("stats", "dist")

... but I think you'd probably want importFrom("proxy", "dist"). Also, please do this with roxygen2; do not edit NAMESPACE manually.
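
A minimal sketch of what such roxygen2 tags could look like in the package source. The helper name curve_dist and its body are hypothetical placeholders, not code from FDboost; only the @importFrom tags are the point. Running devtools::document() afterwards should regenerate NAMESPACE with the matching importFrom() directives.

```r
# Hypothetical helper for illustration only; not part of FDboost.
# The @importFrom tags tell roxygen2 which functions to import into
# the package namespace, so NAMESPACE never has to be edited by hand.

#' Pairwise distances between curves (hypothetical helper)
#'
#' @param Y numeric matrix with one curve per row
#' @param distType name of a distance registered with the proxy package
#' @param ... further arguments passed on to the distance function
#' @importFrom proxy dist
#' @importFrom mgcv slanczos
#' @importFrom gamm4 gamm4
#' @keywords internal
curve_dist <- function(Y, distType, ...) {
  # Inside the package namespace, dist() now resolves to proxy::dist().
  dist(Y, method = distType, ...)
}
```

The packages named in the tags (proxy, mgcv, gamm4) would also need to be listed under Imports in DESCRIPTION.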

ailinweili commented 7 years ago

Yeah, I have updated the documentation.