ecpolley / SuperLearner

Current version of the SuperLearner R package
272 stars 72 forks

add an external algorithm #119

Closed caprone closed 5 years ago

caprone commented 5 years ago

Hi, is it possible to add an external algorithm, like catboost, to SuperLearner?

ecpolley commented 5 years ago

You can always use your own algorithms when you run SuperLearner. Or are you asking about adding algorithms to the SuperLearner package itself? For that there are two options. The first is the SuperLearnerExtra GitHub repository (https://github.com/ecpolley/SuperLearnerExtra), where you can add scripts for additional algorithms; this is the easiest place to share them. The second is a pull request with the new algorithm plus the corresponding changes to the help documents and the NAMESPACE and DESCRIPTION files.
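Here is a minimal sketch of what "your own algorithm" looks like. It assumes the usual SL.* wrapper convention:

- The function takes Y, X, newX and family, plus ... for any extra arguments SuperLearner passes.
- It returns a list with pred (the predictions for newX) and fit (an object whose class has a predict method).

SL.myglm and predict.SL.myglm are hypothetical names. They are built on base R's glm() so the example runs without extra packages.

```r
# Hypothetical custom wrapper following the SL.* calling convention:
# arguments Y, X, newX, family, plus ... for anything else SuperLearner passes.
SL.myglm <- function(Y, X, newX, family, ...) {
  dat <- data.frame(Y = Y, X)
  # fit a GLM of Y on all columns of X with the requested family object
  fit_glm <- stats::glm(Y ~ ., data = dat, family = family)
  # predictions on the response scale (probabilities for binomial)
  pred <- stats::predict(fit_glm, newdata = newX, type = "response")
  fit <- list(object = fit_glm)
  class(fit) <- "SL.myglm"
  list(pred = pred, fit = fit)
}

# predict method, used when predicting from a fitted ensemble on new data
predict.SL.myglm <- function(object, newdata, ...) {
  stats::predict(object$object, newdata = newdata, type = "response")
}

# Call the wrapper directly on toy data to check that it behaves as expected
set.seed(1)
X <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
Y <- 2 * X$x1 - X$x2 + rnorm(100)
out <- SL.myglm(Y = Y, X = X, newX = X[1:5, ], family = gaussian())
print(length(out$pred))
```

Once such a function is defined in your session, you would list it by name next to the built-in learners, e.g. SuperLearner(Y = Y, X = X, family = gaussian(), SL.library = c("SL.mean", "SL.myglm")).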

caprone commented 5 years ago

Hi @ecpolley! OK, perfect. Yes, I want to use the catboost algorithm in SuperLearner, and I'll follow the implementation examples shown in SuperLearnerExtra ;) Great!

Thanks again!

caprone commented 5 years ago

Hi @ecpolley,

Sorry, but how can I add my custom learner to SL.library? When I declare it in the current environment, the SuperLearner function drops it from the library, so it doesn't work.

Thanks!

ecpolley commented 5 years ago

Hi @caprone, do you have an example? Are you using the SuperLearner function or snowSuperLearner? If you create your own algorithm, make sure its name doesn't conflict with the name of an included learner. With the SuperLearner function you should be able to simply add the name of the algorithm to the SL.library list.
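With snowSuperLearner, the cross-validation fits run in separate worker processes. A wrapper that exists only in your interactive session is not automatically defined there.

One common remedy is to copy the wrapper to the workers with parallel::clusterExport() first. This is an assumption based on how the parallel package works, not something confirmed in this thread.

The base-R sketch below uses a hypothetical SL.toy wrapper and calls it by name on a worker. This is roughly how a learner named in SL.library is looked up.

```r
library(parallel)

# Hypothetical wrapper defined only in the master session
SL.toy <- function(Y, X, newX, family, ...) {
  # predict the training mean for every new row
  list(pred = rep(mean(Y), nrow(newX)), fit = list(object = mean(Y)))
}

cl <- makeCluster(1)  # one PSOCK worker: a fresh R process

# Before exporting, the worker does not know SL.toy
before <- clusterEvalQ(cl, exists("SL.toy"))[[1]]

# Copy the wrapper into the workers' global environments
clusterExport(cl, "SL.toy", envir = environment())
after <- clusterEvalQ(cl, exists("SL.toy"))[[1]]

# Now the worker can call it by name, as a learner listed in SL.library would be
res <- clusterEvalQ(cl, {
  X <- data.frame(x = 1:4)
  do.call("SL.toy", list(Y = c(1, 2, 3, 6), X = X, newX = X, family = gaussian()))$pred
})[[1]]

stopCluster(cl)
print(c(before = before, after = after))
print(res)
```

With a real cluster object cl, the same clusterExport(cl, "SL.catboost") call would presumably come before snowSuperLearner(cluster = cl, ...).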

caprone commented 5 years ago

Hi @ecpolley, yes, I'm using snowSuperLearner, but a similar issue also occurs with SuperLearner. This is my (probably wrong) implementation of a very basic catboost learner to try in SL.library:

SL.catboost <- function (Y, X, newX, family)
{
  require("catboost")

  params <- list(iterations = 50,
                 learning_rate = 0.1)  # thread_count = 10
  train_pool <- catboost.load_pool(data = X, label = Y)

  if (family$family == "gaussian") {
    fit <- catboost.train(train_pool, params = params)
  }
  if (family$family == "binomial") {
    fit <- catboost.train(train_pool, params = params)
  }

  test_pool <- catboost.load_pool(data = newX, label = NULL)
  preds <- catboost.predict(fit, test_pool)
  fit <- list(object = fit)
  class(fit) <- c("SL.catboost")
  out <- list(pred = preds, fit = fit)
  return(out)
}

predict.SL.catboost <- function (object, newdata, family, ...)
{
  require("catboost")
  # catboost models are scored with catboost.predict() on a catboost.Pool
  test_pool <- catboost.load_pool(data = newdata, label = NULL)
  if (family$family == "gaussian") {
    pred <- catboost.predict(object$object, pool = test_pool, verbose = FALSE)
  }
  if (family$family == "binomial") {
    pred <- catboost.predict(object$object, pool = test_pool, verbose = FALSE)
  }
  pred
}

ecpolley commented 5 years ago

Try adding ... to the arguments for the wrapper:

SL.catboost <- function (Y, X, newX, family, ...)
{
  require("catboost")

  train_pool <- catboost.load_pool(data = X, label = Y)
  params <- list(iterations = 50,
                 learning_rate = 0.1)

  if (family$family == "gaussian") {
    fit <- catboost.train(train_pool, params = params)
  }

  if (family$family == "binomial") {
    fit <- catboost.train(train_pool, params = params)
  }

  test_pool <- catboost.load_pool(data = newX, label = NULL)
  preds <- catboost.predict(fit, pool = test_pool)
  fit <- list(object = fit)
  class(fit) <- c("SL.catboost")
  out <- list(pred = preds, fit = fit)
  return(out)
}

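Why ... matters: SuperLearner calls each wrapper with extra named arguments beyond Y, X, newX and family, such as id and obsWeights. A wrapper without ... rejects them with an "unused arguments" error. SuperLearner then drops the algorithm, which matches the symptom described earlier in this thread.

The base-R sketch below reproduces this with two hypothetical toy wrappers. It assumes that id and obsWeights mirror what SuperLearner passes.

```r
# Toy wrapper WITHOUT ...: only accepts the four named arguments
SL.noDots <- function(Y, X, newX, family) {
  list(pred = rep(mean(Y), nrow(newX)), fit = NULL)
}

# Same toy wrapper WITH ...: silently absorbs any extra named arguments
SL.withDots <- function(Y, X, newX, family, ...) {
  list(pred = rep(mean(Y), nrow(newX)), fit = NULL)
}

X <- data.frame(x = 1:3)
args <- list(Y = c(1, 2, 3), X = X, newX = X, family = gaussian(),
             id = 1:3, obsWeights = rep(1, 3))

# Call each wrapper by name with the extra arguments
res_no <- try(do.call("SL.noDots", args), silent = TRUE)
res_yes <- do.call("SL.withDots", args)

print(inherits(res_no, "try-error"))  # TRUE: "unused arguments" error
print(res_yes$pred)                   # 2 2 2
```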
caprone commented 5 years ago

Oh, sorry for my silly mistake! Perfect, thanks!!

Now it works great in parallel too!!

Thanks again for your smart package ;)

P.S. I edited the code to fix some logic errors...

ecpolley commented 5 years ago

Great, glad it works!

pat512-star commented 1 year ago

Thanks for this code. However, we are struggling to get SuperLearner to call catboost using this wrapper. We keep getting this error message from R:

Error in catboost.train(train_pool, params) : Expected catboost.Pool, got: list
In addition: Warning message:
In FUN(X[[i]], ...) : Error in algorithm SL.catboost The Algorithm will be removed from the Super Learner (i.e. given weight 0)

Do you have any advice about this? It would be very gratefully received!
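This error points at argument matching rather than at the data. In catboost.train(train_pool, params), the list is passed positionally. Assuming catboost.train's signature is catboost.train(learn_pool, test_pool = NULL, params = list()), the second positional slot is test_pool, which must be a catboost.Pool.

The sketch below uses a hypothetical stand-in, fake_train, with the same argument order, so it runs without catboost installed.

```r
# Hypothetical stand-in with the assumed argument order of catboost.train()
fake_train <- function(learn_pool, test_pool = NULL, params = list()) {
  # mimic the type check that produces "Expected catboost.Pool, got: list"
  if (!is.null(test_pool) && !inherits(test_pool, "catboost.Pool")) {
    stop("Expected catboost.Pool, got: ", class(test_pool)[1])
  }
  params  # return the params that were actually received
}

pool <- structure(list(), class = "catboost.Pool")  # dummy pool object
params <- list(iterations = 50, learning_rate = 0.1)

# Positional: params lands in test_pool and triggers the error
bad <- try(fake_train(pool, params), silent = TRUE)

# Named: params reaches the params argument as intended
good <- fake_train(pool, params = params)

print(conditionMessage(attr(bad, "condition")))  # "Expected catboost.Pool, got: list"
print(good$iterations)                           # 50
```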

ecpolley commented 1 year ago

Here is an updated version (you might want to adjust some of the parameters, but this is something you can start with):

SL.catboost <- function (Y, X, newX, family, ...)
{
  require("catboost")

  train_pool <- catboost.load_pool(data = X, label = Y)

  if (family$family == "gaussian") {
    params <- list(iterations = 50,
                   learning_rate = 0.1)
  }

  if (family$family == "binomial") {
    # Logloss makes catboost fit a binary classifier instead of RMSE regression
    params <- list(iterations = 50,
                   learning_rate = 0.1,
                   loss_function = "Logloss")
  }

  # pass params by name: the second positional argument of catboost.train() is test_pool
  fit <- catboost.train(train_pool, params = params)

  test_pool <- catboost.load_pool(data = newX, label = NULL)
  # for binomial outcomes, return predicted probabilities rather than raw scores
  pred_type <- if (family$family == "binomial") "Probability" else "RawFormulaVal"
  preds <- catboost.predict(fit, pool = test_pool, prediction_type = pred_type)
  fit <- list(object = fit)
  class(fit) <- c("SL.catboost")
  out <- list(pred = preds, fit = fit)
  return(out)
}

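SuperLearner only warns about a failing algorithm and gives it weight 0. It can therefore help to call a wrapper directly before putting it in SL.library.

The sketch below is a hypothetical helper, check_sl_wrapper. It calls a wrapper with the kind of extra arguments SuperLearner supplies (id and obsWeights, assumed here) and checks the shape of the result. It is demonstrated on a toy SL.meanToy wrapper so it runs without catboost.

```r
# Hypothetical helper: call a SuperLearner-style wrapper directly and check its output
check_sl_wrapper <- function(wrapper, Y, X, newX, family) {
  out <- wrapper(Y = Y, X = X, newX = newX, family = family,
                 id = seq_along(Y), obsWeights = rep(1, length(Y)))
  stopifnot(is.list(out), all(c("pred", "fit") %in% names(out)))
  # one prediction per row of newX
  stopifnot(length(out$pred) == nrow(newX))
  # for binomial outcomes, predictions should be probabilities
  if (family$family == "binomial") {
    stopifnot(all(out$pred >= 0 & out$pred <= 1))
  }
  invisible(out)
}

# Toy wrapper that predicts the training mean, used only to exercise the helper
SL.meanToy <- function(Y, X, newX, family, ...) {
  fit <- list(object = mean(Y))
  class(fit) <- "SL.meanToy"
  list(pred = rep(mean(Y), nrow(newX)), fit = fit)
}

set.seed(2)
X <- data.frame(x = rnorm(20))
Y <- rbinom(20, 1, 0.4)
res <- check_sl_wrapper(SL.meanToy, Y, X, newX = X[1:5, , drop = FALSE],
                        family = binomial())
print(res$pred)
```

With catboost installed, check_sl_wrapper(SL.catboost, Y, X, newX = X, family = binomial()) would exercise the updated wrapper the same way.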
pat512-star commented 1 year ago

Thanks so much Eric!

That's fantastic!

I will try it out today

Thanks so much for your help