csh01470 / catsnip

Treesnip-based Catboost Wrapper
GNU General Public License v3.0

Parameter rsm is only supported on CPU #4

Closed: icejean closed this issue 1 year ago

icejean commented 1 year ago

rsm should be set to NULL when task_type = 'GPU'; see the referenced thread: Error: rsm on GPU is supported for pairwise modes only #983. On my machine with a GPU:

> cat_fit_bo<- cat_wflow_bo %>% fit(higgs_train)
Custom logger is already specified. Specify more than one logger at same time is not thread safe.Error in catboost::catboost.train(learn_pool = d, params = list(iterations = 986,  : 
  C:/Program Files (x86)/Go Agent/pipelines/BuildMaster/catboost.git/catboost/private/libs/options/catboost_options.cpp:626: Error: rsm on GPU is supported for pairwise modes only
In addition: Warning message:
The following arguments cannot be manually modified and were removed: rsm, subsample. 
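
A minimal sketch of the guard I have in mind (a hypothetical helper, not the actual catsnip code), assuming the engine arguments are collected in a named list before catboost::catboost.train() is called:

# Hypothetical sketch: drop `rsm` from the parameter list when training on GPU,
# since CatBoost only supports rsm on GPU for pairwise modes.
drop_rsm_on_gpu <- function(params) {
  if (!is.null(params$task_type) && toupper(params$task_type) == "GPU") {
    params$rsm <- NULL
  }
  params
}
# e.g. params <- drop_rsm_on_gpu(params) right before catboost::catboost.train()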

Data set and source code:

library(tidymodels)
library(kableExtra)
library(tidyr)
# remotes::install_github("Glemhel/treesnip", INSTALL_opts = c("--no-multiarch"))
# library(treesnip)
# devtools::install_github(repo="csh01470/catsnip")
library(catsnip)
library(data.table)
# https://github.com/tidymodels/tune/issues/157 
# https://curso-r.github.io/treesnip/articles/parallel-processing.html
library(doParallel)
# cl <- makePSOCKcluster(parallel::detectCores()) 
cl <- makePSOCKcluster(12)   # CPU fit()
#cl <- makePSOCKcluster(2)   # GPU: nthread is not effective for fit(), but it is for tune_bayes().
registerDoParallel(cl)
foreach::getDoParWorkers()
# load packages for every worker.
clusterEvalQ(cl,
             {library(tidymodels)
               library(treesnip)
               library(catboost)
             })

tidymodels_prefer()

# ----------------------------------------------------------------------------------------
t1<-proc.time()
higgs<- fread("D:/temp/data/HIGGS/HIGGS.csv", header=FALSE, encoding="UTF-8")
higgs$V1<-as.factor(higgs$V1)
t2<-proc.time()
cat(t2-t1)
# 17.41 16.25 34.02 NA NA
names(higgs)

t1<-proc.time()
set.seed(2023)
higgs_split <- initial_split(higgs, prop = 0.90)
higgs_train <- training(higgs_split)
higgs_test  <-  testing(higgs_split)
t2<-proc.time()
cat(t2-t1)
# 6.63 0.55 7.2 NA NA

# Compare performance between CPU & GPU ----------------------------------------------------
# https://curso-r.github.io/treesnip/articles/parallel-processing.html
higgs_rec<-
  recipe(V1 ~ ., data = higgs_train) %>%
  step_normalize(all_numeric_predictors())

cat_spec <-
  boost_tree(mtry=tune(), tree_depth = tune(), trees = tune(), learn_rate = tune(), min_n = tune()) %>%
  set_engine('catboost', subsample = tune("subsample"), rsm=NULL, task_type = 'GPU', nthread = 2) %>%  # GPU
  # set_engine('catboost', subsample = tune("subsample"), task_type = 'CPU', nthread = 12) %>%  # CPU
  set_mode('classification')

cat_wflow <- 
  workflow() %>% 
  add_model(cat_spec) %>% 
  add_recipe(higgs_rec)

# best parameter
cat_param_best<-
  tibble(
    mtry = 6,
    trees = 986,
    min_n = 24,
    tree_depth = 14,
    learn_rate = 0.117 ,
    subsample =  0.558
  )

cat_wflow_bo <-
  cat_wflow %>%
  finalize_workflow(cat_param_best)

t1<-proc.time()
cat_fit_bo<- cat_wflow_bo %>% fit(higgs_train)
t2<-proc.time()

cat(t2-t1)
#GPU 650.52 183.77 511.34 NA NA
#GPU 657.82 188.7 509.61 NA NA
#CPU 65252.86 2728.51 6305.28 NA NA

t1<-proc.time()
higgs_test_bo <- predict(cat_fit_bo, new_data = higgs_test %>% select(-V1), type = "prob")
t2<-proc.time()
cat(t2-t1)
#GPU 36.42 0.07 2.69 NA NA
#CPU 40.11 0.9 3.35 NA NA

higgs_test_bo <- bind_cols(higgs_test_bo, higgs_test %>% select(V1))
roc_auc(
  higgs_test_bo,
  truth = V1,
  estimate=.pred_0,
  options = list(smooth = TRUE)
)
#GPU 85.4
#CPU 85.4

There's also a problem with the tuning parameter for subsampling: in treesnip it should be sample_size, but here it shows up as sample_prop, so I tune the engine parameter subsample instead. That works, although it produces a warning that should probably be fixed (a sketch of how I register the engine parameter for tuning follows the show_model_info() output below):

The following arguments cannot be manually modified and were removed: rsm, subsample.
> show_model_info("boost_tree")
Information for `boost_tree`
 modes: unknown, classification, regression, censored regression 

 engines: 
   classification: C5.0¹, catboost, spark, xgboost¹
   regression:     catboost, spark, xgboost¹

¹The model can use case weights.

 arguments: 
   xgboost:  
      tree_depth     --> max_depth
      trees          --> nrounds
      learn_rate     --> eta
      mtry           --> colsample_bynode
      min_n          --> min_child_weight
      loss_reduction --> gamma
      sample_size    --> subsample
      stop_iter      --> early_stop
   C5.0:     
      trees          --> trials
      min_n          --> minCases
      sample_size    --> sample
   spark:    
      tree_depth     --> max_depth
      trees          --> max_iter
      learn_rate     --> step_size
      mtry           --> feature_subset_strategy
      min_n          --> min_instances_per_node
      loss_reduction --> min_info_gain
      sample_size    --> subsampling_rate
   catboost: 
      tree_depth     --> depth
      trees          --> iterations
      learn_rate     --> learning_rate
      mtry           --> rsm
      min_n          --> min_data_in_leaf
      sample_prop    --> subsample

 fit modules:
     engine           mode
    xgboost     regression
    xgboost classification
       C5.0 classification
      spark     regression
      spark classification
   catboost     regression
   catboost classification

 prediction modules:
             mode   engine          methods
   classification     C5.0 class, prob, raw
   classification catboost class, prob, raw
   classification    spark      class, prob
   classification  xgboost class, prob, raw
       regression catboost     numeric, raw
       regression    spark          numeric
       regression  xgboost     numeric, raw
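
The workaround I use for tuning the engine parameter subsample is sketched below (using dials::sample_prop() as the 0-1 range is my assumption, not part of catsnip):

# Hedged sketch: give the engine argument `subsample` an explicit dials range
# so that tune_grid()/tune_bayes() know how to sample it.
cat_params <- cat_wflow %>%
  extract_parameter_set_dials() %>%
  update(subsample = dials::sample_prop(c(0.1, 1)))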
csh01470 commented 1 year ago

Hi.

To implement the behavior you described, I now set rsm = NULL when task_type = "GPU". But I can't check the GPU usage of catsnip on my Mac M1. Can you check whether it works properly on Windows?

icejean commented 1 year ago

Hi, I think there's something wrong with the new code: I can't see any workload on the GPUs, it just runs on the CPU: Catboost-higgs-GPU-test

Also, treesnip chooses the Nvidia GPU automatically, not the integrated Intel GPU: Catboost-higgs-GPU-test2

csh01470 commented 1 year ago

Hi. 😊 I'm sorry for the late reply. I just modified the train_catboost.R code. Can you check whether it works properly?

icejean commented 1 year ago

Hi, well done! It seems to be OK this time, although it also seems slow (I don't know why yet). Thanks for your effort! CatBoost-GPU-1

csh01470 commented 1 year ago

It's good to see that the GPU is actually working, but it is a little slow, so I'll look into that.

Also, in the catsnip package I've implemented mtry so that it actually behaves like mtry_prop. What problems did you encounter along the way?

icejean commented 1 year ago

What is the meaning of the parameter mtry_prop? I don't understand it. mtry maps to rsm in CatBoost, but that parameter only works on CPU.

BTW: the tuning parameter should be named sample_size, not sample_prop, matching the name of the same parameter for XGBoost and LightGBM.

library(tidymodels)
library(kableExtra)
library(tidyr)
# devtools::install_github(repo="csh01470/catsnip", INSTALL_opts = c("--no-multiarch"))
library(catsnip)
library(bonsai)
tidymodels_prefer()
show_model_info("boost_tree")
> show_model_info("boost_tree")
Information for `boost_tree`
 modes: unknown, classification, regression, censored regression 

 engines: 
   classification: C5.0¹, catboost, lightgbm, spark, xgboost¹
   regression:     catboost, lightgbm, spark, xgboost¹

¹The model can use case weights.

 arguments: 
   xgboost:  
      tree_depth     --> max_depth
      trees          --> nrounds
      learn_rate     --> eta
      mtry           --> colsample_bynode
      min_n          --> min_child_weight
      loss_reduction --> gamma
      sample_size    --> subsample
      stop_iter      --> early_stop
   C5.0:     
      trees          --> trials
      min_n          --> minCases
      sample_size    --> sample
   spark:    
      tree_depth     --> max_depth
      trees          --> max_iter
      learn_rate     --> step_size
      mtry           --> feature_subset_strategy
      min_n          --> min_instances_per_node
      loss_reduction --> min_info_gain
      sample_size    --> subsampling_rate
   catboost: 
      tree_depth     --> depth
      trees          --> iterations
      learn_rate     --> learning_rate
      mtry           --> rsm
      min_n          --> min_data_in_leaf
      sample_prop    --> subsample
   lightgbm: 
      tree_depth     --> max_depth
      trees          --> num_iterations
      learn_rate     --> learning_rate
      mtry           --> feature_fraction_bynode
      min_n          --> min_data_in_leaf
      loss_reduction --> min_gain_to_split
      sample_size    --> bagging_fraction
      stop_iter      --> early_stopping_round

 fit modules:
     engine           mode
    xgboost     regression
    xgboost classification
       C5.0 classification
      spark     regression
      spark classification
   catboost     regression
   catboost classification
   lightgbm     regression
   lightgbm classification

 prediction modules:
             mode   engine          methods
   classification     C5.0 class, prob, raw
   classification catboost class, prob, raw
   classification lightgbm class, prob, raw
   classification    spark      class, prob
   classification  xgboost class, prob, raw
       regression catboost     numeric, raw
       regression lightgbm          numeric
       regression    spark          numeric
       regression  xgboost     numeric, raw
icejean commented 1 year ago

GBDT algorithms such as XGBoost, LightGBM, and CatBoost are all much slower on GPU than on a multi-core CPU for small data sets, but when the data set is big enough, such as HIGGS, and the number of iterations is large enough, the GPU runs much faster than the CPU. The performance line charts below show the difference between GPU and CPU on the HIGGS data set; x is the number of iterations and y is seconds. CatBoost performance, CPU vs GPU on HIGGS: Catboost-perf. XGBoost performance, CPU vs GPU on HIGGS: XGBoost-higgs-GPU-2. LightGBM performance, CPU vs GPU on HIGGS: lightbgm-perf

csh01470 commented 1 year ago

In the catsnip package, the mtry parameter receives a number of columns (an integer value), but when it is passed on to the rsm parameter it is converted to the ratio of that number to the total number of columns.

e.g. if the total number of columns in a data set is 10 and mtry is 3, then rsm is passed as 0.3.
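
A minimal sketch of that conversion (illustrative only; the actual catsnip code may differ):

# Illustrative only: convert a column count (mtry) into the proportion that
# CatBoost's rsm parameter expects.
mtry_to_rsm <- function(mtry, n_cols) {
  mtry / n_cols
}
mtry_to_rsm(3, 10)
# 0.3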

csh01470 commented 1 year ago

And I've mapped the sample_size argument of boost_tree() to the subsample argument in catboost 😊
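
For example, with that mapping the sampling fraction can be passed through the main sample_size argument instead of an engine argument (a minimal sketch using the values tuned above):

# Minimal sketch: sample_size is now translated to CatBoost's subsample.
cat_spec_new <-
  boost_tree(trees = 986, sample_size = 0.558) %>%
  set_engine("catboost", task_type = "CPU", nthread = 12) %>%
  set_mode("classification")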

icejean commented 1 year ago

I may be able to test it on Friday; I'll report the results then.

icejean commented 1 year ago

The new version is OK, well done!

library(tidymodels)
library(kableExtra)
library(tidyr)
# remotes::install_github("Glemhel/treesnip", INSTALL_opts = c("--no-multiarch"))
# library(treesnip)
# devtools::install_github(repo="csh01470/catsnip", INSTALL_opts = c("--no-multiarch"))
library(catsnip)
library(data.table)
# https://github.com/tidymodels/tune/issues/157 
# https://curso-r.github.io/treesnip/articles/parallel-processing.html
library(doParallel)
# cl <- makePSOCKcluster(parallel::detectCores()) 
cl <- makePSOCKcluster(12)   # CPU fit()
#cl <- makePSOCKcluster(2)   # GPU: nthread is not effective for fit(), but it is for tune_bayes().
registerDoParallel(cl)
foreach::getDoParWorkers()
# load packages for every worker.
clusterEvalQ(cl,
             {library(tidymodels)
               library(treesnip)
               library(catboost)
             })

tidymodels_prefer()
show_model_info("boost_tree")

# ----------------------------------------------------------------------------------------
t1<-proc.time()
higgs<- fread("D:/temp/data/HIGGS/HIGGS.csv", header=FALSE, encoding="UTF-8")
higgs$V1<-as.factor(higgs$V1)
t2<-proc.time()
cat(t2-t1)
# 17.41 16.25 34.02 NA NA
names(higgs)

t1<-proc.time()
set.seed(2023)
higgs_split <- initial_split(higgs, prop = 0.90)
higgs_train <- training(higgs_split)
higgs_test  <-  testing(higgs_split)
t2<-proc.time()
cat(t2-t1)
# 6.63 0.55 7.2 NA NA

# Compare performance between CPU & GPU ----------------------------------------------------
# https://curso-r.github.io/treesnip/articles/parallel-processing.html
higgs_rec<-
  recipe(V1 ~ ., data = higgs_train) %>%
  step_normalize(all_numeric_predictors())

cat_spec <-
  boost_tree(mtry=tune(), tree_depth = tune(), trees = tune(), 
             learn_rate = tune(), min_n = tune(), sample_size=tune()) %>%
  set_engine('catboost', rsm=NULL, task_type = 'GPU', nthread = 2) %>%  # GPU
  # set_engine('catboost', task_type = 'CPU', nthread = 12) %>%  # CPU
  set_mode('classification')

cat_wflow <- 
  workflow() %>% 
  add_model(cat_spec) %>% 
  add_recipe(higgs_rec)

# best parameter
cat_param_best<-
  tibble(
    mtry = 6,
    trees = 986,
    min_n = 24,
    tree_depth = 14,
    learn_rate = 0.117 ,
    sample_size =  0.558
  )

cat_wflow_bo <-
  cat_wflow %>%
  finalize_workflow(cat_param_best)

t1<-proc.time()
cat_fit_bo<- cat_wflow_bo %>% fit(higgs_train)
t2<-proc.time()

cat(t2-t1)
#GPU 650.52 183.77 511.34 NA NA
#GPU 657.82 188.7 509.61 NA NA
#CPU 65252.86 2728.51 6305.28 NA NA

t1<-proc.time()
higgs_test_bo <- predict(cat_fit_bo, new_data = higgs_test %>% select(-V1), type = "prob")
t2<-proc.time()
cat(t2-t1)
#GPU 36.42 0.07 2.69 NA NA
#CPU 40.11 0.9 3.35 NA NA

higgs_test_bo <- bind_cols(higgs_test_bo, higgs_test %>% select(V1))
roc_auc(
  higgs_test_bo,
  truth = V1,
  estimate=.pred_0,
  options = list(smooth = TRUE)
)
#GPU 85.4
#CPU 85.4
> show_model_info("boost_tree")
Information for `boost_tree`
 modes: unknown, classification, regression, censored regression 

 engines: 
   classification: C5.0¹, catboost, spark, xgboost¹
   regression:     catboost, spark, xgboost¹

¹The model can use case weights.

 arguments: 
   xgboost:  
      tree_depth     --> max_depth
      trees          --> nrounds
      learn_rate     --> eta
      mtry           --> colsample_bynode
      min_n          --> min_child_weight
      loss_reduction --> gamma
      sample_size    --> subsample
      stop_iter      --> early_stop
   C5.0:     
      trees          --> trials
      min_n          --> minCases
      sample_size    --> sample
   spark:    
      tree_depth     --> max_depth
      trees          --> max_iter
      learn_rate     --> step_size
      mtry           --> feature_subset_strategy
      min_n          --> min_instances_per_node
      loss_reduction --> min_info_gain
      sample_size    --> subsampling_rate
   catboost: 
      tree_depth     --> depth
      trees          --> iterations
      learn_rate     --> learning_rate
      mtry           --> rsm
      min_n          --> min_data_in_leaf
      sample_size    --> subsample
      stop_iter      --> early_stopping_rounds

 fit modules:
     engine           mode
    xgboost     regression
    xgboost classification
       C5.0 classification
      spark     regression
      spark classification
   catboost     regression
   catboost classification

 prediction modules:
             mode   engine          methods
   classification     C5.0 class, prob, raw
   classification catboost class, prob, raw
   classification    spark      class, prob
   classification  xgboost class, prob, raw
       regression catboost     numeric, raw
       regression    spark          numeric
       regression  xgboost     numeric, raw

BTW: I hear that many people explain a model's SHAP values with the DALEX & DALEXtra packages, and there's also a book named Explanatory Model Analysis: Explore, Explain, and Examine Predictive Models, with examples in R and Python.
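
A hedged sketch of that approach, assuming the DALEX/DALEXtra packages are installed and that explain_tidymodels() and predict_parts() are used on the fitted tidymodels workflow:

# Hedged sketch: build an explainer for the fitted workflow, then compute
# SHAP-style contributions for a single test row and plot them.
library(DALEXtra)

explainer <- explain_tidymodels(
  cat_fit_bo,
  data  = higgs_test %>% select(-V1),
  y     = as.numeric(higgs_test$V1) - 1,
  label = "catboost"
)

shap_one <- predict_parts(
  explainer,
  new_observation = higgs_test %>% select(-V1) %>% slice(1),
  type = "shap"
)
plot(shap_one)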

csh01470 commented 1 year ago

Glad it worked for you ( ◡̉̈ ) I'll look into the DALEXtra package. Have a nice day!