csh01470 / catsnip

Treesnip-based Catboost Wrapper
GNU General Public License v3.0

Error when using for classification #3

Closed: icejean closed this issue 1 year ago

icejean commented 1 year ago

Error message:

Error in catboost.from_matrix(as.matrix(float_and_cat_features_data), : Unsupported label type, expecting double or integer, got character

Please read the issue and its solution at the following link: https://github.com/catboost/catboost/issues/1874

Mikhail Rudakov describes a workaround in the issue linked above:

>remotes::install_github("Glemhel/treesnip")

However, his version conflicts with bonsai. When catboost needs to be used together with lightgbm, catsnip is the better choice, so could you please fix this in catsnip?

Best regards, Jean
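
For context, the error above says the label reached catboost as character data rather than as a double or integer. The following is a minimal base-R sketch of one way a factor outcome can be turned into numeric codes. It is not catsnip's actual code, and the helper name `factor_to_zero_based` is hypothetical.

```r
# Hypothetical helper (not part of catsnip or catboost): map a factor outcome
# to zero-based integer codes, following the order of the factor levels.
factor_to_zero_based <- function(y) {
  stopifnot(is.factor(y))
  as.integer(y) - 1L
}

label_values <- factor(c(0, 1, 1))

# Converting a factor with as.character() yields strings, which is the kind of
# label the error message rejects.
class(as.character(label_values))

# Integer codes satisfy the "double or integer" requirement.
codes <- factor_to_zero_based(label_values)
codes
```

With codes like these, predictions would still need to be mapped back to the original factor levels, which a wrapper would presumably handle internally.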

csh01470 commented 1 year ago

Hi. Thank you for your interest in the catsnip package.

I am currently checking catboost.R in the forked treesnip package. Updating the code may take some time because of testing.

Also, could you show me the code that produced the error with catsnip, along with the output of sessionInfo()?

icejean commented 1 year ago

Well, thanks for your reply. :)

library(tidymodels)
library(kableExtra)
library(tidyr)
# This version of treesnip is O.K. for classification, but conflicts with bonsai.
# remotes::install_github("Glemhel/treesnip", INSTALL_opts = c("--no-multiarch"))
# library(treesnip)
# devtools::install_github(repo="csh01470/catsnip", INSTALL_opts = c("--no-multiarch"))
library(catsnip)
tidymodels_prefer()
show_model_info("boost_tree")

# Toy dataset: three rows with a binary label and mixed numeric/categorical features.
countries <- c('RUS', 'USA', 'SUI')
years <- c(1900, 1896, 1896)
phone_codes <- c(7, 1, 41)
domains <- c('ru', 'us', 'ch')
label_values <- c(0, 1, 1)

dataset <- data.frame(label_values, countries, years, phone_codes, domains)
# Convert the categorical features and the label to factors.
dataset$countries <- as.factor(dataset$countries)
dataset$domains <- as.factor(dataset$domains)
dataset$label_values <- as.factor(dataset$label_values)

cat_rec <- recipe(label_values ~ countries+years+phone_codes+domains, data = dataset) 
cat_model <-
  boost_tree(trees = 100, tree_depth= 5, learn_rate= 0.03) %>%
  set_engine('catboost', loss_function = 'Logloss', ignored_features = c(4,9),
             border_count = 32, l2_leaf_reg = 3.5) %>%
  set_mode('classification')

translate(cat_model)

cat_wflow <- 
  workflow() %>% 
  add_model(cat_model) %>% 
  add_recipe(cat_rec)

cat_fit <- fit(cat_wflow, dataset)
res <- predict(cat_fit, new_data = dataset %>% select(-label_values))
> cat_fit <- fit(cat_wflow, dataset)
Parameter 'cat_features' is meaningless because column types are taken from data.frame.
Please, convert categorical columns to factors manually.
Error in catboost.from_matrix(as.matrix(float_and_cat_features_data),  : 
  Unsupported label type, expecting double or int, got: character
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] catsnip_0.0.3      kableExtra_1.3.4   yardstick_1.1.0    workflowsets_1.0.0 workflows_1.1.2   
 [6] tune_1.0.1.9001    tidyr_1.2.1        tibble_3.1.8       rsample_1.1.1      recipes_1.0.3.9000
[11] purrr_1.0.0        parsnip_1.0.3      modeldata_1.0.1    infer_1.0.4        ggplot2_3.4.0     
[16] dplyr_1.0.10       dials_1.1.0        scales_1.2.1       broom_1.0.1        tidymodels_1.0.0  

loaded via a namespace (and not attached):
 [1] colorspace_2.0-2    ellipsis_0.3.2      class_7.3-19        rprojroot_2.0.3     fs_1.5.0           
 [6] rstudioapi_0.14     listenv_0.9.0       furrr_0.3.1         remotes_2.4.0       prodlim_2019.11.13 
[11] fansi_1.0.3         lubridate_1.9.0     xml2_1.3.3          codetools_0.2-18    splines_4.1.0      
[16] cachem_1.0.5        knitr_1.41          pkgload_1.2.1       jsonlite_1.8.4      compiler_4.1.0     
[21] httr_1.4.4          backports_1.4.1     assertthat_0.2.1    Matrix_1.5-3        fastmap_1.1.0      
[26] cli_3.5.0           htmltools_0.5.2     prettyunits_1.1.1   tools_4.1.0         gtable_0.3.1       
[31] glue_1.6.2          Rcpp_1.0.9          DiceDesign_1.9      vctrs_0.5.1         svglite_2.1.0      
[36] iterators_1.0.14    conflicted_1.1.0    timeDate_4021.107   gower_1.0.1         xfun_0.35          
[41] stringr_1.5.0       globals_0.16.2      ps_1.6.0            testthat_3.0.3      rvest_1.0.3        
[46] timechange_0.1.1    lifecycle_1.0.3     devtools_2.4.2      future_1.30.0       MASS_7.3-54        
[51] ipred_0.9-13        parallel_4.1.0      yaml_2.3.5          memoise_2.0.1       rpart_4.1.19       
[56] stringi_1.6.1       desc_1.3.0          foreach_1.5.2       lhs_1.1.5           hardhat_1.2.0      
[61] pkgbuild_1.2.0      lava_1.7.0          rlang_1.0.6         pkgconfig_2.0.3     systemfonts_1.0.4  
[66] evaluate_0.18       lattice_0.20-44     tidyselect_1.2.0    processx_3.8.0      parallelly_1.33.0  
[71] magrittr_2.0.3      bookdown_0.30       R6_2.5.1            generics_0.1.3      DBI_1.1.3          
[76] pillar_1.8.1        withr_2.5.0         survival_3.4-0      nnet_7.3-18         future.apply_1.10.0
[81] crayon_1.5.2        catboost_1.1.1      utf8_1.2.2          rmarkdown_2.18      usethis_2.1.6.9000 
[86] grid_4.1.0          callr_3.7.3         digest_0.6.31       webshot_0.5.4       GPfit_1.0-8        
[91] munsell_0.5.0       viridisLite_0.4.1   sessioninfo_1.2.2  

csh01470 commented 1 year ago

Hi @icejean, I fixed this error. Could you reinstall the catsnip package and try again?

icejean commented 1 year ago

Hi, I just reinstalled catsnip with the following line:

devtools::install_github(repo="csh01470/catsnip", INSTALL_opts="--no-multiarch")

Here's the result. It now runs without error, but the predictions are incorrect (the training labels are 0, 1, 1, yet every row is predicted as 0):

> cat_fit <- fit(cat_wflow, dataset)
> res <- predict(cat_fit, new_data = dataset %>% select(-label_values))
> res
# A tibble: 3 x 1
  .pred_class
  <fct>      
1 0          
2 0          
3 0    

Best regards.
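
A quick way to quantify the mismatch is to compare the predicted classes with the known labels. This is a base-R sketch using the values shown in this thread, typed in by hand rather than taken from the fitted workflow; the variable names are illustrative.

```r
# True labels from the toy dataset and the predictions reported above,
# entered by hand (not pulled from cat_fit).
truth <- factor(c(0, 1, 1), levels = c(0, 1))
pred  <- factor(c(0, 0, 0), levels = c(0, 1))

# Share of rows where the predicted class matches the true class.
accuracy <- mean(pred == truth)
accuracy

# Confusion table: rows are true classes, columns are predicted classes.
confusion <- table(truth = truth, predicted = pred)
confusion
```

On the training data itself, a model that predicts only one class is a hint that the fit, rather than the data, is at fault.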

csh01470 commented 1 year ago

Hi. I think the problem was caused by a modification to the rsm parameter. Did you try reinstalling?

icejean commented 1 year ago

Yes, I reinstalled it with the following line:

devtools::install_github(repo="csh01470/catsnip", INSTALL_opts="--no-multiarch")

Maybe I should try it again?
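
Before retrying, it may help to confirm which version is actually installed. This is a small sketch under assumptions: the helper `installed_version` is hypothetical, and whether an earlier install was skipped is only a guess.

```r
# Hypothetical helper: report the installed version of a package as a string,
# or NA if the package is not installed.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

installed_version("catsnip")

# If the version has not changed, a forced reinstall is one option (not run here):
# remotes::install_github("csh01470/catsnip", force = TRUE)
```

Restarting the R session after reinstalling avoids continuing to use a previously loaded copy of the package.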

icejean commented 1 year ago

It works on Linux this time. I'll try it again on Windows next. Good job!

> cat_fit <- fit(cat_wflow, dataset)
> res <- predict(cat_fit, new_data = dataset %>% select(-label_values))
> res
# A tibble: 3 × 1
  .pred_class
  <fct>      
1 0          
2 1          
3 1          

icejean commented 1 year ago

Great, it works on Windows too!