ModelOriented / forester

Trees are all you need
https://modeloriented.github.io/forester/
GNU General Public License v3.0
forester: predicting house prices use case #105

Closed Saadi4469 closed 10 months ago

Saadi4469 commented 1 year ago

Hello and good day all,

First of all, thank you for the great tutorial. It's really helpful for newcomers like me who want to learn ML in R, and I would really appreciate more tutorials like it. I started learning ML with this tutorial, and I have a few follow-up questions.

  no.                name        engine tuning     rmse         mse        r2      mae
1   4      lightgbm_model      lightgbm  basic 189129.3 35769880500 0.7846590 125470.3
2   1        ranger_model        ranger  basic 191753.7 36769493526 0.7786412 112641.0
3   2       xgboost_model       xgboost  basic 195941.1 38392931515 0.7688678 111193.2
4   3 decision_tree_model decision_tree  basic 197701.4 39085831161 0.7646964 134093.8

   no.                name        engine        tuning     rmse          mse          r2      mae
1   85 decision_tree_bayes decision_tree     bayes_opt 178661.5  31919931491  0.80783638 125816.2
2   86      lightgbm_bayes      lightgbm     bayes_opt 180535.3  32593011269  0.80378432 119759.3
3   84       xgboost_bayes       xgboost     bayes_opt 182993.9  33486769826  0.79840373 104407.6
4   70       lightgbm_RS_8      lightgbm random_search 183125.3  33534861369  0.79811421 124956.9
5   73      lightgbm_RS_11      lightgbm random_search 183125.3  33534861369  0.79811421 124956.9

- The same goes for plot(model_parts, max_vars = 5). As you can see, the results are slightly different. Also, how did you change the headings/captions under Feature Importance in your plot? And why is my plot's x-axis in scientific notation?

- To get the report function to work, I first had to install ggradar and tinytex. Then, when I run report(output_2), I get the error below, which I don't know how to fix.

processing file: report.Rmd
|................... | 30% (unnamed-chunk-1)
Quitting from lines 65-67 (report.Rmd)
Error in predictor$predict(data = data, start_iteration = start_iteration, :
  Attempting to use a Booster which no longer exists. This can happen if you have called Booster$finalize() or if this Booster was saved with saveRDS(). To avoid this error in the future, use saveRDS.lgb.Booster() or Booster$save_model() to save lightgbm Boosters.


Session info:

R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit) 
Running under: Windows 10 x64 (build 22000) (note: this machine actually runs Windows 11; I don't know why R reports it as Windows 10)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggradar_0.2     DALEX_2.4.3     lubridate_1.9.2 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.0    
 [7] purrr_1.0.1     readr_2.1.4     tidyr_1.3.0     tibble_3.2.0    ggplot2_3.4.1   tidyverse_2.0.0
[13] forester_1.2.0  tinytex_0.44   

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0  xfun_0.37         inum_1.0-5        DiceKriging_1.6.0 splines_4.2.1    
 [6] lattice_0.20-45   colorspace_2.1-0  vctrs_0.5.2       generics_0.1.3    htmltools_0.5.4  
[11] yaml_2.3.7        utf8_1.2.3        survival_3.3-1    rlang_1.0.6       pillar_1.8.1     
[16] glue_1.6.2        withr_2.5.0       xgboost_1.7.3.1   lifecycle_1.0.3   munsell_0.5.0    
[21] gtable_0.3.1      mvtnorm_1.1-3     evaluate_0.20     knitr_1.42        fastmap_1.1.1    
[26] tzdb_0.3.0        fansi_1.0.4       Rcpp_1.0.10       scales_1.2.1      lightgbm_3.3.5   
[31] jsonlite_1.8.4    ranger_0.14.1     digest_0.6.31     hms_1.1.2         stringi_1.7.12   
[36] grid_4.2.1        cli_3.6.0         tools_4.2.1       magrittr_2.0.3    Formula_1.2-5    
[41] pkgconfig_2.0.3   partykit_1.2-18   ellipsis_0.3.2    libcoin_1.0-9     Matrix_1.5-3     
[46] data.table_1.14.8 timechange_0.2.0  rmarkdown_2.20    rstudioapi_0.14   R6_2.5.1         
[51] rpart_4.1.16      compiler_4.2.1   

![image](https://user-images.githubusercontent.com/55733842/224738870-ca377dde-0dc9-4518-ae3b-d97da8e7df43.png)

Saadi4469 commented 1 year ago

Update:

I am also getting an error with predictions = predict_new(output_2, data = x):

No imputation performed due to only one observation. If any values are missing, user has to handle them by himself.
Error in `[.data.frame`(train, , i) : undefined columns selected

kozaka93 commented 1 year ago

Hello,

Thank you for your comment.

The different results are most likely due to the random seed. Because forester does not fix a seed, the data split and the initialisation of the algorithms can differ between runs. In random search, a different set of hyperparameter combinations may also be sampled, which affects the results obtained.
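
To illustrate the point about seeds, here is a minimal sketch in base R. It does not use forester's own API: split_rows is a hypothetical helper written only for this example, and the 80/20 split ratio is an assumption. It shows that a random row split changes between runs unless the seed is fixed first.

```r
# Hypothetical helper (not part of forester): draw a random train/test row split.
split_rows <- function(n, train_frac = 0.8) {
  train_idx <- sample.int(n, size = floor(train_frac * n))
  list(train = sort(train_idx), test = setdiff(seq_len(n), train_idx))
}

# Without a fixed seed, two runs will almost certainly select different rows.
run_a <- split_rows(1000)
run_b <- split_rows(1000)
identical(run_a$train, run_b$train)

# Setting the same seed before each run makes the split reproducible.
set.seed(123)
run_c <- split_rows(1000)
set.seed(123)
run_d <- split_rows(1000)
identical(run_c$train, run_d$train)  # TRUE
```

Calling set.seed() before training is the generic R approach. Some engines keep their own random number generators, so this alone may not make every model in a run fully reproducible.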

Saadi4469 commented 1 year ago

@kozaka93, thank you for the response. What about the two errors mentioned in the question?

HubertR21 commented 1 year ago

We are currently working on a new version of the tutorial with better visualisations and bug fixes. It should be available soon.

Saadi4469 commented 1 year ago

Awesome!

Saadi4469 commented 1 year ago

So I was following this new tutorial and I have a couple of questions.

Error:

"C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/pandoc" +RTS -K512m -RTS report.knit.md --to latex --from markdown+autolink_bare_uris+tex_math_single_backslash --output pandocf9861506bf9.tex --lua-filter "C:\Users\ed\AppData\Local\R\win-library\4.2\rmarkdown\rmarkdown\lua\pagebreak.lua" --lua-filter "C:\Users\ed\AppData\Local\R\win-library\4.2\rmarkdown\rmarkdown\lua\latex-div.lua" --embed-resources --standalone --highlight-style tango --pdf-engine pdflatex --variable graphics --variable "geometry:margin=1in" 
A new version of TeX Live has been released. If you need to install or update any LaTeX packages, you have to upgrade TinyTeX with tinytex::reinstall_tinytex(repository = "illinois").
! Undefined control sequence.
<argument> O:psd 
                 ed ML Learninghands _on_report_files/figure-latex/rada...
l.147 ...n_report_files/figure-latex/radar_plot-1}

Error: LaTeX failed to compile O:/psd/ed/ML Learning/hands_on_report.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See hands_on_report.log for more info.

HubertR21 commented 10 months ago

Hi, I'm sorry for the very long response time. The issue was fixed in a recent package version, so it should now work properly.