ModelOriented / forester

Trees are all you need
https://modeloriented.github.io/forester/
GNU General Public License v3.0

Is report() ready for use? LaTeX failed to compile #100

RickPack closed this issue 1 year ago

RickPack commented 1 year ago

I understand this package is in active development and wondered whether report() is ready for use.

I am seeing: Error: LaTeX failed to compile C:/Users/RickPack/Documents/report.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips.

When I run:

library(forester)
library(ggradar)
data('lisbon')
train_output <- train(lisbon, 'Price')
train_output$ranked_list
saveRDS(train_output, "train_output.rds")
train_output <- readRDS("train_output.rds")
report(train_output)

Here are the .tex file contents. I do not know how to identify problems in this code.

% Options for packages loaded elsewhere
\PassOptionsToPackage{unicode}{hyperref}
\PassOptionsToPackage{hyphens}{url}
%
\documentclass[
]{article}
\usepackage{amsmath,amssymb}
\usepackage{lmodern}
\usepackage{iftex}
\ifPDFTeX
  \usepackage[T1]{fontenc}
  \usepackage[utf8]{inputenc}
  \usepackage{textcomp} % provide euro and other symbols
\else % if luatex or xetex
  \usepackage{unicode-math}
  \defaultfontfeatures{Scale=MatchLowercase}
  \defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
\fi
% Use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\IfFileExists{microtype.sty}{% use microtype if available
  \usepackage[]{microtype}
  \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\makeatletter
\@ifundefined{KOMAClassName}{% if non-KOMA class
  \IfFileExists{parskip.sty}{%
    \usepackage{parskip}
  }{% else
    \setlength{\parindent}{0pt}
    \setlength{\parskip}{6pt plus 2pt minus 1pt}}
}{% if KOMA class
  \KOMAoptions{parskip=half}}
\makeatother
\usepackage{xcolor}
\usepackage[margin=1in]{geometry}
\usepackage{longtable,booktabs,array}
\usepackage{calc} % for calculating minipage widths
% Correct order of tables after \paragraph or \subparagraph
\usepackage{etoolbox}
\makeatletter
\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{}
\makeatother
% Allow footnotes in longtable head/foot
\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}}
\makesavenoteenv{longtable}
\usepackage{graphicx}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
% Set default figure placement to htbp
\makeatletter
\def\fps@figure{htbp}
\makeatother
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
  \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{-\maxdimen} % remove section numbering
\ifLuaTeX
  \usepackage{selnolig} % disable illegal ligatures
\fi
\IfFileExists{bookmark.sty}{\usepackage{bookmark}}{\usepackage{hyperref}}
\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
\urlstyle{same} % disable monospaced font for URLs
\hypersetup{
  pdftitle={Forester report},
  hidelinks,
  pdfcreator={LaTeX via pandoc}}

\title{\textbf{Forester report}}
\usepackage{etoolbox}
\makeatletter
\providecommand{\subtitle}[1]{% add subtitle to \maketitle
  \apptocmd{\@title}{\par {\large #1 \par}}{}{}
}
\makeatother
\subtitle{version 1.1.4}
\author{}
\date{\vspace{-2.5em}2023-02-26 09:19:45}

\begin{document}
\maketitle

This report contains details about the best trained model, a table with metrics for every trained model, a scatter plot for the chosen metric, and information about the data used.

\hypertarget{the-best-models}{% \subsection{The best models}\label{the-best-models}}

This is the \textbf{regression} task.\ The best model is: \textbf{xgboost_bayes}.

The model names follow the pattern \emph{Engine_TuningMethod_Id}, where:

\begin{itemize}
  \item Engine describes the engine used for the training (random_forest, xgboost, decision_tree, lightgbm, catboost),
  \item TuningMethod describes how the model was tuned (basic for basic parameters, RS for random search, bayes for Bayesian optimization),
  \item Id for separating the random search parameters sets.
\end{itemize}

\emph{More details about the best model are present at the end of the report.}

\begin{longtable}[]{@{}rlrrr@{}}
\toprule()
no. & name & mse & r2 & mae \\
\midrule()
\endhead
46 & xgboost_bayes & 29045618632 & 0.8251 & 114802.4 \\
1 & ranger_model & 36032985636 & 0.7831 & 110595.8 \\
11 & ranger_RS_7 & 36709463913 & 0.7790 & 114183.8 \\
4 & lightgbm_model & 37078453349 & 0.7768 & 120380.9 \\
6 & ranger_RS_2 & 37551918917 & 0.7739 & 116539.8 \\
14 & ranger_RS_10 & 37937511569 & 0.7716 & 112570.7 \\
48 & lightgbm_bayes & 38967156412 & 0.7654 & 120472.1 \\
45 & ranger_bayes & 39214319477 & 0.7639 & 112905.0 \\
8 & ranger_RS_4 & 39230920301 & 0.7638 & 120822.4 \\
2 & xgboost_model & 39888825447 & 0.7599 & 118423.8 \\
3 & decision_tree_model & 40059950278 & 0.7588 & 138294.8 \\
25 & decision_tree_RS_1 & 40059950278 & 0.7588 & 138294.8 \\
26 & decision_tree_RS_2 & 40059950278 & 0.7588 & 138294.8 \\
27 & decision_tree_RS_3 & 40059950278 & 0.7588 & 138294.8 \\
28 & decision_tree_RS_4 & 40059950278 & 0.7588 & 138294.8 \\
29 & decision_tree_RS_5 & 40059950278 & 0.7588 & 138294.8 \\
30 & decision_tree_RS_6 & 40059950278 & 0.7588 & 138294.8 \\
31 & decision_tree_RS_7 & 40059950278 & 0.7588 & 138294.8 \\
32 & decision_tree_RS_8 & 40059950278 & 0.7588 & 138294.8 \\
33 & decision_tree_RS_9 & 40059950278 & 0.7588 & 138294.8 \\
34 & decision_tree_RS_10 & 40059950278 & 0.7588 & 138294.8 \\
47 & decision_tree_bayes & 40059950278 & 0.7588 & 138294.8 \\
36 & lightgbm_RS_2 & 40264937892 & 0.7576 & 126275.4 \\
39 & lightgbm_RS_5 & 40264937892 & 0.7576 & 126275.4 \\
44 & lightgbm_RS_10 & 40264937892 & 0.7576 & 126275.4 \\
35 & lightgbm_RS_1 & 40564756681 & 0.7558 & 128109.3 \\
43 & lightgbm_RS_9 & 40564756681 & 0.7558 & 128109.3 \\
13 & ranger_RS_9 & 41641257078 & 0.7493 & 121439.3 \\
42 & lightgbm_RS_8 & 41899397418 & 0.7478 & 123657.5 \\
5 & ranger_RS_1 & 43830145548 & 0.7361 & 122280.1 \\
10 & ranger_RS_6 & 44082111550 & 0.7346 & 126630.9 \\
37 & lightgbm_RS_3 & 46993866396 & 0.7171 & 138958.1 \\
38 & lightgbm_RS_4 & 60741902723 & 0.6343 & 143590.4 \\
40 & lightgbm_RS_6 & 60741902723 & 0.6343 & 143590.4 \\
12 & ranger_RS_8 & 65782798788 & 0.6040 & 173055.5 \\
9 & ranger_RS_5 & 66903827062 & 0.5972 & 178460.7 \\
41 & lightgbm_RS_7 & 67609578994 & 0.5930 & 146871.3 \\
7 & ranger_RS_3 & 76377007729 & 0.5402 & 184501.6 \\
19 & xgboost_RS_5 & 103435581669 & 0.3773 & 217806.3 \\
23 & xgboost_RS_9 & 103435581669 & 0.3773 & 217806.3 \\
16 & xgboost_RS_2 & 103807814505 & 0.3751 & 217616.8 \\
21 & xgboost_RS_7 & 103807814505 & 0.3751 & 217616.8 \\
17 & xgboost_RS_3 & 174906483351 & -0.0530 & 303429.4 \\
18 & xgboost_RS_4 & 174906483351 & -0.0530 & 303429.4 \\
15 & xgboost_RS_1 & 176441884982 & -0.0622 & 309509.7 \\
24 & xgboost_RS_10 & 346130052714 & -1.0838 & 467982.3 \\
20 & xgboost_RS_6 & 347667942720 & -1.0930 & 470990.6 \\
22 & xgboost_RS_8 & 347667942720 & -1.0930 & 470990.6 \\
\bottomrule()
\end{longtable}

\hypertarget{plots-for-all-models}{% \subsection{Plots for all models}\label{plots-for-all-models}}

\begin{verbatim}
 [1] "no."    "name"   "engine" "tuning" "mse"    "r2"     "mae"

   no.           name   engine        tuning         mse        r2      mae
1   46  xgboost_bayes  xgboost     bayes_opt 29045618632 0.8251402 114802.4
2    1   ranger_model   ranger         basic 36032985636 0.7830751 110595.8
3   11    ranger_RS_7   ranger random_search 36709463913 0.7790025 114183.8
4    4 lightgbm_model lightgbm         basic 37078453349 0.7767812 120380.9
5    6    ranger_RS_2   ranger random_search 37551918917 0.7739308 116539.8
6   14   ranger_RS_10   ranger random_search 37937511569 0.7716095 112570.7
7   48 lightgbm_bayes lightgbm     bayes_opt 38967156412 0.7654108 120472.1
8   45   ranger_bayes   ranger     bayes_opt 39214319477 0.7639229 112905.0
9    8    ranger_RS_4   ranger random_search 39230920301 0.7638229 120822.4
10   2  xgboost_model  xgboost         basic 39888825447 0.7598622 118423.8

  xgboost_bayes ranger_model ranger_RS_7 lightgbm_model ranger_RS_2 ranger_RS_10
1     0.8251402    0.7830751   0.7790025      0.7767812   0.7739308    0.7716095
  lightgbm_bayes ranger_bayes ranger_RS_4 xgboost_model
1      0.7654108    0.7639229   0.7638229     0.7598622
\end{verbatim}

\includegraphics[width=0.9\linewidth]{C:/Users/RickPack/Documents/report_files/figure-latex/radar_plot-1}

\includegraphics[width=0.9\linewidth]{C:/Users/RickPack/Documents/report_files/figure-latex/boxplot-1}

\includegraphics[width=1\linewidth]{C:/Users/RickPack/Documents/report_files/figure-latex/VS_plot-1}

\hypertarget{plots-for-the-best-model---xgboost_bayes}{% \subsection{\texorpdfstring{Plots for the best model - \textbf{xgboost_bayes}}{Plots for the best model - xgboost_bayes}}\label{plots-for-the-best-model---xgboost_bayes}}

\includegraphics[width=0.5\linewidth]{C:/Users/RickPack/Documents/report_files/figure-latex/plots_for_the_best_model-1} \includegraphics[width=0.5\linewidth]{C:/Users/RickPack/Documents/report_files/figure-latex/plots_for_the_best_model-2}

\hypertarget{feature-importance-for-the-best-model---xgboost_bayes}{% \subsection{\texorpdfstring{Feature Importance for the best model - \textbf{xgboost_bayes}}{Feature Importance for the best model - xgboost_bayes}}\label{feature-importance-for-the-best-model---xgboost_bayes}}

\includegraphics{C:/Users/RickPack/Documents/report_files/figure-latex/feature_importance-1.pdf}

\hypertarget{details-about-data}{% \subsection{Details about data}\label{details-about-data}}

-------------------- \textbf{CHECK DATA REPORT} --------------------

\textbf{The dataset has 246 observations and 17 columns whose names are: }

Id; Condition; PropertyType; PropertySubType; Bedrooms; Bathrooms; AreaNet; AreaGross; Parking; Latitude; Longitude; Country; District; Municipality; Parish; Price.M2; Price;

\textbf{With the target value described by a column:} Price.

Static columns are: Country; District; Municipality;

\textbf{With dominating values: }Portugal; Lisboa; Lisboa;

\textbf{These column pairs are duplicates: } District - Municipality;

\textbf{No target values are missing. }

\textbf{No predictor values are missing. }

\textbf{No issues with dimensionality. }

\textbf{Strongly correlated, by Spearman rank, pairs of numerical values are: }

Bedrooms - AreaNet: 0.77; Bedrooms - AreaGross: 0.77; Bathrooms - AreaNet: 0.78; Bathrooms - AreaGross: 0.78; AreaNet - AreaGross: 1;

Strongly correlated, by Cramér's V rank, pairs of categorical values are:

PropertyType - PropertySubType: 1;

\textbf{These observations might be outliers due to their numerical column values: }

145 146 196 44 5 51 57 58 59 60 61 62 63 64 69 75 76 77 78 ;

\textbf{Target data is not evenly distributed with quantile bins:} 0.25 0.35 0.14 0.26

\textbf{Column names suggest that some of them are IDs; removing them can improve the model. Suspicious columns are: }

Id

\textbf{Column data suggest that some of them are IDs; removing them can improve the model. Suspicious columns are: }

Id

-------------------- \textbf{CHECK DATA REPORT END} --------------------

\hypertarget{the-best-model-details}{% \subsection{The best model details}\label{the-best-model-details}}

\begin{verbatim}
------------ Xgboost model ------------

Parameters
niter: 30
evaluation_log:
 iter : train_rmse
    1 : 381718.290917727
    2 : 284957.30334414
    3 : 248646.958840744
    4 : 236909.526870439
    5 : 226958.192268788
    6 : 219655.138097904
    7 : 211685.43117799
    8 : 205246.944211267
    9 : 193050.907356864
   10 : 182695.157576117
   11 : 177979.360202398
   12 : 172372.097236018
   13 : 164562.414114256
   14 : 158817.237088777
   15 : 152787.38624367
   16 : 148496.118553976
   17 : 144224.693332455
   18 : 140214.554582863
   19 : 136330.657054649
   20 : 132485.672336577
   21 : 128937.921213947
   22 : 125482.685445145
   23 : 122222.184187203
   24 : 119002.004461565
   25 : 115784.298828111
   26 : 112894.697251728
   27 : 110191.564988163
   28 : 107518.478449887
   29 : 105058.899448766
   30 : 102608.4029147
\end{verbatim}

\end{document}

HubertR21 commented 1 year ago

Have you tried installing or reinstalling tinytex? In the README of the package we suggest running the code below when issues with generating the report occur:

install.packages('tinytex')
tinytex::install_tinytex()
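If reinstalling does not resolve it, the debugging page linked in the error message suggests turning on verbose TinyTeX output and recompiling the kept .tex file directly, which prints the full LaTeX log around the failing line. A minimal sketch, assuming tinytex is installed; the path is the one from the error message and should be adjusted to your machine:

```r
# Sketch: recompile the .tex file that report() left behind, with full logging.
# Assumes tinytex is installed; the path below is illustrative.
library(tinytex)
options(tinytex.verbose = TRUE)  # print the complete LaTeX log on failure
latexmk("C:/Users/RickPack/Documents/report.tex")
```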
RickPack commented 1 year ago

Thank you, Hubert. Yes, I did install tinytex per those instructions. I will try a reboot soon and see if the problem persists. By the way, I posted about the use of forester in this Kaggle discussion post: https://www.kaggle.com/competitions/godaddy-microbusiness-density-forecasting/discussion/390416#2160320.

I assumed it made sense to use the top two models identified by the score_valid object in the output of train():

forester_model_preprocess_nolockdown <- train(
  data = data.frame(
    train_eval_lockdown %>%
      select_if(is.numeric) %>%
      select(-days_since_lockdown)
  ),
  type = 'regression',
  advanced_preprocessing = TRUE,
  y = 'microbusiness_density',
  train_test_split = c(0.7, 0.2, 0.1),
  engine = c('ranger', 'decision_tree', 'lightgbm', 'catboost')
)

forester_model_preprocess_nolockdown$score_valid
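Assuming score_valid is a data frame ordered best-first, like the ranked_list shown earlier in this thread, the top two models could be pulled off with head(); this is only a sketch against that assumed structure:

```r
# Sketch: take the top two rows of the validation-score ranking.
# Assumes score_valid is sorted best-first, as ranked_list appears to be.
top_two <- head(forester_model_preprocess_nolockdown$score_valid, 2)
top_two
```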

RickPack commented 1 year ago

report() worked on my home computer. Thanks!