[Re] Hierarchical Shrinkage: Improving the Accuracy and Interpretability of Tree-Based Methods

The aim of this repository is to reproduce the claims, presented in paper Hierarchical Shrinkage: Improving the Accuracy and Interpretability of Tree-Based Methods.

Agarwal, A., Tan, Y.S., Ronen, O., Singh, C. & Yu, B.. (2022). Hierarchical Shrinkage: Improving the accuracy and interpretability of tree-based models.. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:111-135 Available from https://proceedings.mlr.press/v162/agarwal22b.html.

Environment

For the Python environment, we decided to go with conda. Below is the list of commands to create an environment and install all libraries.

conda create -n rehs python=3.10
conda activate rehs
conda install -c conda-forge numpy pandas scikit-learn scikit-optimize scipy nb_conda shap plotnine matplotlib tqdm
pip install imodels pmlb bartpy

Since we used two different computers, the environment files from both of them are in the env folder.

For the R environment we used libraries cmdstanr and bayesplot:

install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
install.packages("bayesplot")

Folder structure

The top-level folders are as follows:

data: includes manually downloaded datasets,
env: the .yml files with the Python environment information,
notebooks: the notebooks for all of the experiments,
tests: test files.

Each folder has its own README.md file for better clarification.

do8572 / MLDS

readme

[Re] Hierarchical Shrinkage: Improving the Accuracy and Interpretability of Tree-Based Methods

Environment

Folder structure