We thank you for your time in reviewing our submission. We understand that you are performing a service to the community. In order to ensure that review of this code is as easy as possible, we have:
repro_results.sh
)Thank you for reviewing our submission!
The files are organized as follows:
data_structures
contains all of the data structures used in our experiments, e.g.,
forests, trees, nodes, and histograms.
wrappers
subdirectory contains convenience classes to instantiate models of various types,
e.g., a RandomForestClassifier
is a forest classifier with bootstrap=True
(indicating to draw a bootstrap sample of the n
datapoints for each tree), feature_subsampling=SQRT
(indicating to consider only sqrt(F)
features of the original F
features at each node split), etc.experiments
subdirectory contains all the code for our core experiments
experiments/runtime_exps
contains the script (compare_runtimes.py
) to reproduce the results of Tables 1 and 2, as well as the results of running that script (the files ending in _profile
or _dict
)experiments/budget_exps
contains the script (compare_budgets.py
) to reproduce the results of Tables 3 and 4, as well as the results of running that script (the files ending in _dict
)experiments/sklearn_exps
contains the script (compare_baseline_implementations.py
) to reproduce the results of Table 6 in Appendix 4experiments/scaling_exps
contains the scripts (investigate_scaling.py
and make_scaling_plot.py
) to reproduce Appendix Figure 1 in Appendix 2 tests
subdirectory tests that we wrote to verify the correctness of our implementations
tests/feature_importance_tests.py
is also used to regenerate the results in Table 5tests/feature_importance_tests.py
. The results will be stored in the first 4 lines of tests/stat_test_stability_log/reproduce_stability.csv
file. utils
directory contains helper code for training forest-based models
utils/solvers.py
includes the core implementation of MABSplit in the solve_mab()
functionrepro_script.sh
. This may take many hours.