Add a boolean flag to turn vectorization on/off: setting the `vectorization` flag to true or false now turns vectorization of impurity calculations and histogram insertions on or off (classification only, since we vectorized the regression case before submission).
Change the y-axis of the scaling experiments from wall-clock time to the number of insertions: after this change, we can observe that our algorithm scales logarithmically with the data size. The images below are the graphs I plotted by running `investigate_scaling.py` and `make_scaling_plot.py` on my laptop.
Turn off the divide-by-zero warning: the warning is now suppressed in the first line of the `get_impurity_reductions` function.
Complete three tasks assigned by Mo here
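For reviewers, here is a minimal sketch of how a vectorization flag and local divide-by-zero suppression can fit together. It is not the code in this PR: the function `gini_impurity`, its signature, and the handling of empty bins are assumptions made for illustration, and only the flag name `vectorization` is taken from the description above. The sketch uses `np.errstate` as one possible way to silence the warning.

```python
import numpy as np


def gini_impurity(counts, vectorization=True):
    """Gini impurity per histogram bin (hypothetical helper, not the PR's code).

    counts: array-like of shape (n_bins, n_classes) holding class counts.
    Empty bins are assigned an impurity of 0.0 (an assumption of this sketch).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)

    if vectorization:
        # An empty bin produces 0/0, which NumPy reports under the "invalid"
        # category; "divide" covers x/0 with x != 0. Silence both locally.
        with np.errstate(divide="ignore", invalid="ignore"):
            probs = np.nan_to_num(counts / totals[:, None])
        impurity = 1.0 - (probs ** 2).sum(axis=1)
        return np.where(totals > 0, impurity, 0.0)

    # Non-vectorized path: one bin at a time, skipping empty bins explicitly.
    result = np.zeros(len(counts))
    for i, row in enumerate(counts):
        if totals[i] > 0:
            result[i] = 1.0 - sum((c / totals[i]) ** 2 for c in row)
    return result


counts = [[5, 5], [10, 0], [0, 0]]
vectorized = gini_impurity(counts, vectorization=True)
looped = gini_impurity(counts, vectorization=False)
```

Both paths give impurities of 0.5, 0.0, and 0.0 for the three bins, so the flag changes only how the values are computed, not what they are.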