Closed parismita closed 6 years ago
ok, i'll add the tests and examples asap.....
Do you think it would be better to use namespacing when calling functions inside mmit? For instance, in mmit.cv, you call fit_and_score. What if another user has defined fit_and_score? It might be more robust to use mmit::fit_and_score. However, I'm not familiar with the best practices. This is just my intuition.
Nope, its automatically taken care of by NAMESPACE file, I just checked defining some global function with same name, but the functions used by mmit already use mmit as namespace.
Actions items:
Ok, I reviewed the benchmark code. It looks fine (see my comments). I noticed that you don't shuffle the data, which could be a problem if the examples are ordered in a particular way, but I think it is ok here, since we are just trying to compare both implementations on the same data.
Please do the comparison with the largest datasets only. The models and accuracies obtained for small datasets are likely to be very unstable.
I didnt shuffle them, so that same data goes into training for both implementations so that trees have higher probability to match
@parismita I did a quick code review. Did not dive into the details. Will continue tomorrow. Please add unit tests and compare the results of your pruning functions to those of the python implementation.