TheDigitalFrontier / parallel-decision-trees

Semester project for CS205 Computing Foundations for Computational Science at the Harvard School of Engineering and Applied Sciences, spring 2020.
MIT License

updated datasets and added a fix for assertion error in decision_tree #115

Closed: hgupta18 closed this 4 years ago

johannes-kk commented 4 years ago

This PR fixes a bugged edge case in DecisionTree.findBestSplit() where training would crash if, when evaluating a new split, all of the randomly chosen mtry columns contained only a single unique value. In that case, no split can be made without leaving the right child with no observations, so the intended asserts would fail. The function has been extended to return -1 in this case, which DecisionTree.fit() handles by not splitting that node any further.
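A minimal C++ sketch of the guard described above, assuming a Gini-based classification tree where the left child takes `x <= threshold` and the right child takes `x > threshold`. `Matrix`, `Labels`, `Split`, `Node`, `gini`, `majority`, and the free functions `findBestSplit`/`fit` are hypothetical stand-ins for the repo's `DecisionTree` members, not its actual code. Only the `-1` sentinel and the "stop splitting" behaviour come from the PR description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <memory>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical names and types; not the repo's actual API.
using Matrix = std::vector<std::vector<double>>;  // rows = observations (assumed non-empty)
using Labels = std::vector<int>;

struct Split {
    int column = -1;         // -1 is the "no valid split" sentinel
    double threshold = 0.0;  // left: x <= threshold, right: x > threshold
};

// Gini impurity of the labels at the given row indices.
double gini(const Labels& y, const std::vector<std::size_t>& rows) {
    if (rows.empty()) return 0.0;
    std::map<int, std::size_t> counts;
    for (std::size_t r : rows) ++counts[y[r]];
    double impurity = 1.0;
    for (const auto& kv : counts) {
        double p = static_cast<double>(kv.second) / rows.size();
        impurity -= p * p;
    }
    return impurity;
}

// Best split over the candidate (mtry) columns. Returns column == -1 when every
// candidate threshold leaves the right child empty, e.g. all columns are constant.
Split findBestSplit(const Matrix& X, const Labels& y,
                    const std::vector<std::size_t>& rows,
                    const std::vector<std::size_t>& candidateCols) {
    Split best;
    double bestScore = std::numeric_limits<double>::infinity();
    for (std::size_t col : candidateCols) {
        for (std::size_t t : rows) {
            double threshold = X[t][col];  // thresholds are observed values
            std::vector<std::size_t> left, right;
            for (std::size_t r : rows)
                (X[r][col] <= threshold ? left : right).push_back(r);
            if (left.empty() || right.empty()) continue;  // the edge case from the PR
            double score = (left.size() * gini(y, left) +
                            right.size() * gini(y, right)) / rows.size();
            if (score < bestScore) {
                bestScore = score;
                best.column = static_cast<int>(col);
                best.threshold = threshold;
            }
        }
    }
    return best;
}

struct Node {
    int prediction = 0;  // majority label, used when this node is a leaf
    Split split;         // split.column == -1 marks a leaf
    std::unique_ptr<Node> left, right;
};

// Most frequent label among the given rows.
int majority(const Labels& y, const std::vector<std::size_t>& rows) {
    std::map<int, std::size_t> counts;
    for (std::size_t r : rows) ++counts[y[r]];
    int best = 0;
    std::size_t bestCount = 0;
    for (const auto& kv : counts)
        if (kv.second > bestCount) { best = kv.first; bestCount = kv.second; }
    return best;
}

std::unique_ptr<Node> fit(const Matrix& X, const Labels& y,
                          const std::vector<std::size_t>& rows,
                          std::size_t mtry, std::mt19937& rng) {
    auto node = std::make_unique<Node>();
    node->prediction = majority(y, rows);
    if (gini(y, rows) == 0.0) return node;  // pure node: leaf
    // Draw up to mtry distinct candidate columns at random.
    std::vector<std::size_t> cols(X[0].size());
    std::iota(cols.begin(), cols.end(), 0);
    std::shuffle(cols.begin(), cols.end(), rng);
    cols.resize(std::min(mtry, cols.size()));
    Split s = findBestSplit(X, y, rows, cols);
    if (s.column == -1) return node;  // no valid split: stop here instead of asserting
    std::vector<std::size_t> l, r;
    const auto col = static_cast<std::size_t>(s.column);
    for (std::size_t i : rows)
        (X[i][col] <= s.threshold ? l : r).push_back(i);
    assert(!l.empty() && !r.empty());  // guaranteed by the -1 check above
    node->split = s;
    node->left = fit(X, y, l, mtry, rng);
    node->right = fit(X, y, r, mtry, rng);
    return node;
}
```

Because every split that `findBestSplit` accepts leaves both children non-empty, the assert in `fit` cannot fire. The recursion also terminates, since each child holds strictly fewer rows than its parent. Returning a sentinel rather than throwing keeps the "all mtry columns are constant" case on the normal control path. That case is expected in random-forest training rather than exceptional.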