TheDigitalFrontier / parallel-decision-trees

Semester project in CS205 Computing Foundations for Computational Science at Harvard School of Engineering and Applied Sciences, spring 2020.
MIT License

Implement functionality for basic decision tree (classification). #44

Closed: gpestre closed this issue 4 years ago

gpestre commented 4 years ago

Absorbs most of the functions from helpers.cpp:

datasets.cpp : DataFrame : https://github.com/johannes-kk/cs205_final_project/blob/718cc1acc9c908f888ccbdd9acea6afeab500411/src/helpers.cpp#L13

datasets.cpp : DataVector : https://github.com/johannes-kk/cs205_final_project/blob/718cc1acc9c908f888ccbdd9acea6afeab500411/src/helpers.cpp#L14

losses.cpp : LossFunction("gini_impurity") : https://github.com/johannes-kk/cs205_final_project/blob/718cc1acc9c908f888ccbdd9acea6afeab500411/src/helpers.cpp#L80

losses.cpp : LossFunction("misclassification_rate") : https://github.com/johannes-kk/cs205_final_project/blob/718cc1acc9c908f888ccbdd9acea6afeab500411/src/helpers.cpp#L20

datasets.cpp :

// Returns a pair of frames: the rows whose value in split_column falls below vs. above split_threshold (rows equal to it go left if equal_goes_left is true).
std::vector<DataFrame*> split(int split_column, double split_threshold, bool equal_goes_left) const;

https://github.com/johannes-kk/cs205_final_project/blob/718cc1acc9c908f888ccbdd9acea6afeab500411/src/helpers.cpp#L37

losses.cpp ? : https://github.com/johannes-kk/cs205_final_project/blob/718cc1acc9c908f888ccbdd9acea6afeab500411/src/helpers.cpp#L113
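The two loss functions and the split routine listed above are small enough to sketch directly. This is a minimal, hypothetical illustration, not the code in helpers.cpp: it assumes class labels are plain ints and rows are std::vector<double>, and it uses free functions (class_proportions, gini_impurity, misclassification_rate and split_rows are illustrative names) rather than the project's LossFunction and DataFrame classes. It also assumes the "left" side holds values below the threshold.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Fraction of samples carrying each class label (labels assumed to be ints).
std::map<int, double> class_proportions(const std::vector<int>& labels) {
    std::map<int, double> proportions;
    for (int label : labels) proportions[label] += 1.0;
    for (auto& entry : proportions) entry.second /= static_cast<double>(labels.size());
    return proportions;
}

// Gini impurity: 1 - sum_k p_k^2 (0 for a pure node).
double gini_impurity(const std::vector<int>& labels) {
    if (labels.empty()) return 0.0;
    double sum_sq = 0.0;
    for (const auto& entry : class_proportions(labels)) sum_sq += entry.second * entry.second;
    return 1.0 - sum_sq;
}

// Misclassification rate: 1 - max_k p_k, the error of always predicting the majority class.
double misclassification_rate(const std::vector<int>& labels) {
    if (labels.empty()) return 0.0;
    double max_p = 0.0;
    for (const auto& entry : class_proportions(labels)) max_p = std::max(max_p, entry.second);
    return 1.0 - max_p;
}

// Hypothetical stand-in for split(): returns (left, right) row indices, where "left" holds
// values below the threshold and ties go left only when equal_goes_left is true.
std::pair<std::vector<std::size_t>, std::vector<std::size_t>> split_rows(
    const std::vector<std::vector<double>>& rows, int split_column,
    double split_threshold, bool equal_goes_left) {
    std::vector<std::size_t> left, right;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        double value = rows[i][split_column];
        bool goes_left = value < split_threshold || (equal_goes_left && value == split_threshold);
        (goes_left ? left : right).push_back(i);
    }
    return {left, right};
}
```

With labels {0, 0, 0, 1}, for example, the Gini impurity is 1 - (0.75² + 0.25²) = 0.375 while the misclassification rate is 1 - 0.75 = 0.25. Gini responds more smoothly to changes in the class mix, which is one reason it is commonly preferred for choosing splits.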

johannes-kk commented 4 years ago

Amazing! 🚀

gpestre commented 4 years ago

Doesn't implement prediction yet, but has a fit method that seems to run without error. Definitely needs some stress-testing though.
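Since prediction is still missing, here is one way fit and predict could fit together for a classification tree. It is a minimal sketch on plain vectors, not the project's code: it assumes rows are std::vector<double>, labels are ints, and rows equal to a threshold go left (as if equal_goes_left were always true). Node, fit, predict, gini and majority are hypothetical names. fit greedily picks the column/threshold pair that minimizes the size-weighted Gini impurity of the two children, then recurses until it reaches max_depth, a pure node, or a node where no split lowers impurity.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

using Row = std::vector<double>;

// Gini impurity of integer class labels: 1 - sum_k p_k^2.
double gini(const std::vector<int>& y) {
    if (y.empty()) return 0.0;
    std::map<int, int> counts;
    for (int label : y) ++counts[label];
    double sum_sq = 0.0;
    for (const auto& c : counts) {
        double p = static_cast<double>(c.second) / static_cast<double>(y.size());
        sum_sq += p * p;
    }
    return 1.0 - sum_sq;
}

// Most frequent label; used as the node's prediction. Assumes y is non-empty.
int majority(const std::vector<int>& y) {
    std::map<int, int> counts;
    for (int label : y) ++counts[label];
    int best = y.front(), best_count = 0;
    for (const auto& c : counts)
        if (c.second > best_count) { best = c.first; best_count = c.second; }
    return best;
}

struct Node {
    bool is_leaf = true;
    int label = 0;           // majority label of the training rows that reached this node
    int column = 0;          // split column (internal nodes only)
    double threshold = 0.0;  // rows with x[column] <= threshold go left
    std::unique_ptr<Node> left, right;
};

// Greedy recursive fit; stops at max_depth, at a pure node, or when no split lowers impurity.
// Assumes X is non-empty and every row has the same number of columns.
std::unique_ptr<Node> fit(const std::vector<Row>& X, const std::vector<int>& y, int max_depth) {
    auto node = std::make_unique<Node>();
    node->label = majority(y);
    const double parent = gini(y);
    if (max_depth == 0 || parent == 0.0) return node;

    double best_loss = parent;
    int best_col = -1;
    double best_thr = 0.0;
    for (std::size_t col = 0; col < X[0].size(); ++col) {
        for (const Row& candidate : X) {  // every observed value is a candidate threshold
            const double thr = candidate[col];
            std::vector<int> yl, yr;
            for (std::size_t i = 0; i < X.size(); ++i)
                (X[i][col] <= thr ? yl : yr).push_back(y[i]);
            if (yl.empty() || yr.empty()) continue;
            // Size-weighted impurity of the two children.
            const double loss = (yl.size() * gini(yl) + yr.size() * gini(yr)) / y.size();
            if (loss < best_loss) { best_loss = loss; best_col = static_cast<int>(col); best_thr = thr; }
        }
    }
    if (best_col < 0) return node;

    std::vector<Row> Xl, Xr;
    std::vector<int> yl, yr;
    for (std::size_t i = 0; i < X.size(); ++i) {
        const bool goes_left = X[i][best_col] <= best_thr;
        (goes_left ? Xl : Xr).push_back(X[i]);
        (goes_left ? yl : yr).push_back(y[i]);
    }
    node->is_leaf = false;
    node->column = best_col;
    node->threshold = best_thr;
    node->left = fit(Xl, yl, max_depth - 1);
    node->right = fit(Xr, yr, max_depth - 1);
    return node;
}

// Routes one row from the root to a leaf and returns that leaf's label.
int predict(const Node& node, const Row& x) {
    if (node.is_leaf) return node.label;
    return predict(x[node.column] <= node.threshold ? *node.left : *node.right, x);
}
```

Trying every observed value as a threshold costs O(d·n²) per node for d columns and n rows. Sorting each column once and sweeping it would bring that down, and the per-column loop is a natural place to parallelize.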