TheDigitalFrontier / parallel-decision-trees

Semester project in CS205 Computing Foundations for Computational Science at Harvard School of Engineering and Applied Sciences, spring 2020.
MIT License

Implement bootstrapping #60

Closed (johannes-kk closed 4 years ago)

johannes-kk commented 4 years ago

Implements bootstrap resampling with replacement. The caller can specify nrow and seed, but cannot choose whether to replace: sampling is always with replacement. I was thinking of making this method the de facto train/test splitter, but it is easier to replicate the needed functionality in a dedicated method for that.
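For illustration, the behavior described above can be sketched roughly as follows. This is a Python sketch under stated assumptions, not the project's actual code: the function name `bootstrap`, the list-of-rows representation, and the default of sampling as many rows as the input are all hypothetical. Only the `nrow` and `seed` parameters and the always-with-replacement behavior come from the description.

```python
import random


def bootstrap(rows, nrow=None, seed=None):
    """Return a bootstrap sample of `rows`, always drawn with replacement.

    Hypothetical sketch: `rows` is a list of row objects. The result is a
    new list that references the same row objects, so no row data is
    copied and the input list is left unmodified.
    """
    if nrow is None:
        nrow = len(rows)  # assumed default: same size as the input
    if nrow > 0 and not rows:
        raise ValueError("cannot sample from an empty dataset")

    # A local generator keeps the seed from affecting global random state.
    rng = random.Random(seed)

    # Draw row indices with replacement, then collect references to rows.
    indices = [rng.randrange(len(rows)) for _ in range(nrow)]
    return [rows[i] for i in indices]
```

Calling `bootstrap(data, nrow=5, seed=0)` twice with the same seed returns the same sample. Because the sample holds references rather than copies, mutating a sampled row would also change the original, so a real implementation would need to decide whether that sharing is acceptable.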

I requested a review from @gpestre to verify that I am not needlessly making copies of data or messing with the underlying dataframes – or making any other mistake I can't think of.

johannes-kk commented 4 years ago

Excellent point and explanation, @gpestre! It certainly helped me understand better what's going on. I added a commit to incorporate your proposed changes. I didn't run tests on it, but I assume it's working – and if not, we just have one more Issue in the backlog 😅