TheDigitalFrontier / parallel-decision-trees

Semester project in CS205 Computing Foundations for Computational Science at Harvard School of Engineering and Applied Sciences, spring 2020.
MIT License

RF OpenMP seeding does not give consistent results #101

Closed by johannes-kk 4 years ago

johannes-kk commented 4 years ago

Running the OpenMP-parallelised RandomForest.fit() does not yield consistent test accuracy scores across runs, even with the same seed.

Hypothesis: the loop at /src-openmp/random_forest.cpp ln 134 only seeds correctly when its iterations run in sequence. The order in which the trees i are fitted does not matter in itself, but each iteration makes a pair of calls to RandomForest.SeedGenerator.new_seed(), one for data_seed and one for tree_seed. If another thread calls new_seed() while iteration i is between its two calls, the meta-seeded seeds get mixed up between trees, so which seeds a tree receives depends on thread scheduling.
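To illustrate the hypothesis without relying on real thread timing, here is a minimal single-threaded sketch that replays two possible orders of new_seed() calls. The `SeedGenerator` class below is a hypothetical stand-in built on `std::mt19937`, not the project's actual class, and `assign_seeds` is a helper invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for RandomForest.SeedGenerator: a meta-seeded
// generator that hands out one new seed per call.
class SeedGenerator {
public:
    explicit SeedGenerator(unsigned meta_seed) : engine_(meta_seed) {}
    unsigned new_seed() { return static_cast<unsigned>(engine_()); }
private:
    std::mt19937 engine_;
};

// Replays a given order of new_seed() calls. call_order[k] is the index of
// the tree whose iteration makes the k-th call. Each tree appears twice:
// its first call yields data_seed (.first), its second tree_seed (.second).
std::vector<std::pair<unsigned, unsigned>>
assign_seeds(unsigned meta_seed, std::size_t ntree,
             const std::vector<std::size_t>& call_order) {
    assert(call_order.size() == 2 * ntree);
    SeedGenerator gen(meta_seed);
    std::vector<std::pair<unsigned, unsigned>> seeds(ntree);
    std::vector<int> calls_seen(ntree, 0);
    for (std::size_t tree : call_order) {
        assert(tree < ntree);
        const unsigned s = gen.new_seed();
        if (calls_seen[tree]++ == 0) {
            seeds[tree].first = s;   // data_seed
        } else {
            seeds[tree].second = s;  // tree_seed
        }
    }
    return seeds;
}
```

Calling `assign_seeds(42, 2, {0, 0, 1, 1})` models the sequential loop, while `assign_seeds(42, 2, {0, 1, 0, 1})` models two threads interleaving their calls. The same four seeds are drawn in both cases, but they end up paired with different trees.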

johannes-kk commented 4 years ago

As a simple fix, fill data_seed and tree_seed vectors sequentially outside the loop that fits the ntree decision trees, and have each iteration read its seeds from those vectors with the existing loop index i.
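A rough sketch of that fix, under stated assumptions: `SeedGenerator`, `draw_seeds`, `fit_tree`, and `fit_forest` are hypothetical stand-ins rather than the code in /src-openmp/random_forest.cpp, and `fit_tree` only returns a number derived from its two seeds instead of fitting a real decision tree.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for RandomForest.SeedGenerator.
class SeedGenerator {
public:
    explicit SeedGenerator(unsigned meta_seed) : engine_(meta_seed) {}
    unsigned new_seed() { return static_cast<unsigned>(engine_()); }
private:
    std::mt19937 engine_;
};

struct TreeSeeds {
    std::vector<unsigned> data_seed;
    std::vector<unsigned> tree_seed;
};

// Draw all seeds on one thread, in pairs, before any parallel work starts.
TreeSeeds draw_seeds(SeedGenerator& gen, int ntree) {
    assert(ntree >= 0);
    TreeSeeds seeds;
    seeds.data_seed.reserve(static_cast<std::size_t>(ntree));
    seeds.tree_seed.reserve(static_cast<std::size_t>(ntree));
    for (int i = 0; i < ntree; ++i) {
        seeds.data_seed.push_back(gen.new_seed());
        seeds.tree_seed.push_back(gen.new_seed());
    }
    return seeds;
}

// Placeholder for fitting one tree: the result depends only on the two seeds.
unsigned fit_tree(unsigned data_seed, unsigned tree_seed) {
    std::mt19937 data_rng(data_seed);
    std::mt19937 tree_rng(tree_seed);
    return static_cast<unsigned>(data_rng() ^ tree_rng());
}

std::vector<unsigned> fit_forest(unsigned meta_seed, int ntree) {
    SeedGenerator gen(meta_seed);
    const TreeSeeds seeds = draw_seeds(gen, ntree);
    std::vector<unsigned> trees(static_cast<std::size_t>(ntree));

    // Iterations only read the pre-drawn seeds and write to their own slot,
    // so the result no longer depends on how threads are scheduled.
    #pragma omp parallel for
    for (int i = 0; i < ntree; ++i) {
        trees[i] = fit_tree(seeds.data_seed[i], seeds.tree_seed[i]);
    }
    return trees;
}
```

With this layout the shared generator is never touched inside the parallel region, so two calls such as `fit_forest(7, 16)` should return identical vectors whether or not the code is compiled with OpenMP enabled.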