There isn't a way to make runs repeatable with how we currently construct the seed. There are pragma directives for executing certain parts of the code in sequence, but using them limits speedup. Instead, I created a vector of seeds before parallelization, which gives us the behavior we want.
A couple of other points:
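For reference, a minimal sketch of the seed-vector approach, assuming C++ with an OpenMP-style parallel loop. `make_seeds` and `run_tasks` are hypothetical names rather than functions from our code, and the loop body is only a placeholder for the real per-task work:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: draw one seed per task from a single master seed.
// This runs serially, before any parallel region starts, so the seed
// assigned to task i never depends on thread scheduling.
std::vector<std::uint32_t> make_seeds(std::uint32_t master_seed, std::size_t n_tasks) {
    std::mt19937 master(master_seed);
    std::vector<std::uint32_t> seeds(n_tasks);
    for (auto& s : seeds) {
        s = static_cast<std::uint32_t>(master());
    }
    return seeds;
}

// Hypothetical stand-in for the parallel section. Each iteration builds its
// own private engine from seeds[i], so no random state is shared across threads.
std::vector<std::uint32_t> run_tasks(std::uint32_t master_seed, std::size_t n_tasks) {
    const std::vector<std::uint32_t> seeds = make_seeds(master_seed, n_tasks);
    std::vector<std::uint32_t> results(n_tasks);

    // The pragma is ignored if the compiler is not invoked with OpenMP enabled.
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(n_tasks); ++i) {
        std::mt19937 rng(seeds[i]);                       // per-task engine
        results[i] = static_cast<std::uint32_t>(rng());   // placeholder work
    }
    return results;
}
```

Because the seeds are fixed before the loop starts, the same master seed should give the same per-task random streams regardless of how iterations are distributed across threads, without serializing any part of the loop.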
Having said that, there seems to be another source of randomness elsewhere. For certain master seed values, the accuracy score jumps around between 2-3 values across runs. This may be due to the split being chosen randomly at a node where multiple candidate splits give the same score.
Running the script multiple times in quick succession started giving out-of-memory errors. Perhaps we should be deleting some data from memory after each run? Not a huge deal, but @gpestre might have ideas.
@hgupta18 can you open a new Issue, labeled as bug, that describes what you reported above in more detail, so we can address it if time permits?