Open bfolie opened 2 years ago
Hi, how is it going? Is there any update on this issue? Thanks in advance for a brief reply! Best, Christoph
Thanks for asking, @BAMcvoelker. To be honest, we hadn't thought about it in a while, but after seeing your comment we realized we already have all of the tools and just need to thread them through.
We open-sourced our splittable random number library, which means it's available to pull into Lolo. I will pull it in soon and use it to make bagged training reproducible.
Thank you so much @bfolie for the update and for picking up the topic again. I look forward to the update!
Bagger and MultiTaskBagger both train the individual models in parallel. Because the order in which the models are trained is uncontrolled, Lolo random forests are inherently non-reproducible, even if the bagging and the RNGs for the base learners are identical.
There are ways of guaranteeing reproducibility across multiple threads, and we should make use of them. See SplittableRandom in Java and a discussion in the context of NumPy.
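To make the idea concrete, here is a minimal sketch of the splitting approach using `java.util.SplittableRandom`. It is not Lolo code: `ReproducibleBagging`, `trainOne`, and `trainBag` are hypothetical names, and `trainOne` is only a stand-in for training a base learner. The point is that the generators are split off sequentially on one thread, so model `i` always gets the same stream no matter which thread trains it or when.

```java
import java.util.Arrays;
import java.util.SplittableRandom;
import java.util.stream.IntStream;

public class ReproducibleBagging {

    // Hypothetical stand-in for training one base learner: it consumes
    // randomness from its own generator and returns a value that depends on it.
    static long trainOne(SplittableRandom rng) {
        long acc = 0;
        for (int i = 0; i < 100; i++) {
            acc += rng.nextInt(1000);
        }
        return acc;
    }

    // Hypothetical stand-in for bagged training of numModels models.
    static long[] trainBag(long seed, int numModels) {
        SplittableRandom root = new SplittableRandom(seed);

        // Split sequentially, on a single thread, BEFORE going parallel.
        // Model i therefore always receives the same generator for a given seed.
        SplittableRandom[] rngs = new SplittableRandom[numModels];
        for (int i = 0; i < numModels; i++) {
            rngs[i] = root.split();
        }

        // Train in parallel. Each task touches only its own generator, so the
        // scheduling order does not affect the result. toArray() keeps the
        // results in index order because the range stream is ordered.
        return IntStream.range(0, numModels)
                .parallel()
                .mapToLong(i -> trainOne(rngs[i]))
                .toArray();
    }

    public static void main(String[] args) {
        long[] first = trainBag(42L, 8);
        long[] second = trainBag(42L, 8);
        // Same seed, same results, regardless of thread scheduling.
        System.out.println(Arrays.equals(first, second)); // prints "true"
    }
}
```

The same pattern should carry over to Bagger and MultiTaskBagger: derive one generator per base learner up front, then hand each one to its training task, rather than letting the tasks draw from shared state in whatever order they happen to run.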