Open andrew-esteban-imc opened 5 months ago
Thank you for raising the issue. We will revisit this after https://github.com/dmlc/xgboost/issues/6711 is resolved by removing the global random engine. Meanwhile, I will mark this one as a bug to increase the priority.
Hi again, Just wondering if there's anything I can do to help out with resolving this? Like I mentioned earlier my C++ abilities are somewhat lacking, but if there is something I can help with I'd gladly have a crack.
We are currently working on it, among other items, in https://github.com/dmlc/xgboost/pull/10354. We want to use a booster-local random state instead of a global random state. The problem is that we can't reliably preserve the random state in the model file. As a result, after the PR is finished, the result would be deterministic. However, it's not necessarily true that training two smaller boosters would produce the same result as training a single large booster. We are still investigating.
Hi there,

We have found that despite setting a seed for our `hist` training, we get non-deterministic results when resuming training from a checkpoint. A reproducer can be seen below:

We make use of `XgbCheckpointCallback` to fix a similar issue whereby restarting from a checkpoint ignores the epoch the checkpoint got up to. You can remove it, but then setting any of the `colsample_*` params to a value below `1.0` will produce the same issue.

When `tree_method` is set to `exact`, the uninterrupted model and the interrupted model are identical. When `tree_method` is set to `hist` and `subsample` is set to `1.0`, they are also identical. When running with `hist` and `subsample < 1.0`, however, the results differ.

I've seen #6711, but that seems to be somewhat different in nature.