Closed: david-cortes closed 5 years ago
@david-cortes Would you like to open a PR, as you're already familiar with it?
I’m not really familiar with how it works, but have been checking it in a bit more detail.
I’ve now realized that the RandomState class comes from NumPy (link). As of v1.17, it is a container for the state of a Mersenne-Twister pseudo-random number generator. The classifier/regressor is supposed to use it for random number generation, with the state changing as numbers are drawn, and the updated state then being passed on to the next classifier/regressor.
Since xgboost doesn’t use this Python class for random number generation, and it may use something other than C++’s default MT19937, I think it might not be feasible to simply convert Python and C++ states back and forth. Moreover, NumPy might change its default RNG from MT19937 to something else in the future.
I guess a potential solution would be to use the Python RandomState object to generate a random integer, which would then be set as the seed for xgboost’s C++ RNG. That way it achieves the purpose of setting a reproducible seed, and the RandomState’s state is modified, even though the statistical quality of such short seedings might not be very good.
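A minimal sketch of that idea, using only NumPy (the helper name `random_state_to_seed` is hypothetical, not part of any library): draw one integer from the shared RandomState and use it as a scalar seed, which both yields reproducibility and advances the shared state for the next estimator.

```python
import numpy as np

def random_state_to_seed(random_state):
    """Draw a 32-bit integer seed from a NumPy RandomState.

    Hypothetical helper illustrating the proposed workaround:
    instead of sharing the Python-side Mersenne-Twister state
    with xgboost's C++ RNG, draw one integer from it and pass
    that as xgboost's scalar seed. Drawing also mutates the
    RandomState, so the next estimator that receives the same
    object gets a different seed.
    """
    return int(random_state.randint(0, 2**31 - 1))

rs = np.random.RandomState(42)
seed_a = random_state_to_seed(rs)  # advances rs
seed_b = random_state_to_seed(rs)  # a different draw from the stream
assert seed_a != seed_b

# Reproducibility: a fresh RandomState with the same seed
# yields the same sequence of derived seeds.
rs2 = np.random.RandomState(42)
assert random_state_to_seed(rs2) == seed_a
```

The trade-off mentioned above is that a single 32-bit seed carries much less entropy than the full Mersenne-Twister state, but it is enough to make results reproducible and to keep successive estimators from reusing the same stream.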
Would that be an acceptable solution?
> I’m not really familiar with how it works, but have been checking it in a bit more detail.
@david-cortes It seems you have a lot more insight than me. ;-)
> I guess a potential solution would be to use the python RandomState object to generate a random integer, which would then be set as seed for xgboost’s C++ RNG – that way, it achieves the purpose of setting a reproducible seed, and its random state is modified, even though the quality of these short seedings might not be very good.
Glancing through sklearn's documentation, is it a fair guess that the random state is changed per iteration? We could do that for XGBoost too; that way, I believe the effect would be the same. I haven't looked into their code yet, but sklearn has to do something similar to actually use RandomState, right?
Took a look at scikit-learn's IterativeImputer code, and it seems the random state is modified at each iteration whenever a random number is generated inside the regressor/classifier: the imputer simply sets the random_state attribute of the estimator to its own RandomState object (which, AFAIK and from some quick testing, does not perform a deep copy), so once that object generates a random number, the shared state changes.
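The attribute-assignment behaviour described above can be illustrated with a toy stand-in (the class `TinyEstimator` below is hypothetical, not sklearn code): assigning the same RandomState object to several estimators shares one mutable state, so a draw inside one estimator changes what the others will see.

```python
import numpy as np

class TinyEstimator:
    """Hypothetical stand-in for an sklearn-style regressor: it
    stores the RandomState object it is given (no copy) and draws
    from it on fit, mimicking how IterativeImputer propagates its
    own random_state into the estimators it fits."""

    def __init__(self, random_state):
        self.random_state = random_state  # shared reference, not a deep copy

    def fit(self):
        # Drawing mutates the shared state, so another estimator
        # holding the same object continues from a new position
        # in the random stream.
        return self.random_state.randint(0, 100)

rs = np.random.RandomState(0)
est1 = TinyEstimator(rs)
est2 = TinyEstimator(rs)

# Both estimators hold the very same object, not copies:
assert est1.random_state is est2.random_state
est1.fit()  # advances the stream that est2 will draw from next
```

This is why simply storing the RandomState object is not enough for xgboost: the C++ side never draws from it, so the shared state would never advance.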
Made a quick PR with the approach of using the RandomState object to draw an integer seed, but I guess there are other possible approaches too.
If I try to use XGBRegressor in scikit-learn's IterativeImputer, it will fail due to the random state that is passed from the imputer (example below). It would be nice if xgboost's scikit-learn API classes could accept this type of random state, like sklearn's own classifiers/regressors do.
Full example: