alteryx / evalml

EvalML is an AutoML library written in Python.
https://evalml.alteryx.com
BSD 3-Clause "New" or "Revised" License

`random_state` usage is inconsistent with sklearn #347

Closed piguy314159265 closed 4 years ago

piguy314159265 commented 4 years ago

Expected behavior: random_state parameters should accept the same types as sklearn.utils.check_random_state, which is how sklearn generally handles random state arguments: None, an int, or a numpy.random.RandomState instance.

sklearn (and ayx-learn) handle this via sklearn.utils.check_random_state, which takes any of those 3 types and returns a numpy.random.RandomState object.
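For illustration, here is a minimal sketch of how check_random_state normalizes each of the accepted input types. The helper name `resolve_random_state` is hypothetical and is not part of EvalML or sklearn; it only wraps the real sklearn call.

```python
import numpy as np
from sklearn.utils import check_random_state


def resolve_random_state(random_state=None):
    """Hypothetical helper: normalize None / int / RandomState to a RandomState."""
    return check_random_state(random_state)


# None -> the global numpy RandomState singleton used by np.random.* functions
rng_from_none = resolve_random_state(None)

# int -> a fresh RandomState seeded with that int
rng_from_int = resolve_random_state(42)

# RandomState instance -> returned unchanged (same object, not a copy)
existing = np.random.RandomState(42)
rng_from_instance = resolve_random_state(existing)

# Any other type is rejected with a ValueError
try:
    resolve_random_state("not a seed")
    rejected = False
except ValueError:
    rejected = True
```

A component that calls such a helper once in its constructor could then accept whatever the caller passes, instead of requiring an int.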

EvalML seems to indicate it only supports ints (https://github.com/FeatureLabs/evalml/blob/632afc486c7ac0845771278716eccb18cecab2cc/evalml/automl/auto_regression_search.py#L66).

Since the Mersenne Twister RNG that numpy uses has 2^19937-1 internal random states and a 64-bit int has only 2^64 possible values, there's clearly no inverse to seeding the random state with an int; i.e. there is no reliable way to take a random state object used by an sklearn or ayx_learn object, convert it to an int, and replicate results reliably.
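The size mismatch can be checked directly with numpy. The sketch below inspects the state of a RandomState and shows that copying the full state, rather than going through an int seed, is what reproduces the stream. Variable names are illustrative only.

```python
import numpy as np

rng = np.random.RandomState(0)
rng.random_sample(10)  # advance the generator so it is no longer in a freshly seeded state

# The legacy state tuple is (name, key array, position, has_gauss, cached_gaussian)
name, keys, pos, has_gauss, cached_gaussian = rng.get_state()

# The key array alone holds 624 unsigned 32-bit words, far more than fits in one int seed
state_bits = keys.size * 32

# Reproducing the stream requires transferring the whole state, not a single int
clone = np.random.RandomState()
clone.set_state(rng.get_state())
same_stream = np.array_equal(rng.random_sample(5), clone.random_sample(5))
```

This is why accepting a RandomState object directly matters: a caller who already holds one has no int they can hand over that would continue the same sequence.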

dsherry commented 4 years ago

Windows unit tests appear to be failing due to this PR; reopening.