Open petrovicboban opened 1 year ago
It sounds like two RandomForest objects will have different seeds when instantiated, but the objects are considered equal since the default value of seed=randrange() is the same. Is this still an issue if the class explicitly assigns the seed in an initialization or post-initialization step (even when the user doesn't provide one)?
Two RandomForest objects will have the same seeds when instantiated. If you re-run the script, the seed value will be different, but again the same for both objects.
I don't think we can do it in __init__ and still pass all tests in test_sklearn_compatible_estimator.
The only way to do it is at the start of fit, like we do for some other parameters: leave the parameter in __init__ but set its default value to None, and then, in fit, check if it's None and set a random value if it is.
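The None-default pattern described above could look roughly like this. It is a minimal sketch, not the project's code: `SketchForest` is a hypothetical stand-in for RandomForest, the seed range is arbitrary, and storing the resolved value in a separate `seed_` attribute is an assumption made for illustration.

```python
from random import randrange


class SketchForest:
    """Hypothetical stand-in for RandomForest (illustration only)."""

    def __init__(self, seed=None):
        # Store the parameter exactly as passed; no random draw happens here.
        self.seed = seed

    def fit(self, X, y):
        # Draw a seed only when the user did not supply one.
        # seed_ (assumed name) holds the value actually used for this fit.
        self.seed_ = randrange(2**31) if self.seed is None else self.seed
        return self


rf_1 = SketchForest().fit([[0.0]], [0])
rf_2 = SketchForest(seed=42).fit([[0.0]], [0])
print(rf_1.seed, rf_1.seed_)  # None, followed by a freshly drawn integer
print(rf_2.seed, rf_2.seed_)  # 42 42
```

With this layout each fitted object draws its own seed, while the constructor argument stays untouched.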
Actually, we can't do it. We'll have to move seed to the fit parameters.
Let's have
Since the default value for seed is a random number, we expect:
but that's not what's happening.
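The behaviour described above can be reproduced with a small sketch. `SketchForest` is a hypothetical stand-in for RandomForest and the seed range is an assumption. The point is only that a default argument expression is evaluated once, when the function is defined, not on every call.

```python
from random import randrange


class SketchForest:
    """Hypothetical stand-in for RandomForest (illustration only)."""

    # randrange(...) runs once, when this def statement is executed,
    # so every call that omits seed receives the same integer.
    def __init__(self, seed=randrange(2**31)):
        self.seed = seed


rf_1 = SketchForest()
rf_2 = SketchForest()

# One might expect two different seeds, but both objects share the default.
print(rf_1.seed == rf_2.seed)  # True
# The single stored default is visible on the function object itself.
print(SketchForest.__init__.__defaults__[0] == rf_1.seed)  # True
```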
Although rf_1 and rf_2 are really different objects, their attributes that have a default value set are not:
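An identity check makes this concrete. The sketch below again uses a hypothetical `SketchForest` class with an assumed seed range; it is not the real RandomForest. It also shows that reassigning the attribute on one instance rebinds only that instance.

```python
from random import randrange


class SketchForest:
    """Hypothetical stand-in for RandomForest (illustration only)."""

    def __init__(self, seed=randrange(2**31)):
        self.seed = seed


rf_1 = SketchForest()
rf_2 = SketchForest()

print(rf_1 is rf_2)            # False: two distinct instances
print(rf_1.seed is rf_2.seed)  # True: both attributes refer to the one default object

# Rebinding one attribute leaves the other instance untouched.
rf_1.seed = rf_2.seed + 1
print(rf_1.seed is rf_2.seed)  # False
```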
Python does not copy the default value here: both attributes are bound to the same object. If the value is changed later, only that attribute is rebound to a new seed object.
The implication is that if you create different RandomForest objects with seed not explicitly set, seed will be random but all objects will have the same value. Is this acceptable to you? @edwardwliu @theo-s