Open Iris7788 opened 2 years ago
Setting n_jobs to 1 makes reproducibility more likely. With parallel processes, exact reproducibility is hard to achieve because the order of execution has some randomness that cannot be controlled. It is something we are thinking about.
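For reference, a minimal sketch of what that suggestion could look like, assuming the classic TPOT 0.x API (`TPOTRegressor` with `random_state`, `n_jobs`, `fit`, `export`). The helper names, the seed, and `population_size=20` are illustrative assumptions and do not come from this issue; `generations=10` only mirrors the ten generations visible in the log. Even with these settings, identical results across runs are more likely, not guaranteed.

```python
def reproducible_tpot_kwargs(seed=42):
    """Hypothetical helper: settings aimed at repeatable TPOT runs."""
    return {
        "generations": 10,      # mirrors the 10 generations in the log (assumed)
        "population_size": 20,  # illustrative value, not taken from the issue
        "random_state": seed,   # fixed seed for TPOT's stochastic search
        "n_jobs": 1,            # single process, so evaluation order is fixed
        "verbosity": 2,         # print per-generation progress
    }


def fit_and_export(X, y, output_path, seed=42):
    """Hypothetical helper: fit TPOT with the settings above and export the result."""
    # Imported lazily so reproducible_tpot_kwargs() works without TPOT installed.
    from tpot import TPOTRegressor

    tpot = TPOTRegressor(**reproducible_tpot_kwargs(seed))
    tpot.fit(X, y)
    tpot.export(output_path)  # writes the best pipeline as a Python script
    return tpot.fitted_pipeline_
```

Calling `fit_and_export` twice on the same data with the same seed, then comparing the two exported files, is one way to check whether the runs actually agree.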
Context of the issue
I used TPOT to fit my dataset, and I got a different exported pipeline on each run.
Process to reproduce the issue
Steps for generating the exported pipeline. The shape of my dataset was (45, 478).
Current result
Best pipeline: DecisionTreeRegressor(Normalizer(input_matrix, norm=max), max_depth=3, min_samples_leaf=10, min_samples_split=9)
Generation 1 - Current best internal CV score: -0.6631261058133652
Generation 2 - Current best internal CV score: -0.6631261058133652
Generation 3 - Current best internal CV score: -0.6593793694494272
Generation 4 - Current best internal CV score: -0.6524528603774085
Generation 5 - Current best internal CV score: -0.636417747633282
Generation 6 - Current best internal CV score: -0.633586381252542
Generation 7 - Current best internal CV score: -0.633586381252542
Generation 8 - Current best internal CV score: -0.633586381252542
Generation 9 - Current best internal CV score: -0.633586381252542
Generation 10 - Current best internal CV score: -0.633586381252542
Best pipeline: ExtraTreesRegressor(LinearSVR(input_matrix, C=1.0, dual=True, epsilon=0.01, loss=epsilon_insensitive, tol=1e-05), bootstrap=False, max_features=0.3, min_samples_leaf=6, min_samples_split=13, n_estimators=100)