Closed: ghost closed this issue 4 years ago
The issue is that `max_eval_time_mins` is too small for evaluating pipelines on this large dataset, and random forest may be very time-consuming with a large number of `n_estimators`. Please increase `max_eval_time_mins` to 20. If that does not work or is too slow, please use `subsample` to randomly down-sample the dataset.
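As an alternative to TPOT's built-in `subsample` parameter (which down-samples internally during pipeline evaluation), the same effect can be sketched manually with a stratified down-sample. This is only a sketch on synthetic data standing in for the user's >500K-row dataset; the 25% fraction is illustrative.

```python
# Manual stratified down-sampling, mimicking what TPOT's `subsample`
# parameter does internally. Synthetic data stands in for the real dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

# Keep 25% of the rows while preserving the class balance (stratify=y).
X_small, _, y_small, _ = train_test_split(
    X, y, train_size=0.25, stratify=y, random_state=0
)
print(X_small.shape)  # (2500, 20)
```

Fitting on `X_small, y_small` instead of the full data cuts per-pipeline evaluation time roughly in proportion to the fraction kept.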
Thanks for your response. We tried the option you suggested, and now optimization reaches 100%, but sometimes it gives the same error as reported above, while at other times it reports 50% balanced_accuracy. When running a default RF outside TPOT, we get balanced_accuracy > 87%. We are not sure where the issue is or how to solve it.
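The "default RF outside TPOT" baseline check described above can be sketched as follows. Synthetic data stands in for the real dataset, so the score printed here is only illustrative, not the ~87% reported in the issue.

```python
# Fit a default RandomForestClassifier directly (outside TPOT) and report
# balanced accuracy on a held-out split, as the user describes doing.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the user's >500K-row, 20-feature, 2-class dataset.
X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
score = balanced_accuracy_score(y_te, rf.predict(X_te))
print(f"balanced_accuracy = {score:.3f}")
```

A 50% balanced accuracy from TPOT against a strong direct-RF baseline like this suggests the TPOT-evaluated pipelines are timing out or failing rather than genuinely underperforming.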
Since there is only one estimator (random forest) in the `config_dict`, you may try `template="Classifier"` to avoid complicated stacking pipelines (>1 estimator) and save computational time when evaluating each pipeline.
Thanks for the help.
We have a large (>500K points) dataset with more than 20 features and two classes. The data is properly scaled and NaNs are imputed. The setting we are executing is
The config only has one random forest configuration. We are running it on both Windows and Linux machines with sklearn version 0.22.1 and TPOT version 0.10.2.
Current result