Open jecorn opened 9 months ago
What version of tpot are you using? I was able to reproduce the issue in version 0.12.0, but not in 0.12.2 (yet). I haven't nailed down exactly what the issue was, but it seems to work for me on the latest version.
This was with tpot 0.12.1. I just grabbed tpot 0.12.2 and will update when I get a chance to try it out.
tpot 0.12.2 can now get past the error. Thanks!
Unfortunately, now there's a different error that I think might be hard to troubleshoot. Running on a single core, tpot starts going through pipelines. But when parallelizing, one of the workers throws an exception. It might be happening during a later pipeline (since it doesn't happen on a single core). I'll try to do some digging.
"""
Traceback (most recent call last):
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 463, in _process_worker
r = call_item()
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 291, in __call__
return self.fn(*self.args, **self.kwargs)
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/joblib/parallel.py", line 589, in __call__
return [func(*args, **kwargs)
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/joblib/parallel.py", line 589, in <listcomp>
return [func(*args, **kwargs)
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/stopit/utils.py", line 145, in wrapper
result = func(*args, **kwargs)
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/tpot/gp_deap.py", line 424, in _wrapped_cross_val_score
cv_iter = list(cv.split(features, target, groups))
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/sklearn/model_selection/_split.py", line 808, in split
y = check_array(y, input_name="y", ensure_2d=False, dtype=None)
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/sklearn/utils/validation.py", line 1097, in check_array
array.flags.writeable = True
ValueError: cannot set WRITEABLE flag to True of this array
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/tpot/base.py", line 817, in fit
self._pop, _ = eaMuPlusLambda(
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/tpot/gp_deap.py", line 232, in eaMuPlusLambda
population[:] = toolbox.evaluate(population)
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/tpot/base.py", line 1575, in _evaluate_individuals
tmp_result_scores = parallel(
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/joblib/parallel.py", line 1952, in __call__
return output if self.return_generator else list(output)
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/joblib/parallel.py", line 1595, in _get_outputs
yield from self._retrieve()
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/joblib/parallel.py", line 1699, in _retrieve
self._raise_error_fast()
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/joblib/parallel.py", line 1734, in _raise_error_fast
error_job.get_result(self.timeout)
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/joblib/parallel.py", line 736, in get_result
return self._return_or_raise()
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/joblib/parallel.py", line 754, in _return_or_raise
raise self._result
ValueError: cannot set WRITEABLE flag to True of this array
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/data/jcorn/autodisco/scripts/auto_tpot.py", line 39, in <module>
pipeline_optimizer.fit(X, y)
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/tpot/base.py", line 864, in fit
raise e
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/tpot/base.py", line 855, in fit
self._update_top_pipeline()
File "/home/cornlab/miniconda3/envs/jcorn/lib/python3.10/site-packages/tpot/base.py", line 963, in _update_top_pipeline
raise RuntimeError(
RuntimeError: A pipeline has not yet been optimized. Please call fit() first.
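For anyone trying to narrow this down: the ValueError at the bottom of the worker traceback can be reproduced in plain NumPy. This is only a sketch of the failure mode, not a confirmed diagnosis. It assumes the worker received `y` as a view onto read-only memory (for example a read-only memory map), which is a guess about the cause. The helper name `try_make_writeable` is made up for illustration.

```python
import numpy as np


def try_make_writeable(arr):
    """Return None on success, or the error message if NumPy refuses."""
    try:
        arr.flags.writeable = True
    except ValueError as exc:
        return str(exc)
    return None


# A read-only base array stands in for data that a worker process
# received as read-only memory (assumption, not verified for TPOT).
base = np.arange(10)
base.flags.writeable = False

# A view does not own its memory, so NumPy will not let it become
# writeable while the underlying base is read-only.
view = base[2:5]
message = try_make_writeable(view)
print(message)

# A copy owns its own memory, so the same call succeeds.
owned = view.copy()
copy_message = try_make_writeable(owned)
print(copy_message)
```

If this is what is happening, passing copies of the data into the worker, or keeping joblib from memory-mapping the inputs, would be the things to test next.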
I've come across this error at some point too, but it seems to be working on my machine now. IIRC it was due to a package version issue. Try updating the packages, I think my issue was with an outdated version of pandas or numpy?
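To compare environments, a quick way to list the installed versions is sketched below. The package list is an assumption based on the names that appear in the traceback and in this thread; adjust it as needed. `installed_versions` is a hypothetical helper, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the current environment.
            found[name] = None
    return found


# Assumed list of relevant packages; edit to match your environment.
PACKAGES = ("tpot", "scikit-learn", "numpy", "pandas", "joblib")

versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

Posting this output alongside the traceback makes it easier to tell whether two machines differ in a way that matters.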
Context of the issue
I have a large and imbalanced binary classification dataset: approx 2,000,000 negative cases and 5,000 positive cases, with 45 features. I have been running manual sklearn pipelines on this dataset without problem for a while. My manual work includes StratifiedKFold cross validation on algorithms such as random forests, gradient boosting, MLP, and more. All of the manual work has been fine.
I recently learned of TPOT (what an awesome idea! huge thanks, devs!) and was excited to give it a try. But on the exact same dataset, I'm getting this error:
The least populated class in y has only 1 members, which is less than n_splits=5.
This happens after about 50-60 TPOT iterations. I'm using stratification in train_test_split, and it's just a binary classification. So I'm not sure how a split could end up underpopulated. It's also strange that this same dataset works fine manually with stratification/splitting that (so far as I understand) is identical to what TPOTClassifier uses. I saw a few other reports of this error for both sklearn and TPOT, but they were always on multilabel classification. So I'm a bit stumped.
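One sanity check that may help here is to count the class members in whatever `y` is actually passed to `fit`, and compare them with the number of folds. The sketch below uses only the standard library. `underpopulated_classes` is a hypothetical helper, and `n_splits=5` is taken from the error message rather than from any TPOT setting I have confirmed.

```python
from collections import Counter


def underpopulated_classes(y, n_splits=5):
    """Return {label: count} for classes with fewer than n_splits members."""
    counts = Counter(y)
    return {label: n for label, n in counts.items() if n < n_splits}


# Toy labels: 20 negatives and a single positive.
y_bad = [0] * 20 + [1]
print(underpopulated_classes(y_bad))   # {1: 1}

# Toy labels: every class has at least 5 members.
y_ok = [0] * 20 + [1] * 5
print(underpopulated_classes(y_ok))    # {}
```

If this returns an empty dict for the real training labels, the underpopulated `y` is probably being produced somewhere inside the run rather than by the initial train_test_split.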
TPOT script
Error