ngrajales1 opened this issue 4 years ago
I think the main problem here is that the weights in make_score(y_test, y_pred, weights) cannot be used correctly in K-fold CV. The samples in y_test differ from fold to fold, so the weights have to be matched to each fold's samples. I think this is related to #1039, and I have a hacky demo that may help you.
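A minimal sketch of the fold-alignment problem described above, in plain Python. It is not the demo mentioned in the comment: the helper names wmae and kfold_test_indices and the toy data are hypothetical, and the 5-versus-1 holiday weighting is an assumption modeled on the Walmart competition's WMAE metric. It shows that weights stored for the full dataset do not fit one fold's y_test, while selecting the weights with that fold's test-row indices does.

```python
# Hypothetical illustration: per-sample weights stored for the FULL dataset
# cannot be used directly on one CV fold, because each fold's y_test is only
# a subset of the rows.

def wmae(y_true, y_pred, weights):
    """Weighted mean absolute error: sum(w * |y - yhat|) / sum(w)."""
    # Explicit check so a mismatch fails loudly; without it, zip() would
    # silently pair rows with the wrong weights.
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError(
            f"length mismatch: {len(y_true)} targets, {len(y_pred)} predictions, "
            f"{len(weights)} weights"
        )
    total = sum(w * abs(t - p) for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)


def kfold_test_indices(n_samples, n_splits):
    """Yield each fold's test-row indices (contiguous and unshuffled)."""
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1  # the first folds absorb the remainder
    start = 0
    for size in fold_sizes:
        yield list(range(start, start + size))
        start += size


# Toy data (assumed): 6 rows, holiday weeks weighted 5, all others 1.
y = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0]
weights = [1, 5, 1, 1, 5, 1]
y_pred = [11.0, 12.0, 8.0, 15.0, 13.0, 13.0]  # stand-in predictions

for test_idx in kfold_test_indices(len(y), n_splits=3):
    y_test = [y[i] for i in test_idx]
    y_hat = [y_pred[i] for i in test_idx]
    # Wrong: the full-length weights do not match this fold's y_test.
    try:
        wmae(y_test, y_hat, weights)
    except ValueError as err:
        print("naive:", err)
    # Right: pick out the weights that belong to this fold's test rows.
    fold_weights = [weights[i] for i in test_idx]
    print("fold", test_idx, "WMAE =", wmae(y_test, y_hat, fold_weights))
```

The key design point is that the weights travel with the row indices of each split rather than being fixed when the scorer is created.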
Passing my own scorer that calculates weighted mean absolute error for a regression problem results in an error
Context of the issue
I followed the instructions on the TPOT documentation page to create my own scorer for a regression problem (I am trying to use TPOT for the Walmart Kaggle competition https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting). The scorer I created calculates the weighted mean absolute error. I am pasting my code below:
weights, y_test and y_pred are all of type pandas.core.series.Series.
I am also using a local Dask cluster to distribute my workload. Please let me know whether this is a user error or something that needs to be looked into.
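One hedged way to avoid fixed, full-length weights is a scorer that rebuilds the weights from the rows it is actually scoring. TPOT's scoring parameter accepts a callable with the signature scorer(estimator, X, y), and if the holiday flag is one of the feature columns, the weights can be derived from X so that they always line up with the fold's y. This is a sketch under assumptions, not the poster's code: HOLIDAY_COL (column 0), HOLIDAY_WEIGHT (5) and wmae_scorer are hypothetical names and values.

```python
import numpy as np

# Hypothetical: position of a 0/1 "IsHoliday" column in the feature matrix
# passed to the scorer. Adjust to your own column order.
HOLIDAY_COL = 0
HOLIDAY_WEIGHT = 5.0  # assumption: holiday weeks weighted 5, others 1


def wmae_scorer(estimator, X, y):
    """Negative weighted MAE with the scorer(estimator, X, y) signature.

    The weights are rebuilt from the rows of X being scored, so they always
    match y for whichever CV fold is being evaluated.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(estimator.predict(X), dtype=float)
    weights = np.where(X[:, HOLIDAY_COL] > 0, HOLIDAY_WEIGHT, 1.0)
    error = np.sum(weights * np.abs(y - y_pred)) / np.sum(weights)
    # Negated, following scikit-learn's greater-is-better scorer convention.
    return -error
```

It would then be passed as TPOTRegressor(scoring=wmae_scorer, ...). Keeping the scorer a plain module-level function also keeps it simple to serialize when work is sent to Dask workers.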
Current result
RuntimeError                              Traceback (most recent call last)
/opt/anaconda3/envs/Nelson_Dask/lib/python3.8/site-packages/tpot/base.py in fit(self, features, target, sample_weight, groups)
    699                     warnings.simplefilter('ignore')
--> 700                     self._pop, _ = eaMuPlusLambda(
    701                         population=self._pop,

/opt/anaconda3/envs/Nelson_Dask/lib/python3.8/site-packages/tpot/gp_deap.py in eaMuPlusLambda(population, toolbox, mu, lambda_, cxpb, mutpb, ngen, pbar, stats, halloffame, verbose, per_generation_function)
    235         if per_generation_function is not None:
--> 236             per_generation_function(gen)
    237         # Vary the population

/opt/anaconda3/envs/Nelson_Dask/lib/python3.8/site-packages/tpot/base.py in _check_periodic_pipeline(self, gen)
   1002         """
-> 1003         self._update_top_pipeline()
   1004         if self.periodic_checkpoint_folder is not None:

/opt/anaconda3/envs/Nelson_Dask/lib/python3.8/site-packages/tpot/base.py in _update_top_pipeline(self)
    792             if not self._optimized_pipeline:
--> 793                 raise RuntimeError('There was an error in the TPOT optimization '
    794                                    'process. This could be because the data was '

RuntimeError: There was an error in the TPOT optimization process. This could be because the data was not formatted properly, or because data for a regression problem was provided to the TPOTClassifier object. Please make sure you passed the data to TPOT correctly.

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)