Concatenation error when using FunctionTransformer and backtesting_forecaster

wasf84 commented 2 weeks ago

Hello friends!

"TypeError: cannot concatenate object of type '<class 'numpy.float64'>'; only Series and DataFrame objs are valid"

I get this error when I apply a log transformation to the target variable using FunctionTransformer from scikit-learn. I have to log-transform both my target and my exogenous features. When I use FunctionTransformer on the exog features only, everything works, but it fails as soon as the target is transformed too.

I've tried the same example from the docs with no luck.

As a workaround, I'm using a separate DataFrame that was log-transformed beforehand and applying OneHotEncoder only to the categorical features, but I'd like to do it the proper way, so to speak, applying all of these transformations on a single DataFrame.

Thank you for your attention! =)

Target transformer

log_target_trans = FunctionTransformer(
    func=np.log1p,
    inverse_func=np.expm1,
    feature_names_out="one-to-one",
    accept_sparse=True
)

Exogenous transformer

ohe_log_exog_trans = ColumnTransformer(
    transformers=[
        ('ohe', OneHotEncoder(), cat_features),
        ('log',
         FunctionTransformer(
             func=np.log1p,
             inverse_func=np.expm1,
             feature_names_out="one-to-one",
             accept_sparse=True
         ),
         rainfall+accumulated_rainfall)
    ],
    remainder="passthrough",
    verbose_feature_names_out=False
)

Forecaster

forecaster = ForecasterAutoreg(
    regressor=LinearRegression(),
    lags=7,
    transformer_y=log_target_trans,
    transformer_exog=ohe_log_exog_trans,
    forecaster_id="LinReg_ohe_log"
)

Backtesting procedure

f = 1
_, y_pred = backtesting_forecaster(
    forecaster=forecaster,
    y=df.y,
    exog=df.drop(columns="y"),
    steps=f,
    metric='mean_absolute_error',
    initial_train_size=len(df)-365,
    refit=True,
    n_jobs=os.cpu_count()//2,
    interval=[2.5, 97.5],
    verbose=False,
    show_progress=True,
    random_state=1984
)

Features

exog categorical features: week, month, quarter
exog numerical features: rainfall, rainfall_last_7_sum, rainfall_last_15_sum
target (runoff): y

Traceback

{
"name": "TypeError",
"message": "cannot concatenate object of type '<class 'numpy.float64'>'; only Series and DataFrame objs are valid",
"stack": "---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback:
\"\"\"
Traceback (most recent call last):
  File \"/home/wasf84/bin/miniconda3/envs/py39/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py\", line 463, in _process_worker
    r = call_item()
  File \"/home/wasf84/bin/miniconda3/envs/py39/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py\", line 291, in __call__
    return self.fn(*self.args, **self.kwargs)
  File \"/home/wasf84/bin/miniconda3/envs/py39/lib/python3.9/site-packages/joblib/parallel.py\", line 598, in __call__
    return [func(*args, **kwargs)
  File \"/home/wasf84/bin/miniconda3/envs/py39/lib/python3.9/site-packages/joblib/parallel.py\", line 598, in <listcomp>
    return [func(*args, **kwargs)
  File \"/home/wasf84/bin/miniconda3/envs/py39/lib/python3.9/site-packages/skforecast/model_selection/model_selection.py\", line 543, in _fit_predict_forecaster
    pred = forecaster.predict_interval(
  File \"/home/wasf84/bin/miniconda3/envs/py39/lib/python3.9/site-packages/skforecast/ForecasterAutoreg/ForecasterAutoreg.py\", line 1200, in predict_interval
    predictions = pd.concat((predictions, predictions_interval), axis=1)
  File \"/home/wasf84/bin/miniconda3/envs/py39/lib/python3.9/site-packages/pandas/core/reshape/concat.py\", line 382, in concat
    op = _Concatenator(
  File \"/home/wasf84/bin/miniconda3/envs/py39/lib/python3.9/site-packages/pandas/core/reshape/concat.py\", line 448, in __init__
    ndims = self._get_ndims(objs)
  File \"/home/wasf84/bin/miniconda3/envs/py39/lib/python3.9/site-packages/pandas/core/reshape/concat.py\", line 489, in _get_ndims
    raise TypeError(msg)
TypeError: cannot concatenate object of type '<class 'numpy.float64'>'; only Series and DataFrame objs are valid
\"\"\"

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[456], line 23
      4 forecaster = ForecasterAutoreg(
      5     regressor=LinearRegression(),
      6     lags=7,
   (...)
      9     forecaster_id=\"LinReg_ohe_log\"
   (...)
     21
---> 23 _, y_pred = backtesting_forecaster(
     24     forecaster = forecaster,
     25     y = df.y,
     26     exog = df.drop(columns=target),
     27     steps = f,
     28     metric = 'mean_absolute_error',
     29     initial_train_size = len(df)-365,
     30     refit = True,
     31     n_jobs = os.cpu_count()//2,
     32     interval = [2.5, 97.5],
     33     verbose = False,
     34     show_progress = True,
     35     random_state = 1984
     36 )

File ~/bin/miniconda3/envs/py39/lib/python3.9/site-packages/skforecast/model_selection/model_selection.py:766, in backtesting_forecaster(forecaster, y, steps, metric, initial_train_size, fixed_train_size, gap, skip_folds, allow_incomplete_fold, exog, refit, interval, n_boot, random_state, in_sample_residuals, binned_residuals, n_jobs, verbose, show_progress)
    758 if type(forecaster).__name__ == 'ForecasterAutoregDirect' and \
    759    forecaster.steps < steps + gap:
    760     raise ValueError(
    761         (f\"When using a ForecasterAutoregDirect, the combination of steps \"
    762          f\"+ gap ({steps + gap}) cannot be greater than the steps parameter \"
    763          f\"declared when the forecaster is initialized ({forecaster.steps}).\")
    764     )
--> 766 metric_values, backtest_predictions = _backtesting_forecaster(
    767     forecaster = forecaster,
    768     y = y,
    769     steps = steps,
    770     metric = metric,
    771     initial_train_size = initial_train_size,
    772     fixed_train_size = fixed_train_size,
    773     gap = gap,
    774     skip_folds = skip_folds,
    775     allow_incomplete_fold = allow_incomplete_fold,
    776     exog = exog,
    777     refit = refit,
    778     interval = interval,
    779     n_boot = n_boot,
    780     random_state = random_state,
    781     in_sample_residuals = in_sample_residuals,
    782     binned_residuals = binned_residuals,
    783     n_jobs = n_jobs,
    784     verbose = verbose,
    785     show_progress = show_progress
    786 )
    788 return metric_values, backtest_predictions

File ~/bin/miniconda3/envs/py39/lib/python3.9/site-packages/skforecast/model_selection/model_selection.py:560, in _backtesting_forecaster(forecaster, y, steps, metric, initial_train_size, fixed_train_size, gap, skip_folds, allow_incomplete_fold, exog, refit, interval, n_boot, random_state, in_sample_residuals, binned_residuals, n_jobs, verbose, show_progress)
    555         pred = pred.iloc[gap:, ]
    557     return pred
    559 backtest_predictions = (
--> 560     Parallel(n_jobs=n_jobs)
    561     (delayed(_fit_predict_forecaster)
    562     (y=y, exog=exog, forecaster=forecaster, interval=interval, fold=fold)
    563     for fold in folds)
    564 )
    566 backtest_predictions = pd.concat(backtest_predictions)
    567 if isinstance(backtest_predictions, pd.Series):

File ~/bin/miniconda3/envs/py39/lib/python3.9/site-packages/joblib/parallel.py:2007, in Parallel.__call__(self, iterable)
   2001 # The first item from the output is blank, but it makes the interpreter
   2002 # progress until it enters the Try/Except block of the generator and
   2003 # reaches the first yield statement. This starts the asynchronous
   2004 # dispatch of the tasks to the workers.
   2005 next(output)
-> 2007 return output if self.return_generator else list(output)

File ~/bin/miniconda3/envs/py39/lib/python3.9/site-packages/joblib/parallel.py:1650, in Parallel._get_outputs(self, iterator, pre_dispatch)
   1647     yield
   1649     with self._backend.retrieval_context():
-> 1650         yield from self._retrieve()
   1652 except GeneratorExit:
   1653     # The generator has been garbage collected before being fully
   1654     # consumed. This aborts the remaining tasks if possible and warn
   1655     # the user if necessary.
   1656     self._exception = True

File ~/bin/miniconda3/envs/py39/lib/python3.9/site-packages/joblib/parallel.py:1754, in Parallel._retrieve(self)
   1747 while self._wait_retrieval():
   1748
   1749     # If the callback thread of a worker has signaled that its task
   1750     # triggered an exception, or if the retrieval loop has raised an
   1751     # exception (e.g. GeneratorExit), exit the loop and surface the
   1752     # worker traceback.
   1753     if self._aborting:
-> 1754         self._raise_error_fast()
   1755         break
   1757     # If the next job is not ready for retrieval yet, we just wait for
   1758     # async callbacks to progress.

File ~/bin/miniconda3/envs/py39/lib/python3.9/site-packages/joblib/parallel.py:1789, in Parallel._raise_error_fast(self)
   1785 # If this error job exists, immediately raise the error by
   1786 # calling get_result. This job might not exists if abort has been
   1787 # called directly or if the generator is gc'ed.
   1788 if error_job is not None:
-> 1789     error_job.get_result(self.timeout)

File ~/bin/miniconda3/envs/py39/lib/python3.9/site-packages/joblib/parallel.py:745, in BatchCompletionCallBack.get_result(self, timeout)
    739 backend = self.parallel._backend
    741 if backend.supports_retrieve_callback:
    742     # We assume that the result has already been retrieved by the
    743     # callback thread, and is stored internally. It's just waiting to
    744     # be returned.
--> 745     return self._return_or_raise()
    747 # For other backends, the main thread needs to run the retrieval step.
    748 try:

File ~/bin/miniconda3/envs/py39/lib/python3.9/site-packages/joblib/parallel.py:763, in BatchCompletionCallBack._return_or_raise(self)
    761 try:
    762     if self.status == TASK_ERROR:
--> 763         raise self._result
    764     return self._result
    765 finally:

TypeError: cannot concatenate object of type '<class 'numpy.float64'>'; only Series and DataFrame objs are valid" }

JoaquinAmatRodrigo commented 2 weeks ago

Hi @wasf84, Thanks for reporting this error.

The problem seems to be in this line:

predictions = pd.concat((predictions, predictions_interval), axis=1)

Could you print the versions of numpy, pandas and skforecast that you are using?
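For readers following along, the pandas behaviour behind the error message can be reproduced in isolation. This is only a minimal sketch: the Series, the scalar, and the helper `concat_error_message` are made-up stand-ins, and how a scalar ends up in `predictions_interval` inside skforecast is not established in this thread.

```python
import numpy as np
import pandas as pd

# Stand-ins (assumptions, not skforecast internals): a Series of point
# predictions and a bare NumPy scalar where a DataFrame would be expected.
predictions = pd.Series([1.0, 2.0], name="pred")
scalar_interval = np.float64(0.5)


def concat_error_message(left, right):
    """Return the TypeError text raised by pd.concat, or None if it succeeds."""
    try:
        pd.concat((left, right), axis=1)
    except TypeError as exc:
        return str(exc)
    return None


# pd.concat only accepts Series/DataFrame objects, so the scalar is rejected.
message = concat_error_message(predictions, scalar_interval)
print(message)

# The same call succeeds when the second object is a DataFrame.
interval_df = pd.DataFrame({"lower_bound": [0.5, 1.5], "upper_bound": [1.5, 2.5]})
combined = pd.concat((predictions, interval_df), axis=1)
print(combined.shape)
```

So the traceback indicates that one of the two objects passed to `pd.concat` was a `numpy.float64` rather than a Series or DataFrame by the time that line ran.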

wasf84 commented 2 weeks ago

Hello, Joaquin. I forgot to check the versions and post them here, sorry.

NumPy: 1.25.0
Pandas: 2.2.2
SKForecast: 0.13.0
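One way to collect these versions in a single step is sketched below. It relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the package names are simply the three asked about in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "not installed"


# Report the three libraries requested in the issue.
for name in ("numpy", "pandas", "skforecast"):
    print(f"{name}: {installed_version(name)}")
```

Reading the metadata avoids importing the packages themselves, so the report still works if one of them fails to import.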

I've also tested predict_interval directly and it crashed in the same way. predict_quantiles doesn't crash.

Thanks.

lucayapi commented 4 days ago

Hello @wasf84, did you find a solution to this error? I have the same problem!

JoaquinAmatRodrigo commented 4 days ago

Hi @lucayapi,

Could you share a reproducible example?

wasf84 commented 3 days ago

Hi @lucayapi

I did not find a solution. As a workaround, I'm using a separate DataFrame that was log-transformed beforehand, and then I apply OneHotEncoder only to the categorical features. This works fine.
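The data-preparation side of this workaround might look roughly like the sketch below. It is an illustration under assumptions: the toy values are invented, only the column names `y`, `rainfall` and `month` come from this thread, and the forecaster fitting step is left out entirely (a stand-in Series takes the place of the forecaster output).

```python
import numpy as np
import pandas as pd

# Toy data (invented values); column names follow the thread.
df = pd.DataFrame(
    {
        "y": [0.0, 1.5, 3.0, 10.0],
        "rainfall": [0.0, 2.0, 5.0, 1.0],
        "month": [1, 1, 2, 2],
    }
)

# Log-transform the target and the numeric exog up front, in a separate
# DataFrame, instead of passing a transformer for the target.
log_cols = ["y", "rainfall"]
df_log = df.copy()
df_log[log_cols] = np.log1p(df_log[log_cols])

# ... here the forecaster would be fitted on df_log, with the exog
# transformer reduced to one-hot encoding of the categorical columns ...

# Stand-in for predictions returned on the log scale (assumption).
pred_log = df_log["y"].tail(2)

# Map the predictions back to the original scale by hand.
pred = np.expm1(pred_log)
print(pred)
```

With this approach, any metric computed during backtesting is in log units, because the target passed in is already transformed; predictions and interval bounds need the manual `np.expm1` step before they are compared with the original series.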

JoaquinAmatRodrigo / skforecast