JoaquinAmatRodrigo / skforecast

Time series forecasting with machine learning models
https://skforecast.org
BSD 3-Clause "New" or "Revised" License

In-Sample Residuals: "Need at least one array to concatenate" #703

Open jrodenbergrheem opened 3 weeks ago

jrodenbergrheem commented 3 weeks ago

I am trying to calculate in-sample residuals, and for some reason I get this error on all of my forecasters unless I set `store_in_sample_residuals=False`.

Hopefully it's a small issue.


Thanks!
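For context, the message in the title comes from NumPy itself: `np.concatenate` raises this `ValueError` when it is given an empty sequence of arrays. The minimal sketch below only reproduces the message (the helper name `concatenate_or_report` is hypothetical); it does not show where, inside skforecast, `concatenate` is called.

```python
import numpy as np


def concatenate_or_report(arrays):
    """Concatenate a list of arrays, or return NumPy's error message if it refuses."""
    try:
        return np.concatenate(arrays)
    except ValueError as exc:
        return str(exc)


print(concatenate_or_report([np.array([1.0]), np.array([2.0])]))  # [1. 2.]
print(concatenate_or_report([]))  # need at least one array to concatenate
```

So the traceback suggests that, somewhere along the code path, a list of arrays that was expected to be non-empty ended up empty.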

JoaquinAmatRodrigo commented 3 weeks ago

Hi @jrodenbergrheem, thanks for reporting this error. Could you share a reproducible example?

jrodenbergrheem commented 2 weeks ago

Sure, here is a code snippet! If it would be more helpful, I can send you a notebook where it occurs.

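from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from skforecast.ForecasterAutoreg import ForecasterAutoreg

# data, exog_df, end_train, exog_pipe_2 and custom_weights are defined
# elsewhere in the notebook (not shown here)
forecaster = ForecasterAutoreg(
    regressor=Ridge(alpha=1),
    lags=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] + [78, 78 * 2, 78 * 3],
    transformer_y=StandardScaler(),
    transformer_exog=exog_pipe_2,
    weight_func=custom_weights,
)

forecaster.fit(
    y=data.loc[:end_train, 'SUCTIONT'],
    exog=exog_df.loc[:end_train],
    store_in_sample_residuals=False,
)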
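The `lags` list in this snippet mixes short lags (1 to 10) with seasonal lags (78, 156 and 234). As a rough, generic sketch in plain NumPy (the function `make_lag_matrix` is hypothetical and is not skforecast's internal code), a lag matrix for such a list can be built as below. In a matrix like this, the first `max(lags)` observations are consumed, so the series must be longer than 234 points to leave any training rows.

```python
import numpy as np


def make_lag_matrix(y, lags):
    """Return an (n - max_lag) x len(lags) matrix of lagged values and the aligned target.

    Generic illustration only; skforecast builds its own training matrices internally.
    """
    y = np.asarray(y, dtype=float)
    max_lag = max(lags)
    if len(y) <= max_lag:
        raise ValueError(f"series length {len(y)} must exceed the largest lag {max_lag}")
    # Row i corresponds to time t = max_lag + i and holds y[t - lag] for each lag.
    X = np.column_stack([y[max_lag - lag : len(y) - lag] for lag in lags])
    target = y[max_lag:]
    return X, target


lags = list(range(1, 11)) + [78, 78 * 2, 78 * 3]
y = np.arange(500, dtype=float)
X, target = make_lag_matrix(y, lags)
print(X.shape, target.shape)  # (266, 13) (266,)
```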

JoaquinAmatRodrigo commented 2 weeks ago

Hi @jrodenbergrheem, for a reproducible example I also need the data. Could you try the same code with simulated data and check whether you still see the same error?

Thanks a lot!
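One way to follow that suggestion is to swap the private data for a synthetic series of similar shape. In the sketch below, only the column name `SUCTIONT` and the variable names `data`, `exog_df` and `end_train` come from the snippet; the 78-step seasonality (inferred from the seasonal lags), the series length, the exogenous columns `exog_1`/`exog_2`, the integer index and the split point are all hypothetical choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(123)
n = 1_000      # hypothetical length, comfortably above the largest lag (234)
season = 78    # period implied by the seasonal lags in the snippet
t = np.arange(n)

# Target: a seasonal signal plus noise (the values are arbitrary).
data = pd.DataFrame(
    {"SUCTIONT": 10 + 2 * np.sin(2 * np.pi * t / season) + rng.normal(0, 0.3, n)},
    index=pd.RangeIndex(n),
)

# Two hypothetical exogenous columns sharing the target's index.
exog_df = pd.DataFrame(
    {
        "exog_1": np.cos(2 * np.pi * t / season),
        "exog_2": rng.normal(0, 1, n),
    },
    index=data.index,
)

end_train = 799  # hypothetical split; .loc slicing is inclusive, so labels 0..799
y_train = data.loc[:end_train, "SUCTIONT"]
exog_train = exog_df.loc[:end_train]
print(len(y_train), exog_train.shape)  # 800 (800, 2)
```

These objects could then be passed to the same `ForecasterAutoreg(...)` and `fit(...)` calls as in the snippet, either with the original `exog_pipe_2` and `custom_weights` or with those arguments removed one at a time to narrow the problem down.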

JavierEscobarOrtiz commented 2 weeks ago

Hi @jrodenbergrheem,

We think this bug is related to an old version of NumPy. Could you check whether this code works for you with Python 3.11?

import sys
import numpy as np
import pandas as pd
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from sklearn.linear_model import Lasso
import skforecast
import sklearn

print(f"Python: {sys.version}")
print(f"Numpy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"skforecast: {skforecast.__version__}")
print(f"sklearn: {sklearn.__version__}")

df = pd.read_csv(
    "https://raw.githubusercontent.com/tidyverts/tsibbledata/master/data-raw/vic_elec/VIC2015/demand.csv"
)
df.drop(columns=["Industrial"], inplace=True)
# Convert the integer Date to an actual date with datetime type
df["date"] = df["Date"].apply(
    lambda x: pd.Timestamp("1899-12-30") + pd.Timedelta(x, unit="days")
)
# Create a timestamp from the integer Period representing 30 minute interval
df["date_time"] = df["date"] + pd.to_timedelta((df["Period"] - 1) * 30, unit="m")
df.dropna(inplace=True)
# Rename columns
df = df[["date_time", "OperationalLessIndustrial"]]
df.columns = ["date_time", "demand"]
# Resample to hourly
df = df.set_index("date_time").resample("H").agg({"demand": "sum"})

split_idx = "2014-12-31 23:59:59"
X_train = df.loc[:split_idx]
X_test = df.loc[split_idx:]

model = Lasso()
forecaster = ForecasterAutoreg(
    regressor=model,  # the machine learning model
    lags=[1, 24, 7 * 24],  # the lag features to create
    forecaster_id="recursive",
)
forecaster.fit(y = X_train['demand'], store_in_sample_residuals=True)
forecaster.predict_interval(steps=5)

Python: 3.11.5
Numpy: 1.26.2
Pandas: 2.2.2
skforecast: 0.12.1
sklearn: 1.4.2

                        pred  lower_bound  upper_bound
2015-01-01 00:00:00  8067.17      7520.26      8793.51
2015-01-01 01:00:00  7670.03      6843.66      8574.82
2015-01-01 02:00:00  7246.65      6385.37      8315.82
2015-01-01 03:00:00  6884.44      6016.19      7927.14
2015-01-01 04:00:00  6656.33      5758.76      7559.52

If not, please send us the notebook you mentioned and the versions of Python, NumPy, pandas and skforecast you're using.

Thanks for opening the issue! 😄
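The `predict_interval` output above is built from residuals: as we understand the 0.12 API, skforecast bootstraps stored residuals by default. The sketch below illustrates only the general idea with plain NumPy (ordinary least squares and a recursive residual bootstrap); it is not skforecast's implementation, and the helpers `lag_matrix` and `bootstrap_interval` as well as the synthetic series are hypothetical. In-sample residuals are the training targets minus the fitted values on the training lag matrix; at each forecast step a resampled residual is added to the prediction, which is then fed back as a lag, and percentiles across bootstrap paths give the bounds.

```python
import numpy as np


def lag_matrix(y, lags):
    """Lagged features and aligned target (row for time t holds y[t - lag])."""
    max_lag = max(lags)
    X = np.column_stack([y[max_lag - lag : len(y) - lag] for lag in lags])
    return X, y[max_lag:]


def bootstrap_interval(y, lags, steps, n_boot=500, percentiles=(5, 95), seed=0):
    """Fit a linear autoregression by least squares, then bootstrap its in-sample residuals."""
    y = np.asarray(y, dtype=float)
    X, target = lag_matrix(y, lags)
    X1 = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    residuals = target - X1 @ coef  # in-sample residuals

    rng = np.random.default_rng(seed)
    max_lag = max(lags)
    paths = np.empty((n_boot, steps))
    for b in range(n_boot):
        history = list(y[-max_lag:])  # last observed window
        for s in range(steps):
            features = [1.0] + [history[-lag] for lag in lags]
            pred = float(np.dot(coef, features)) + rng.choice(residuals)
            paths[b, s] = pred
            history.append(pred)  # recursive: the prediction becomes a lag
    lower, upper = np.percentile(paths, percentiles, axis=0)
    return residuals, lower, upper


# Usage on a synthetic hourly-like series with a 24-step cycle.
rng = np.random.default_rng(1)
t = np.arange(400)
y = 10 + np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.2, t.size)
residuals, lower, upper = bootstrap_interval(y, lags=[1, 24], steps=5)
print(residuals.shape, lower.shape)  # (376,) (5,)
```

If no residuals are kept at fit time, an interval method built this way has nothing to resample, which is one reason storing them is tied to `fit`.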