Open jrodenbergrheem opened 3 weeks ago
Hi @jrodenbergrheem, Thanks for reporting this error. Could you share a reproducible example?
Sure, here is a code snippet! If it is more helpful, I can send you a notebook with it occurring!
`forecaster = ForecasterAutoreg(regressor=Ridge(alpha=1), lags = [1,2,3,4,5,6,7,8,9,10] + [78, 78*2,
],
transformer_y=StandardScaler(),
transformer_exog=exog_pipe_2,
weight_func=custom_weights)
forecaster.fit(y=data.loc[:end_train,'SUCTIONT'], exog=exog_df.loc[:end_train], store_in_sample_residuals=False)`
Hi @jrodenbergrheem, for a reproducible example I also need the data. Could you run the same code with simulated data and check whether you still see the same error?
Thanks a lot!
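One way to build such simulated data is sketched below. It is only an illustration: the 10-minute frequency, the series length, and the `exog_1`/`exog_2` column names are assumptions, and `make_simulated_data` is a hypothetical helper, not part of skforecast. The only thing taken from the snippet is the `SUCTIONT` name and the need for more observations than the largest lag (`78*2`).

```python
import numpy as np
import pandas as pd


def make_simulated_data(n_obs=1000, seed=123):
    """Return a synthetic target series and exogenous frame on a shared index."""
    rng = np.random.default_rng(seed)
    # Assumed 10-minute frequency; replace with the real sampling interval.
    index = pd.date_range("2024-01-01", periods=n_obs, freq="10min")
    t = np.arange(n_obs)
    # Target: a smooth cycle plus Gaussian noise, named like the real column.
    y = pd.Series(
        50 + 5 * np.sin(2 * np.pi * t / 144) + rng.normal(0, 1, n_obs),
        index=index,
        name="SUCTIONT",
    )
    # Two made-up exogenous columns aligned with the target index.
    exog = pd.DataFrame(
        {"exog_1": rng.normal(0, 1, n_obs), "exog_2": rng.uniform(0, 10, n_obs)},
        index=index,
    )
    return y, exog


y, exog = make_simulated_data()
print(y.shape, exog.shape)
```

Passing `y` and `exog` to the same `forecaster.fit(...)` call, with the default `store_in_sample_residuals`, should show whether the error depends on the real data.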
Hi @jrodenbergrheem,
We think this bug is related to an old version of NumPy. Could you check if this code works for you with Python 3.11?
import sys
import numpy as np
import pandas as pd
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from sklearn.linear_model import Lasso
import skforecast
import sklearn
print(f"Python: {sys.version}")
print(f"Numpy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"skforecast: {skforecast.__version__}")
print(f"skearn: {sklearn.__version__}")
df = pd.read_csv(
    "https://raw.githubusercontent.com/tidyverts/tsibbledata/master/data-raw/vic_elec/VIC2015/demand.csv"
)
df.drop(columns=["Industrial"], inplace=True)
# Convert the integer Date to an actual date with datetime type
df["date"] = df["Date"].apply(
    lambda x: pd.Timestamp("1899-12-30") + pd.Timedelta(x, unit="days")
)
# Create a timestamp from the integer Period representing 30 minute interval
df["date_time"] = df["date"] + pd.to_timedelta((df["Period"] - 1) * 30, unit="m")
df.dropna(inplace=True)
# Rename columns
df = df[["date_time", "OperationalLessIndustrial"]]
df.columns = ["date_time", "demand"]
# Resample to hourly
df = df.set_index("date_time").resample("H").agg({"demand": "sum"})
split_idx = "2014-12-31 23:59:59"
X_train = df.loc[:split_idx]
X_test = df.loc[split_idx:]
model = Lasso()
forecaster = ForecasterAutoreg(
    regressor=model,  # the machine learning model
    lags=[1, 24, 7 * 24],  # the lag features to create
    forecaster_id="recursive",
)
forecaster.fit(y=X_train["demand"], store_in_sample_residuals=True)
forecaster.predict_interval(steps=5)
Python: 3.11.5
Numpy: 1.26.2
Pandas: 2.2.2
skforecast: 0.12.1
skearn: 1.4.2
| | pred | lower_bound | upper_bound |
---|---|---|---|
2015-01-01 00:00:00 | 8067.17 | 7520.26 | 8793.51 |
2015-01-01 01:00:00 | 7670.03 | 6843.66 | 8574.82 |
2015-01-01 02:00:00 | 7246.65 | 6385.37 | 8315.82 |
2015-01-01 03:00:00 | 6884.44 | 6016.19 | 7927.14 |
2015-01-01 04:00:00 | 6656.33 | 5758.76 | 7559.52 |
If not, please send us the notebook you mentioned and the version of python/numpy/pandas/skforecast you're using.
Thanks for opening the issue! 😄
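If importing one of the packages is itself what fails, the versions can be collected without importing them. This is a small sketch using the standard library's `importlib.metadata`; `collect_versions` is a hypothetical helper name, and the package list is an assumption based on the libraries mentioned in this thread.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages=("numpy", "pandas", "skforecast", "scikit-learn")):
    """Read installed distribution versions without importing the packages."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            report[name] = "not installed"
    return report


versions = collect_versions()
for name, value in versions.items():
    print(f"{name}: {value}")
```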
I am trying to calculate in-sample residuals, and for some reason I get this error on all of my forecasters unless I set `store_in_sample_residuals=False`.
Hopefully it's a small issue.
Thanks!
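For context, in-sample residuals are the differences between the training targets and the model's fitted values on those same rows. The sketch below illustrates the idea with plain least squares on lagged values in NumPy. It is not skforecast's implementation, and `in_sample_residuals` is a hypothetical function written for this illustration.

```python
import numpy as np


def in_sample_residuals(y, lags):
    """Fit OLS on lagged values of y and return y_true - y_fitted on the training rows."""
    y = np.asarray(y, dtype=float)
    max_lag = max(lags)
    # Column for lag l holds y[t - l] for every t from max_lag to len(y) - 1.
    X = np.column_stack([y[max_lag - lag : len(y) - lag] for lag in lags])
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    target = y[max_lag:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coef


rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=300))  # synthetic random walk
residuals = in_sample_residuals(series, lags=[1, 2, 3])
print(residuals.shape)
```

The first `max(lags)` observations have no complete lag window, so the residual array is shorter than the series by that amount.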