I am using MLForecast to perform LinearRegression as a baseline for multiple time series separated out by unique_id.
I get different forecasts depending on whether other unique_ids are submitted into the fit procedure.
The expected result is that two different unique_ids would not influence each other's predictions.
Versions / Dependencies
MLForecast ~ 0.12.0
Python ~ 3.9.4
Reproduction script
import pandas as pd
from pandas import Timestamp
from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

# Only calendar features are configured; no lags or per-series features.
fcst = MLForecast(models=[LinearRegression()], freq="MS", date_features=["month", "year"])
# Data: two monthly series, 'A' and 'A/B', on very different scales
unique_ids = ['A','A','A','A','A','A','A','A','A','A','A','A',
'A/B','A/B','A/B','A/B','A/B','A/B','A/B','A/B','A/B','A/B','A/B','A/B']
dates = [Timestamp('2019-12-01 00:00:00'),
Timestamp('2020-01-01 00:00:00'),
Timestamp('2020-02-01 00:00:00'),
Timestamp('2020-03-01 00:00:00'),
Timestamp('2020-04-01 00:00:00'),
Timestamp('2020-05-01 00:00:00'),
Timestamp('2020-06-01 00:00:00'),
Timestamp('2020-07-01 00:00:00'),
Timestamp('2020-08-01 00:00:00'),
Timestamp('2020-09-01 00:00:00'),
Timestamp('2020-10-01 00:00:00'),
Timestamp('2020-11-01 00:00:00'),
Timestamp('2019-12-01 00:00:00'),
Timestamp('2020-01-01 00:00:00'),
Timestamp('2020-02-01 00:00:00'),
Timestamp('2020-03-01 00:00:00'),
Timestamp('2020-04-01 00:00:00'),
Timestamp('2020-05-01 00:00:00'),
Timestamp('2020-06-01 00:00:00'),
Timestamp('2020-07-01 00:00:00'),
Timestamp('2020-08-01 00:00:00'),
Timestamp('2020-09-01 00:00:00'),
Timestamp('2020-10-01 00:00:00'),
Timestamp('2020-11-01 00:00:00')]
ys = [13093.242657000004,
12827.4263068,
13048.2288898,
12999.3904056,
12407.4675967,
12186.6147398,
12387.8234062,
12475.222764500002,
12572.449667,
12582.9628866,
12756.9367475,
12909.047607200002,
234.6039148,
232.3045001,
238.5529862,
239.0891411,
231.2071973,
228.6472059,
234.5778714,
234.25102150000004,
237.792241,
238.4996353,
244.8442076,
244.30907230000005]
data = {'unique_id':unique_ids,
'ds':dates,
'y': ys,
}
df = pd.DataFrame(data)
# Fit on both series together
fcst.fit(df)
Y_hat_df = fcst.predict(24)
# Predictions for 'A/B' are way off
print("Actual:\n", df.query("unique_id == 'A/B'").tail(5), "\n")
print("Predicted:\n", Y_hat_df.query("unique_id == 'A/B'").head(5), "\n")
# Refit on 'A/B' only
fcst.fit(df.query("unique_id == 'A/B'"))
Y_hat_df = fcst.predict(24)
# Predictions are fine when fitting on a single id
print("Actual:\n", df.query("unique_id == 'A/B'").tail(5), "\n")
print("Predicted:\n", Y_hat_df.query("unique_id == 'A/B'").head(5), "\n")
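A possible explanation, offered as a guess rather than a confirmed diagnosis: if MLForecast fits each model once on the pooled rows of all series, and the only features are calendar features (month, year), then the regression has no way to tell 'A' from 'A/B' and would predict something close to their average. The sketch below reproduces that effect with plain NumPy least squares. It does not call MLForecast at all, the toy values are illustrative and not the data above, and `fit_predict` is a hypothetical helper written for this example.

```python
import numpy as np

# Two toy monthly series on very different scales (illustrative values only).
months = np.arange(1, 13, dtype=float)
series = {
    "A": 12000.0 + 10.0 * months,
    "A/B": 230.0 + 1.0 * months,
}


def fit_predict(X, y, X_new):
    """Ordinary least squares with an intercept, solved via numpy.linalg.lstsq."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef


X_new = np.array([[13.0]])  # the next month for both series

# Pooled fit: one model sees the rows of both series, with no series identifier.
X_pooled = np.concatenate([months, months]).reshape(-1, 1)
y_pooled = np.concatenate([series["A"], series["A/B"]])
pooled_pred = fit_predict(X_pooled, y_pooled, X_new)[0]

# Per-series fit: one model per id, each trained only on its own rows.
per_series_pred = {
    uid: fit_predict(months.reshape(-1, 1), y, X_new)[0]
    for uid, y in series.items()
}

print("pooled prediction (same for every id):", pooled_pred)
print("per-series predictions:", per_series_pred)
```

Under that assumption, the pooled prediction lands between the two series, which matches the symptom in the script above. Fitting each unique_id separately, as the second half of the script does, avoids the mixing.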
Issue Severity
High: It blocks me from completing my task.