Nixtla / mlforecast

Scalable machine 🤖 learning for time series forecasting.
https://nixtlaverse.nixtla.io/mlforecast
Apache License 2.0

MLForecast LinearRegression Isn't Applied to Each Unique Id Time Series Separately #329

Closed negodfre closed 6 months ago

negodfre commented 6 months ago

What happened + What you expected to happen

I am using MLForecast with LinearRegression as a baseline for multiple time series, each identified by a unique_id. The forecast for a given unique_id changes depending on whether other unique_ids are included in the data passed to fit.

I expected that two different unique_ids would not influence each other's predictions.

Versions / Dependencies

MLForecast ~ 0.12.0 Python ~ 3.9.4

Reproduction script

import pandas as pd
from pandas import Timestamp
from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

fcst = MLForecast(models=[LinearRegression()], freq="MS", date_features=["month", "year"])

# Data: two monthly series ('A' and 'A/B') on very different scales
unique_ids = ['A','A','A','A','A','A','A','A','A','A','A','A',
 'A/B','A/B','A/B','A/B','A/B','A/B','A/B','A/B','A/B','A/B','A/B','A/B']

dates = [Timestamp('2019-12-01 00:00:00'),
 Timestamp('2020-01-01 00:00:00'),
 Timestamp('2020-02-01 00:00:00'),
 Timestamp('2020-03-01 00:00:00'),
 Timestamp('2020-04-01 00:00:00'),
 Timestamp('2020-05-01 00:00:00'),
 Timestamp('2020-06-01 00:00:00'),
 Timestamp('2020-07-01 00:00:00'),
 Timestamp('2020-08-01 00:00:00'),
 Timestamp('2020-09-01 00:00:00'),
 Timestamp('2020-10-01 00:00:00'),
 Timestamp('2020-11-01 00:00:00'),
 Timestamp('2019-12-01 00:00:00'),
 Timestamp('2020-01-01 00:00:00'),
 Timestamp('2020-02-01 00:00:00'),
 Timestamp('2020-03-01 00:00:00'),
 Timestamp('2020-04-01 00:00:00'),
 Timestamp('2020-05-01 00:00:00'),
 Timestamp('2020-06-01 00:00:00'),
 Timestamp('2020-07-01 00:00:00'),
 Timestamp('2020-08-01 00:00:00'),
 Timestamp('2020-09-01 00:00:00'),
 Timestamp('2020-10-01 00:00:00'),
 Timestamp('2020-11-01 00:00:00')]

ys = [13093.242657000004,
 12827.4263068,
 13048.2288898,
 12999.3904056,
 12407.4675967,
 12186.6147398,
 12387.8234062,
 12475.222764500002,
 12572.449667,
 12582.9628866,
 12756.9367475,
 12909.047607200002,
 234.6039148,
 232.3045001,
 238.5529862,
 239.0891411,
 231.2071973,
 228.6472059,
 234.5778714,
 234.25102150000004,
 237.792241,
 238.4996353,
 244.8442076,
 244.30907230000005] 

data = {'unique_id':unique_ids,
        'ds':dates,
        'y': ys,
       }
df = pd.DataFrame(data)

fcst.fit(df)
Y_hat_df = fcst.predict(24)

# Predictions for 'A/B' are way off when both ids are fit together
print("Actual:\n", df.query("unique_id == 'A/B'").tail(5), "\n")
print("Predicted:\n", Y_hat_df.query("unique_id == 'A/B'").head(5), "\n")

fcst.fit(df.query("unique_id == 'A/B'"))
Y_hat_df = fcst.predict(24)

# Predictions for 'A/B' are fine when the model is fit on that id alone
print("Actual:\n", df.query("unique_id == 'A/B'").tail(5), "\n")
print("Predicted:\n", Y_hat_df.query("unique_id == 'A/B'").head(5), "\n")

Issue Severity

High: It blocks me from completing my task.

negodfre commented 6 months ago

I missed an already closed issue about something similar, which mentions that MLForecast always fits a global model.
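To illustrate why a global model produces the behavior above, here is a minimal sketch using plain NumPy least squares rather than MLForecast itself. The toy data, the helper names `design` and `fit_ols`, and the centering of the year feature are assumptions made for illustration. The idea is that with only `month` and `year` as features, rows from different series look identical to the regressor, so a single pooled fit cannot tell the series apart.

```python
import numpy as np

# Hypothetical toy data: two series sharing the same 12 monthly timestamps
# (2019-12 through 2020-11) but living on very different scales.
months = np.array([12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype=float)
years = np.array([2019] + [2020] * 11, dtype=float)
t = np.arange(12, dtype=float)
y_large = 12000.0 + 10.0 * t  # stands in for unique_id 'A'
y_small = 230.0 + 1.0 * t     # stands in for unique_id 'A/B'


def design(month, year):
    """Intercept plus the two date features (year centered for conditioning)."""
    return np.column_stack([np.ones_like(month), month, year - 2020.0])


def fit_ols(X, y):
    """Ordinary least squares, the same objective LinearRegression minimizes."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


X = design(months, years)
x_next = design(np.array([12.0]), np.array([2020.0]))  # December 2020

# Local models: one fit per series.
pred_large_local = (x_next @ fit_ols(X, y_large))[0]
pred_small_local = (x_next @ fit_ols(X, y_small))[0]

# Global model: rows of both series are stacked, and the features carry
# no information about which series a row belongs to.
X_global = np.vstack([X, X])
y_global = np.concatenate([y_large, y_small])
pred_global = (x_next @ fit_ols(X_global, y_global))[0]

print(f"local  'A'   : {pred_large_local:.1f}")
print(f"local  'A/B' : {pred_small_local:.1f}")
print(f"global (both): {pred_global:.1f}")
```

Because both series share the same feature rows, the pooled fit lands on the average of the two local fits, which is far from either series. Fitting on one id at a time, as the second half of the reproduction script does with `df.query`, avoids this pooling.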