Nixtla / mlforecast

Scalable machine 🤖 learning for time series forecasting.
https://nixtlaverse.nixtla.io/mlforecast
Apache License 2.0

Unable to do LogTransformation using target_transforms #321

Closed. obiii closed this issue 4 months ago

obiii commented 4 months ago

What happened + What you expected to happen

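import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from mlforecast import MLForecast
from mlforecast.target_transforms import BaseTargetTransform
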
def plot(series, fname):
    n_series = len(series)
    fig, ax = plt.subplots(ncols=n_series, figsize=(7 * n_series, 6), squeeze=False)
    for (title, serie), axi in zip(series.items(), ax.flat):
        serie[1:300].set_index('ds')['y'].plot(title=title, ax=axi)
    fig.savefig(f'figs/{fname}', bbox_inches='tight')
    plt.close()

class LogTransformer(BaseTargetTransform):
    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        df[self.target_col] = np.log1p(df[self.target_col])
        return df

    def inverse_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        df[self.target_col] = np.expm1(df[self.target_col])
        return df

fcst = MLForecast(
    models=[],
    freq='D',
    target_transforms=[LogTransformer()],
)
logged = fcst.preprocess(swe_train_uam)
plot({'original': swe_train_uam, 'Log-transformed': logged}, 'target_transform.png')

I am trying to apply a log (log1p) transformation to the target column. The above code works and produces the plot.

But the same does not work when performing cross-validation:

mlf = MLForecast(
    models=models,
    freq='D',
    target_transforms=[LogTransformer()],
    lags=lags,
    lag_transforms=lag_transforms,
    date_features=date_features
)
crossvalidation_df = mlf.cross_validation(
    df=swe_train_uam,
    h=h,
    n_windows=3,
    refit=True,
)

This raises an error; the traceback was attached as an image.

The code works if I remove the target_transforms from MLForecast.

Versions / Dependencies

Python: 3.9
MLForecast: 0.11.2

Reproduction script

Attached above.

Issue Severity

High: It blocks me from completing my task.

jmoralez commented 4 months ago

Hey @obiii, thanks for using mlforecast. The inverse transformation is applied to the predictions, not to the target. That's why we iterate over the columns that aren't the id, time or transformation stats here. So your inverse transformation should be something like this:

    def inverse_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        df = df.copy(deep=False)
        for col in df.columns.drop([self.id_col, self.time_col]):
            df[col] = np.expm1(df[col])
        return df
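
To make the point concrete, here is a minimal sketch that uses only pandas and NumPy. The frame and its column names (unique_id, ds, LGBMRegressor, LinearRegression) are made up for illustration, and inverse_log1p is a hypothetical standalone helper rather than part of mlforecast. It assumes the predictions frame carries the id and time columns plus one column per model, so there is no target column to invert:

```python
import numpy as np
import pandas as pd

# Hypothetical predictions frame on the log1p scale.
# Column names are illustrative, not taken from the issue.
preds = pd.DataFrame({
    "unique_id": ["a", "a", "b", "b"],
    "ds": pd.to_datetime(["2024-01-01", "2024-01-02"] * 2),
    "LGBMRegressor": np.log1p([10.0, 12.0, 3.0, 4.0]),
    "LinearRegression": np.log1p([9.0, 11.0, 2.0, 5.0]),
})

def inverse_log1p(df, id_col="unique_id", time_col="ds"):
    """Undo log1p on every column except the id and time columns."""
    df = df.copy(deep=False)  # shallow copy, as in the suggested fix
    for col in df.columns.drop([id_col, time_col]):
        df[col] = np.expm1(df[col])
    return df

restored = inverse_log1p(preds)

# The target column is absent, so indexing it by name would raise a KeyError.
has_target = "y" in preds.columns
```

Every model column comes back on the original scale, while the id and time columns are left untouched.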

That being said, if you're using a transformation that doesn't learn any parameters, like the log here, you're better off using the GlobalSklearnTransformer (example).
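
As a rough sketch of that approach, the log1p/expm1 pair can be wrapped in scikit-learn's FunctionTransformer. The values in y are made up, and the block only checks the round trip with scikit-learn itself. The commented lines show how the object would be handed to mlforecast, assuming GlobalSklearnTransformer takes the scikit-learn transformer as its argument:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Stateless log1p/expm1 pair wrapped as a scikit-learn transformer.
sk_log1p = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

# Made-up target values, shaped as a 2-D column as scikit-learn expects.
y = np.array([[0.0], [1.0], [9.0], [99.0]])
y_log = sk_log1p.fit_transform(y)
y_back = sk_log1p.inverse_transform(y_log)

# With mlforecast installed, the same object can be wrapped for MLForecast:
# from mlforecast.target_transforms import GlobalSklearnTransformer
# mlf = MLForecast(
#     models=models,
#     freq='D',
#     target_transforms=[GlobalSklearnTransformer(sk_log1p)],
# )
```

Because nothing is learned during fitting, no custom inverse_transform has to be written.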

Please let us know if you have further doubts.

obiii commented 4 months ago

Hi,

Thanks for the clarification.