Nixtla / statsforecast

Lightning ⚡️ fast forecasting with statistical and econometric models.
https://nixtlaverse.nixtla.io/statsforecast
Apache License 2.0

Unexpected NaN values in rolling_std results causing row removal #451

Closed Arsa-Nik closed 1 year ago

Arsa-Nik commented 1 year ago

Description

Hi, I would like to share two observations about using rolling_std within mlforecast:

1) rolling_std seems to generate NaN when a run of consecutive zeros is longer than the rolling_std window size. The correct result in these cases should arguably be zero rather than NaN.
2) The default value of dropna in fcst.preprocess() is True, so rows containing NaN are removed automatically and without any warning.


import lightgbm as lgb
import pandas as pd
import numpy as np
from mlforecast import MLForecast
from window_ops.rolling import rolling_std

data = pd.DataFrame({
    'date': pd.date_range(start='2019-01-01', end='2020-12-31', freq='MS'),
    'sprid': 1.,
    'target': [1., 2., 0., 4., 0., 0., 0., 0., 9., 10., 11., 12.] * 2
})

models = [lgb.LGBMRegressor()]  # default hyperparameters
fcst = MLForecast(
    models=models,
    freq='MS',
    lags=[1],
    lag_transforms={
        1: [(rolling_std, 3)]
    }
)

preprocessed_df = fcst.preprocess(data, id_col='sprid', time_col='date', target_col='target', dropna=False)
print(preprocessed_df)

# Check _rolling_std directly on the same values
from window_ops.rolling import _rolling_std
a = np.array([1, 2, 0, 4, 0, 0, 0, 0, 9, 10, 11, 12] * 2)
print(_rolling_std(a, 3))
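
For comparison, here is a minimal pure-Python sketch of the behaviour suggested in observation 1, where an all-zero window yields 0 instead of NaN. It is not the window_ops implementation. The function name `rolling_std_clamped` is hypothetical, and the idea that NaN can come from a slightly negative variance produced by floating-point round-off in incremental updates is an assumption that this thread does not confirm.

```python
import math


def rolling_std_clamped(values, window):
    """Sample rolling std (ddof=1) from running sums, with variance clamped at 0.

    Hypothetical helper for illustration; not part of window_ops or mlforecast.
    """
    out = []
    s = 0.0   # running sum of the current window
    ss = 0.0  # running sum of squares of the current window
    for i, v in enumerate(values):
        s += v
        ss += v * v
        if i >= window:
            # Drop the observation that just left the window.
            old = values[i - window]
            s -= old
            ss -= old * old
        if i < window - 1:
            out.append(math.nan)  # not enough observations yet
            continue
        var = (ss - s * s / window) / (window - 1)
        # Incremental updates can leave var marginally below zero on
        # constant windows. math.sqrt raises ValueError on a negative
        # input and NumPy's sqrt returns NaN, so clamp before the root.
        out.append(math.sqrt(max(var, 0.0)))
    return out


target = [1., 2., 0., 4., 0., 0., 0., 0., 9., 10., 11., 12.] * 2
result = rolling_std_clamped(target, 3)
print(result)
```

With this guard, the only NaN entries are the first `window - 1` positions, where the window is not yet full; windows made up entirely of zeros give 0.0.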

### Use case

_No response_

AzulGarza commented 1 year ago

Hey @Arsa-Nik! Thank you for letting us know about the issue. This problem is related to MLForecast; could you open the issue in that repo? Here's the link:

Thank you!

Arsa-Nik commented 1 year ago

Sure. I was notified that the issue is resolved, so I'm closing this. Thanks.