Nixtla / neuralforecast

Scalable and user friendly neural :brain: forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

how to use continuous exogenous variable in the future for forecasting problem #217

Closed ramdhan1989 closed 1 year ago

ramdhan1989 commented 2 years ago

Hi, I need to forecast a target variable, and I have two continuous time series that can be used as exogenous variables to forecast the target series multiple steps ahead. I have access to the future values of the exogenous variables, and for production purposes I would like to simulate the impact of future exogenous values on the forecasted target. So I need to use the exogenous values at time t+1 to forecast the target at time t+1, and so on. Can I do that?

thank you

kdgutier commented 2 years ago

Hi @ramdhan1989,

You can feed the future exogenous variables through X_df and list them in f_cols, which identifies them as available in the future. I recommend using the N-BEATSx model, and for your simulation experiments you can use its .forecast method.

We do something similar to your needs here:

If you have more questions, feel free to join our slack channel.

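For readers landing on this thread: a minimal sketch of the long-format inputs implied above, using the Nixtla column convention (unique_id, ds, y). The series id, dates, and exogenous column names here are made up for illustration; only the two exogenous names mirror the snippet later in the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical single series: 120 hourly observations.
dates = pd.date_range("2022-01-01", periods=120, freq="H")

# Target dataframe in long format: unique_id, ds, y.
Y_df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": dates,
    "y": np.random.rand(120),
})

# Exogenous dataframe: the same keys plus the regressor columns.
# Because these regressors are known ahead of time, they would later be
# flagged as future-available via f_cols=['Exogenous1', 'Exogenous2'].
X_df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": dates,
    "Exogenous1": np.random.rand(120),
    "Exogenous2": np.random.rand(120),
})
```

The key point is that X_df must share the exact (unique_id, ds) keys with Y_df, with f_cols marking which of its columns are known in the future.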

ramdhan1989 commented 2 years ago

Hi @kdgutier, thanks for your answer. If I only have one unique_id, how should I specify S_df? I tried to build S_df as follows (see the attached image), but I got the error message below, and I am not sure what the problem is. How can I solve it?

AssertionError                            Traceback (most recent call last)
Input In [32], in <cell line: 61>()
     57 mc['n_layers'] =  len(mc['stack_types']) * [ mc['constant_n_layers'] ]
     59 from neuralforecast.experiments.utils import create_datasets
---> 61 train_dataset, val_dataset, test_dataset, scaler_y = create_datasets(mc=mc,
     62                                                                      S_df=s_df, Y_df=y_df, X_df=x_df,
     63                                                                      f_cols=['Exogenous1', 'Exogenous2'],
     64                                                                      ds_in_val=180,
     65                                                                      ds_in_test=984)
     67 train_loader = TimeSeriesLoader(dataset=train_dataset,
     68                                 batch_size=int(mc['batch_size']),
     69                                 n_windows=mc['n_windows'],
     70                                 shuffle=True)
     72 val_loader = TimeSeriesLoader(dataset=val_dataset,
     73                               batch_size=int(mc['batch_size']),
     74                               shuffle=False)

File ~\Anaconda3\envs\SiT\lib\site-packages\neuralforecast\experiments\utils.py:249, in create_datasets(mc, S_df, Y_df, X_df, f_cols, ds_in_test, ds_in_val, verbose)
    244 train_mask_df, valid_mask_df, test_mask_df = get_mask_dfs(Y_df=Y_df,
    245                                                           ds_in_val=ds_in_val,
    246                                                           ds_in_test=ds_in_test)
    248 #---------------------------------------------- Scale Data ----------------------------------------------#
--> 249 Y_df, X_df, scaler_y = scale_data(Y_df=Y_df, X_df=X_df, mask_df=train_mask_df,
    250                                   normalizer_y=mc['normalizer_y'], normalizer_x=mc['normalizer_x'])
    252 #----------------------------------------- Declare Dataset and Loaders ----------------------------------#
    254 if mc['mode'] == 'simple':

File ~\Anaconda3\envs\SiT\lib\site-packages\neuralforecast\experiments\utils.py:202, in scale_data(Y_df, X_df, mask_df, normalizer_y, normalizer_x)
    200     for col in X_cols:
    201         scaler_x = Scaler(normalizer=normalizer_x)
--> 202         X_df[col] = scaler_x.scale(x=X_df[col].values, mask=mask)
    204 return Y_df, X_df, scaler_y

File ~\Anaconda3\envs\SiT\lib\site-packages\neuralforecast\data\scalers.py:43, in Scaler.scale(self, x, mask)
     40 elif self.normalizer == 'norm1':
     41     x_scaled, x_shift, x_scale = norm1_scaler(x, mask)
---> 43 assert len(x[mask==1] == np.sum(mask)), 'Something weird is happening, call Cristian'
     44 nan_before_scale = np.sum(np.isnan(x))
     45 nan_after_scale = np.sum(np.isnan(x_scaled))

AssertionError: Something weird is happening, call Cristian

thank you

kdgutier commented 2 years ago

There is not much reason to use S_df with a single series. Set S_df = None. I recommend joining the Slack channel for these questions.
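To make the two options concrete, here is a small sketch (the series id and static column name are invented for illustration): either skip static features entirely, or give S_df exactly one row per unique_id, matching the ids in Y_df.

```python
import pandas as pd

# Option 1: no static features at all -- just pass None.
S_df = None

# Option 2: if you do want static features for a single series, S_df
# needs exactly one row, keyed by the same unique_id used in Y_df.
S_df = pd.DataFrame({"unique_id": ["series_1"], "static_0": [1.0]})
```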

ramdhan1989 commented 2 years ago

OK, I have already posted my question on Slack. Setting S_df to None still does not work.

cchallu commented 2 years ago

Hi @ramdhan1989,

We released the fix to the scalers on March 28th. Note that the input data must not contain any NaN values, as they will propagate through all of the data.
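A quick pre-flight check along those lines can fail with a readable message instead of deep inside the scaler. This is an illustrative sketch (the helper name and the interpolation strategy are my own; choose imputation that fits your data):

```python
import numpy as np
import pandas as pd

def assert_no_nans(df: pd.DataFrame, name: str) -> None:
    """Raise early with a clear message if the frame contains NaNs."""
    n_nans = int(df.isna().sum().sum())
    if n_nans:
        raise ValueError(f"{name} contains {n_nans} NaN values; "
                         "impute or drop them before scaling.")

# Example: linearly interpolate small gaps in the exogenous columns,
# then verify nothing is left. Column names mirror the thread.
X_df = pd.DataFrame({"Exogenous1": [1.0, np.nan, 3.0],
                     "Exogenous2": [0.5, 0.6, 0.7]})
X_df = X_df.interpolate(limit_direction="both")
assert_no_nans(X_df, "X_df")
```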

ramdhan1989 commented 2 years ago

Hi, this is still related to the same problem. Now I get an error during forecasting. I followed this notebook to use NBEATSx:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Input In [98], in <cell line: 2>()
      1 model.return_decomposition = False
----> 2 forecast_df = model.forecast(Y_df=Y_forecast_df, X_df=X_forecast_df, S_df=S_df, batch_size=2)
      3 forecast_df

Input In [10], in forecast(self, Y_df, X_df, S_df, batch_size, trainer, verbose)
     39 Y_df = Y_df.append(forecast_df).sort_values(['unique_id','ds']).reset_index(drop=True)
     41 # Dataset, loader and trainer
---> 42 dataset = WindowsDataset(S_df=S_df, Y_df=Y_df, X_df=X_df,
     43                             mask_df=None, f_cols=[],
     44                             input_size=self.n_time_in,
     45                             output_size=self.n_time_out,
     46                             sample_freq=1,
     47                             complete_windows=True,
     48                             ds_in_test=self.n_time_out,
     49                             is_test=True,
     50                             verbose=verbose)
     52 loader = TimeSeriesLoader(dataset=dataset,
     53                             batch_size=batch_size,
     54                             shuffle=False)
     56 if trainer is None:

File ~\Anaconda3\envs\SiT\lib\site-packages\neuralforecast\data\tsdataset.py:636, in WindowsDataset.__init__(self, Y_df, input_size, output_size, X_df, S_df, f_cols, mask_df, ds_in_test, is_test, sample_freq, complete_windows, last_window, verbose)
    590 def __init__(self,
    591              Y_df: pd.DataFrame,
    592              input_size: int,
   (...)
    602              last_window: bool = False,
    603              verbose: bool = False) -> 'TimeSeriesDataset':
    604     """
    605     Parameters
    606     ----------
   (...)
    634         Wheter or not log outputs.
    635     """
--> 636     super(WindowsDataset, self).__init__(Y_df=Y_df, input_size=input_size,
    637                                          output_size=output_size,
    638                                          X_df=X_df, S_df=S_df, f_cols=f_cols,
    639                                          mask_df=mask_df, ds_in_test=ds_in_test,
    640                                          is_test=is_test, complete_windows=complete_windows,
    641                                          verbose=verbose)
    642     # WindowsDataset parameters
    643     self.windows_size = self.input_size + self.output_size

File ~\Anaconda3\envs\SiT\lib\site-packages\neuralforecast\data\tsdataset.py:110, in BaseDataset.__init__(self, Y_df, X_df, S_df, f_cols, mask_df, ds_in_test, is_test, input_size, output_size, complete_windows, verbose)
    106     dataset_info += f'Outsample percentage={out_prc}, \t{n_out} time stamps \n'
    107     logging.info(dataset_info)
    109 self.ts_data, self.s_matrix, self.meta_data, self.t_cols, self.s_cols \
--> 110                  = self._df_to_lists(Y_df=Y_df, S_df=S_df, X_df=X_df, mask_df=mask_df)
    112 # Dataset attributes
    113 self.n_series = len(self.ts_data)

File ~\Anaconda3\envs\SiT\lib\site-packages\neuralforecast\data\tsdataset.py:201, in _df_to_lists(self, S_df, Y_df, X_df, mask_df)
    198 M = mask_df.sort_values(by=['unique_id', 'ds'], ignore_index=True).copy()
    200 assert np.array_equal(X.unique_id.values, Y.unique_id.values), f'Mismatch in X, Y unique_ids'
--> 201 assert np.array_equal(X.ds.values, Y.ds.values), f'Mismatch in X, Y ds'
    202 assert np.array_equal(M.unique_id.values, Y.unique_id.values), f'Mismatch in M, Y unique_ids'
    203 assert np.array_equal(M.ds.values, Y.ds.values), f'Mismatch in M, Y ds'

AssertionError: Mismatch in X, Y ds

Any idea how to solve this?
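The assertion means X_df and Y_df do not contain exactly the same (unique_id, ds) rows. Since the forecast step appends n_time_out future rows to Y_df internally (visible in the traceback's Y_df.append(forecast_df)), X_df must already cover the history plus the forecast horizon. A hedged diagnostic sketch (the helper name is my own) to locate the mismatching rows:

```python
import pandas as pd

def check_alignment(Y_df: pd.DataFrame, X_df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose (unique_id, ds) key appears in only one frame."""
    merged = Y_df[["unique_id", "ds"]].merge(
        X_df[["unique_id", "ds"]], how="outer", indicator=True)
    return merged[merged["_merge"] != "both"]

# Toy example: X_df is missing the last (future) timestamp, which is
# exactly the situation that triggers 'Mismatch in X, Y ds'.
dates = pd.date_range("2022-01-01", periods=4, freq="D")
Y_df = pd.DataFrame({"unique_id": "s1", "ds": dates, "y": range(4)})
X_df = pd.DataFrame({"unique_id": "s1", "ds": dates[:3], "x1": range(3)})
mismatch = check_alignment(Y_df, X_df)
```

Here mismatch flags the 2022-01-04 row as present only in Y_df; extending X_df with the future exogenous values for that timestamp resolves the assertion.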