Nixtla / neuralforecast

Scalable and user friendly neural :brain: forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

Own dataset for multivariate regression task #1068

Closed IKetchup closed 2 weeks ago

IKetchup commented 1 month ago

What happened + What you expected to happen

Hello, thank you for your amazing work. I would like to use it in my research with timeseries. I would like to train a model to predict 3D timeseries from other, different 3D timeseries. My goal is to use timeseries of shape (batch_size, len_size, input_features) to predict different timeseries of shape (batch_size, len_size, output_features), with input_features different from output_features (no common features). Is it possible to do so, given that this type of problem is not exactly seen as forecasting, and if so, how should I start? Thanks in advance.

Versions / Dependencies

.

Reproduction script

.

Issue Severity

Low: It annoys or frustrates me.

elephaint commented 1 month ago

Thanks for the kind words.

Technically NF should be able to handle this scenario, but I don't think we currently have an algorithm that fully supports what you're trying to do. If I understand correctly, you want to use some set of features to predict another set of features, where both sets of features may or may not consist (partially?) of timeseries. If none of the features contain timeseries, the best choice is to just treat this as a regression problem with multiple outputs; in that case you're better off not treating it as a forecasting problem.

Now, if you do have timeseries that you want to forecast (the output features are timeseries values), you could implement this yourself in a model in NF; it needs to be a multivariate model (it consumes multiple timeseries/features and outputs multiple timeseries/features). However, since you want the input features/series to be different from the output features/series, you'd have to exclude the input series from the algorithm and use a set of different exogenous features as input features. These input features would then serve to predict the output series. The issue is that we don't offer such a model at the moment: you'd need a multivariate model where you can also exclude the insample_y (i.e. the lagged target variables) in order to create a distinction between input and output features. So, your best bet using a custom model in NF would be to:

  1. Follow our guide on adding a model.
  2. Start with a copy of TSMixerx.
  3. Remove the insample_y part of TSMixerx.
  4. When training, create a dataset that contains exogenous features and time series targets.

With the insample_y part removed, the algorithm will not consider any (lagged) target variables during training or prediction, which is what you want. It will only use the exogenous variables, and these can be completely distinct from (not overlapping with) the targets. The targets will be the time series targets. I think this setup will achieve what you want, and at the moment it's the easiest way to do it with NF until we offer a multivariate model that allows excluding insample_y by default. Some of our univariate models already offer this functionality: for example, NHITS has a hyperparameter exclude_insample_y = True, which causes the lagged target variable to be ignored during training.
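As a rough illustration of step 4, here is a minimal sketch of how one sequence of input and output features could be flattened into the long format NF expects (columns unique_id, ds and y, plus one column per exogenous feature). The helper name to_long_rows, the column names x_0, x_1, ..., the series names target_0, target_1, ..., and the hourly timestamps are all assumptions made for the example; they are not part of NF. The sketch covers a single sequence of shape (len_size, features) only, and how to map the batch dimension is left open.

```python
from datetime import datetime, timedelta


def to_long_rows(inputs, targets, start=datetime(2024, 1, 1), step=timedelta(hours=1)):
    """Flatten one sequence into long-format rows (hypothetical helper).

    inputs:  len_size rows, each a list of input_features values
    targets: len_size rows, each a list of output_features values
    """
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must cover the same time steps")
    n_outputs = len(targets[0])
    rows = []
    for j in range(n_outputs):  # one series (unique_id) per output feature
        for t, (x_t, y_t) in enumerate(zip(inputs, targets)):
            row = {"unique_id": f"target_{j}", "ds": start + t * step, "y": y_t[j]}
            # The same input features are repeated on every target series
            # as exogenous columns (assumed names x_0, x_1, ...).
            for k, value in enumerate(x_t):
                row[f"x_{k}"] = value
            rows.append(row)
    return rows


# Toy data: 4 time steps, 3 input features, 2 output features.
inputs = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
targets = [[10.0, 20.0], [11.0, 21.0], [12.0, 22.0], [13.0, 23.0]]

rows = to_long_rows(inputs, targets)
exog_columns = [f"x_{k}" for k in range(len(inputs[0]))]
```

The rows can then be turned into a DataFrame with pandas.DataFrame(rows), and exog_columns would be passed as the exogenous feature list (for example hist_exog_list or futr_exog_list) of the custom model copied from TSMixerx. Which of the two lists fits depends on whether the input features are known over the forecast horizon.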

Hope this is a bit clear and let me know if this answers your question.

github-actions[bot] commented 2 weeks ago

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one.