Multivariate time series forecasting question

CMobley7 commented 4 years ago

My apologies in advance for the ignorant questions. While I’m not necessarily new to deep learning, I’m fairly new to time series forecasting, especially when using deep learning techniques for it.

Because gluon-ts uses DL-based approaches, dealing with non-stationarity in training datasets is not necessary, unlike when using AR/MA and VAR-based models, correct? This appears to be outlined here.

Also, I am working with a multivariate time series dataset in which the target/dependent variable is related to and/or dependent on other features/independent variables. So, while I’m only trying to predict one target variable, the relationship between this target variable and the other features is important. This leads to two questions.

First, since the relationship between the target variable and other features is important, are deepvar and gpvar the most applicable models, or will other models in gluon-ts work and I’m just thinking too much in terms of classical time series forecasting?

Second, if I’m using deepvar or gpvar, I’m assuming that when making the dataset, the target should be a vector of vectors which includes my target variable and the other features, right? However, if I’m thinking too much in terms of classical time series forecasting, target should be a vector of the target variable and I should store the other features as vectors of vectors in either dynamic_feat or cat, right?

Again, I’m sorry for my ignorance. Thanks in advance for any assistance you provide.

ehsanmok commented 4 years ago

DL-based methods can handle non-stationary, multivariate time series with missing values and categorical features. In the multivariate case, the target is at least 2-dimensional, where one dimension is the number of variates (the number of time series). Normally, when you use any of the provided *Estimators, like DeepVAREstimator, the requirements will be checked and the built-in transformations will create the required features automatically.

Note that in the multivariate case, you can use MultivariateGrouper to group the target into 2 dimensions, like this:

from gluonts.dataset.artificial import constant_dataset
from gluonts.dataset.common import TrainDatasets
from gluonts.dataset.multivariate_grouper import MultivariateGrouper

def load_multivariate_constant_dataset():
    metadata, train_ds, test_ds = constant_dataset()
    grouper_train = MultivariateGrouper(max_target_dim=10)
    grouper_test = MultivariateGrouper(max_target_dim=10)
    return TrainDatasets(
        metadata=metadata,
        train=grouper_train(train_ds),
        test=grouper_test(test_ds),
    )

dataset = load_multivariate_constant_dataset()
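To make the shape change concrete, here is a minimal plain-Python sketch of the idea behind grouping: several univariate entries are stacked into one entry whose target has shape (target_dim, time). The function `group_univariate` is a hypothetical stand-in written for illustration, not the MultivariateGrouper implementation, and it assumes all series share the same start and length.

```python
def group_univariate(entries):
    """Stack univariate entries into a single multivariate entry.

    Hypothetical helper for illustration only; it assumes every entry
    has the same start and the same number of time steps.
    """
    starts = {e["start"] for e in entries}
    lengths = {len(e["target"]) for e in entries}
    if len(starts) != 1 or len(lengths) != 1:
        raise ValueError("this sketch requires identical start and length")
    return {
        "start": entries[0]["start"],
        # One row per original series: shape (target_dim, time).
        "target": [list(e["target"]) for e in entries],
    }


# Two univariate series with made-up values.
univariate = [
    {"start": "2020-01-01", "target": [1.0, 2.0, 3.0]},
    {"start": "2020-01-01", "target": [10.0, 20.0, 30.0]},
]

grouped = group_univariate(univariate)
target_dim = len(grouped["target"])    # number of variates
num_steps = len(grouped["target"][0])  # number of time steps
```

After grouping, the dataset holds a single entry with a 2-dimensional target instead of one entry per series.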

mbohlkeschneider commented 4 years ago

Hi @CMobley7,

as @ehsanmok wrote already, you can use the MultivariateGrouper to convert any univariate time series dataset into a multivariate time series.

Which model is the right one depends on your task. If you know the values of your related time series in the future (because they are indicators of holidays or known promotions), using these as dynamic features in univariate models (like DeepAR) does a fine job.
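A minimal sketch of what "known in the future" means for the data layout, in plain Python with made-up values. It assumes the usual GluonTS convention that `feat_dynamic_real` has shape (num_features, time) and that, at prediction time, it extends `prediction_length` steps past the end of the target. The variable names are illustrative, and with DeepAR the estimator would also need to be told to use the field (to my knowledge via `use_feat_dynamic_real=True`).

```python
prediction_length = 2

# Observed history of the series to forecast (illustrative values).
target = [5.0, 6.0, 7.0, 8.0]

# A covariate known ahead of time, e.g. a holiday indicator. It covers
# the four observed steps plus the two future steps to be forecast.
holiday = [0.0, 0.0, 1.0, 0.0, 1.0, 0.0]

train_entry = {
    "start": "2020-01-01",
    "target": target,
    # Shape (num_features, time): same length as the target.
    "feat_dynamic_real": [holiday[: len(target)]],
}

predict_entry = {
    "start": "2020-01-01",
    "target": target,
    # Covers the history plus the forecast horizon.
    "feat_dynamic_real": [holiday],
}
```

If the related series cannot be supplied for the forecast horizon, this layout does not apply, which is the case the next paragraph addresses.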

If this is not the case, I would recommend using gpvar, as this is the multivariate time series model so far for which we have the most empirical evidence that it works well (see this paper).

Hope that helps.

CMobley7 commented 4 years ago

@ehsanmok and @mbohlkeschneider, thank you for your advice thus far, I really appreciate it. Unfortunately, I’m still slightly confused regarding which model I should choose and consequently how to create the training and test sets.

I planned to recreate the following notebook, but instead of using straight gluon or keras, I'd use gluon-ts. The author creates a model to forecast pollution given previous pollution, as well as other factors like rain, wind speed and temperature. So, which model do you think best fits this type of data, and what is the best method to take the dataframe in cell 8 and turn it into both a training and test set given the model chosen? In addition, while dealing with non-stationarity may not be a problem with the DL-based approaches in gluon-ts, I’m assuming scaling the features still is. Are there methods inside gluon-ts to deal with this or should I just use scikit-learn or a similar library to do this prior to creating the dataframe in cell 8? Thank you again in advance.

CMobley7 commented 4 years ago

@ehsanmok and @mbohlkeschneider, I've looked through gluon-ts's extended tutorial and understand how to make a traditional dataset, but I'm still not sure exactly how to create a dataset for gpvar. The MultivariateGrouper is only useful for converting univariate datasets to multivariate, right? After looking at gpvar, it seems like it won't use any feature besides target. It looks like I need to group the target and all features into the target field. Or am I mistaken, and I should use the traditional dataset with the target in the target field and all features in their appropriate fields (feat_static_cat, feat_static_real, feat_dynamic_cat, feat_dynamic_real)?

mbohlkeschneider commented 4 years ago

Hi @CMobley7 ,

you are correct, this is what the MultivariateGrouper does. Essentially, multivariate time series should have target fields that look like this. Then, the data should be loadable with our standard loaders.
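For the pollution example discussed above, the two candidate layouts can be sketched in plain Python. The column names, values, and start timestamp are made up for illustration. Layout A puts every variable into a 2-dimensional target, as a multivariate model such as gpvar would expect. Layout B keeps a 1-dimensional target and moves the other variables to `feat_dynamic_real`, as a univariate model would expect.

```python
# Illustrative hourly records; column names and values are made up.
rows = [
    {"pollution": 129.0, "temp": -4.0, "wind_speed": 1.8},
    {"pollution": 148.0, "temp": -4.0, "wind_speed": 2.7},
    {"pollution": 159.0, "temp": -5.0, "wind_speed": 3.6},
]
start = "2010-01-02 00:00"
columns = ["pollution", "temp", "wind_speed"]


def column(name):
    """Return one column of the table as a list over time."""
    return [row[name] for row in rows]


# Layout A (multivariate): every variable is a row of the target,
# giving a target of shape (target_dim, time).
multivariate_entry = {
    "start": start,
    "target": [column(name) for name in columns],
}

# Layout B (univariate): only pollution is the target; the other
# variables become dynamic features of shape (num_features, time).
univariate_entry = {
    "start": start,
    "target": column("pollution"),
    "feat_dynamic_real": [column("temp"), column("wind_speed")],
}
```

In layout A the model forecasts all variables jointly, so no future values of temperature or wind speed are required. Layout B only fits when those values are known for the forecast horizon.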

You are right that GPVar is not using additional features atm. Let me break down why:

feat_static_cat: In our paper, we addressed the use-case of having a single multivariate dataset with shape (time, dim). Thus, the concept of a feat_static_cat (which is a way to mark different time series) does not make sense because every time series is "the same".

feat_static_real: We have not looked into this in the paper, but this could be implemented.

feat_dynamic_cat: Currently, I think GluonTS provides the functionality to pass feat_dynamic_cat to models but no model is using this so far. Feel free to experiment and share your findings!

feat_dynamic_real: We have not looked into this in the paper. This could be quite challenging depending on what your data looks like. The two cases are:

Dynamic features are the same for all (marginal) time series: This is the case we have when using our standard time features here. We don't really have the infrastructure for loading such data from files atm, I think. This case is straightforward.
Dynamic features are different for all (marginal) time series: This comes with a lot of practical issues: What values should features have if the time series are not the same length (time series could be longer or shorter)? Also, every feature introduced this way will add target_dim inputs to the model, so my gut feeling is that this blows up fairly quickly and becomes hard to train.
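The growth mentioned in the second case can be illustrated with simple arithmetic. This is only a back-of-the-envelope sketch of the comment above, not a statement about GPVAR's actual input size; the function `extra_inputs` and the numbers are hypothetical.

```python
def extra_inputs(num_features, target_dim, shared):
    """Additional input values per time step from dynamic features.

    Hypothetical helper: a shared feature contributes one value, while
    a per-series feature contributes one value for each of the
    target_dim marginal series.
    """
    if shared:
        return num_features
    return num_features * target_dim


# Three dynamic features on a 100-dimensional multivariate series.
shared_cost = extra_inputs(num_features=3, target_dim=100, shared=True)
per_series_cost = extra_inputs(num_features=3, target_dim=100, shared=False)
```

With these numbers, shared features add 3 inputs per time step, while per-series features add 300.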

jaschau commented 4 years ago

I had the same issue with a dataset that contains feat_dynamic_real. Although gpvar and deepvar ignore feat_dynamic_real in principle, my training runs initially still crashed. I figured out that the root cause was that the TrainingDataLoader would try to batch the feat_dynamic_real arrays, which, however, were not cut to the appropriate length by InstanceSplitter in the default transformation. I fixed this by replacing the code in https://github.com/awslabs/gluon-ts/blob/master/src/gluonts/model/gpvar/_estimator.py#L253,

VstackFeatures(
    output_field=FieldName.FEAT_TIME,
    input_fields=[FieldName.FEAT_TIME],
)

by

VstackFeatures(
    output_field=FieldName.FEAT_TIME,
    input_fields=[FieldName.FEAT_TIME, FieldName.FEAT_DYNAMIC_REAL],
)

This works because VstackFeatures will by default drop the input_fields from the dataset. Maybe this is of help.
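The stack-and-drop behavior described above can be imitated in a few lines of plain Python. The function `vstack_features` is a hypothetical stand-in, not the GluonTS implementation, and it assumes the string keys `"time_feat"` and `"feat_dynamic_real"` correspond to FieldName.FEAT_TIME and FieldName.FEAT_DYNAMIC_REAL.

```python
def vstack_features(entry, output_field, input_fields, drop_inputs=True):
    """Imitate stacking feature rows and dropping the input fields.

    Hypothetical helper for illustration. Each field is a list of rows,
    and all rows are assumed to have the same number of time steps.
    """
    stacked = []
    for name in input_fields:
        stacked.extend(entry[name])  # append this field's rows
    result = dict(entry)             # leave the original entry untouched
    result[output_field] = stacked
    if drop_inputs:
        for name in input_fields:
            if name != output_field:
                del result[name]
    return result


entry = {
    "time_feat": [[0.1, 0.2, 0.3]],
    "feat_dynamic_real": [[1.0, 0.0, 1.0], [5.0, 6.0, 7.0]],
}

out = vstack_features(
    entry,
    output_field="time_feat",
    input_fields=["time_feat", "feat_dynamic_real"],
)
```

After the call, `out` no longer has a `feat_dynamic_real` key, so nothing of mismatched length is left for the data loader to batch. Note that the dynamic features then travel inside the time features rather than being discarded.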

CMobley7 commented 4 years ago

Thanks @mbohlkeschneider and @jaschau. Unfortunately, feature engineering is taking longer than I anticipated, so it will probably be another week before I'm able to test gluon-ts with my dataset. I'll close this issue now since I believe all my questions have been answered, and post back later with results or potentially additional questions. Thanks again.

vblagoje commented 4 years ago

@mbohlkeschneider can any other model be used for multivariate series prediction or just gpvar?

mbohlkeschneider commented 4 years ago

Technically, DeepAR and DeepVAR should work as well. However, GPVAR is the model I would recommend.