Closed CMobley7 closed 4 years ago
DL-based methods can handle non-stationary, multivariate time series with missing values and categorical features. In the multivariate case, the target is at least 2-dimensional, where one dimension is the number of variates (number of time series). Normally, when you use any of the provided *Estimators, like DeepVAREstimator, the requirements will be checked and the built-in transformations will create the required features automatically.
Note that in the multivariate case, you can use MultivariateGrouper to group the target into 2 dimensions, like this:
from gluonts.dataset.artificial import constant_dataset
from gluonts.dataset.common import TrainDatasets
from gluonts.dataset.multivariate_grouper import MultivariateGrouper


def load_multivariate_constant_dataset():
    # constant_dataset() returns metadata plus univariate train/test datasets
    metadata, train_ds, test_ds = constant_dataset()
    # one grouper per split; each stacks the univariate series into a 2D target
    grouper_train = MultivariateGrouper(max_target_dim=10)
    grouper_test = MultivariateGrouper(max_target_dim=10)
    return TrainDatasets(
        metadata=metadata,
        train=grouper_train(train_ds),
        test=grouper_test(test_ds),
    )


dataset = load_multivariate_constant_dataset()
Hi @CMobley7,
as @ehsanmok already wrote, you can use the MultivariateGrouper to convert any univariate time series dataset into a multivariate one.
Which model is the right one depends on your task. If you know the values of your related time series in the future (because they are time series indicators of holidays or known promotions), using these as dynamic features in univariate models (like DeepAR) does a fine job.
If this is not the case, I would recommend using gpvar, as this is the multivariate time series model for which we have the most empirical evidence so far that it works well (see this paper).
Hope that helps.
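For the first case above (related series whose future values are known), a dataset entry could look roughly like the following sketch. It is plain Python, so it runs without GluonTS installed. The field names start, target and feat_dynamic_real are the standard GluonTS ones; the helper make_entry, the holiday values and prediction_length = 2 are made up for illustration. It also assumes that, at prediction time, the dynamic features need to extend prediction_length steps beyond the target.

```python
# Hypothetical helper (not part of GluonTS): builds one univariate entry as a dict.
def make_entry(start, target, dynamic_features):
    return {
        "start": start,
        "target": list(target),
        # layout (num_features, time): one inner list per feature
        "feat_dynamic_real": [list(row) for row in dynamic_features],
    }


prediction_length = 2  # assumed forecast horizon

target = [10.0, 12.0, 11.0, 15.0, 14.0]
# Holiday indicator, known for the 5 observed steps plus 2 future steps.
holiday = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]

# Training entry: features are cut to the length of the target.
train_entry = make_entry("2020-01-01", target, [holiday[: len(target)]])

# Prediction entry: features also cover the steps to be forecast.
predict_entry = make_entry("2020-01-01", target, [holiday])
```

Entries like these could then be wrapped in a ListDataset with the matching freq. Depending on the GluonTS version, the estimator probably also has to be told to use the dynamic features (for example via use_feat_dynamic_real=True on the MXNet DeepAREstimator).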
@ehsanmok and @mbohlkeschneider, thank you for your advice thus far, I really appreciate it. Unfortunately, I’m still slightly confused regarding which model I should choose and consequently how to create the training and test sets.
I planned to recreate the following notebook, but instead of using straight gluon or keras, I'd use gluon-ts. The author creates a model to forecast pollution given previous pollution, as well as other factors like rain, wind speed and temperature. So, which models do you think best fit this type of data, and what is the best method to take the dataframe in cell 8 and turn it into both a training and a test set, given the model chosen? In addition, while dealing with non-stationarity may not be a problem with the DL-based approaches in gluon-ts, I’m assuming scaling the features still is. Are there methods inside gluon-ts to deal with this, or should I just use scikit-learn or a similar library to do this prior to creating the dataframe in cell 8? Thank you again in advance.
@ehsanmok and @mbohlkeschneider, I've looked through gluon-ts's extended tutorial and understand how to make a traditional dataset, but I'm still not sure exactly how to create a dataset for gpvar. The MultivariateGrouper is only useful for converting univariate datasets to multivariate, right? After looking at gpvar, it seems like it won't use any feature besides target. It looks like I need to group the target and all features into the target field, or am I mistaken and I should use the traditional dataset with the target in the target field and all features in their appropriate fields (feat_static_cat, feat_static_real, feat_dynamic_cat, feat_dynamic_real)?
Hi @CMobley7,
you are correct, this is what the MultivariateGrouper is doing. Essentially, multivariate time series should have target fields that look like this. Then, the data should be loadable with our standard loaders.
You are right that GPVar is not using additional features atm. Let me break down why:
feat_static_cat: In our paper, we addressed the use case of having a single multivariate dataset with shape (time, dim). Thus, the concept of a feat_static_cat (which is a way to mark different time series) does not make sense because every time series is "the same".
feat_static_real: We have not looked into this in the paper, but this could be implemented.
feat_dynamic_cat: Currently, I think GluonTS provides the functionality to pass feat_dynamic_cat to models, but no model is using this so far. Feel free to experiment and share your findings!
feat_dynamic_real: We have not looked into this in the paper. This could be quite challenging depending on what your data looks like. The two cases are:
1. Dynamic features are the same for all (marginal) time series: This is the case we have when using our standard time features here. We don't really have the infrastructure for loading this data from files atm, I think. This case is straightforward.
2. Dynamic features are different for all (marginal) time series: This comes with a lot of practical issues: What values should the features have if the time series are not the same length (time series could be longer or shorter)? Also, every feature introduced this way will add target_dim inputs to the model, so my gut feeling is that this blows up fairly quickly and becomes hard to train.
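To put rough numbers on the "blows up fairly quickly" point, here is a back-of-the-envelope sketch. The function extra_model_inputs and the values num_features = 3 and target_dim = 10 are purely illustrative; it only counts extra input channels under the two cases described above and says nothing about how any GluonTS model actually wires them in.

```python
def extra_model_inputs(num_features, target_dim, shared):
    """Count the additional input channels (illustrative arithmetic only)."""
    if shared:
        # Case 1: every marginal series sees the same feature rows.
        return num_features
    # Case 2: each feature exists once per marginal series.
    return num_features * target_dim


num_features = 3   # assumed, e.g. rain, wind speed, temperature
target_dim = 10    # assumed number of marginal series

shared_inputs = extra_model_inputs(num_features, target_dim, shared=True)
per_series_inputs = extra_model_inputs(num_features, target_dim, shared=False)
print(shared_inputs, per_series_inputs)  # 3 30
```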
I had a similar issue with a dataset that contained feat_dynamic_real. Although gpvar and deepvar ignore feat_dynamic_real in principle, my trainings initially still crashed. I figured out that the root cause was that the TrainingDataLoader would try to batch the feat_dynamic_real, which, however, had not been cut to the appropriate length by the InstanceSplitter in the default transformation. I fixed this by replacing the code in https://github.com/awslabs/gluon-ts/blob/master/src/gluonts/model/gpvar/_estimator.py#L253,
    VstackFeatures(
        output_field=FieldName.FEAT_TIME,
        input_fields=[FieldName.FEAT_TIME],
    )
by
    VstackFeatures(
        output_field=FieldName.FEAT_TIME,
        input_fields=[FieldName.FEAT_TIME, FieldName.FEAT_DYNAMIC_REAL],
    )
This works because VstackFeatures will by default drop the input_fields from the dataset. Maybe this is of help.
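The effect of that change can be illustrated with a small stand-in written in plain Python. To be clear, vstack_features below is a hypothetical, simplified mimic and not the GluonTS implementation; it only assumes that the transformation stacks the rows of all input fields into the output field and, by default, removes the input fields other than the output field. The field names time_feat and feat_dynamic_real are the string values behind FieldName.FEAT_TIME and FieldName.FEAT_DYNAMIC_REAL; the numbers are invented.

```python
def vstack_features(entry, output_field, input_fields, drop_inputs=True):
    """Simplified mimic of the stacking step (NOT the GluonTS code)."""
    stacked = []
    for name in input_fields:
        if entry.get(name) is not None:
            stacked.extend(entry[name])  # append this field's feature rows
    result = dict(entry)
    if drop_inputs:
        for name in input_fields:
            if name != output_field:
                result.pop(name, None)  # input field disappears from the entry
    result[output_field] = stacked
    return result


entry = {
    "target": [[1, 2, 3], [4, 5, 6]],
    "time_feat": [[0.1, 0.2, 0.3]],
    "feat_dynamic_real": [[7.0, 8.0, 9.0]],
}

# Original call: feat_dynamic_real is left in the entry, uncut.
original = vstack_features(entry, "time_feat", ["time_feat"])

# Patched call: feat_dynamic_real is stacked into time_feat and then dropped.
patched = vstack_features(entry, "time_feat", ["time_feat", "feat_dynamic_real"])

print("feat_dynamic_real" in original, "feat_dynamic_real" in patched)  # True False
```

If the mimic is faithful, a side effect of the patch is that the dynamic feature rows become part of time_feat, so they would no longer be ignored downstream.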
Thanks @mbohlkeschneider and @jaschau. Unfortunately, feature engineering is taking longer than I anticipated, so it will probably be another week before I'm able to test gluon-ts with my dataset. I'll close this issue now since I believe all my questions have been answered, and post back later with results or potentially additional questions. Thanks again.
@mbohlkeschneider can any other model be used for multivariate series prediction or just gpvar?
Technically, DeepAR and DeepVAR should work as well. However, GPVAR is the model I would recommend.
@mbohlkeschneider do you have an example notebook on how to do multivariate time series forecasting using gluon-ts?
@Pratik325, I don't have a notebook, but this test does show the setup. Let me know if you have questions.
@mbohlkeschneider I need to know how to use this on custom datasets. It would be beneficial if you ran it on a simple dataset and shared the notebook, because none of the platforms have a good explanation of gluon-ts for multivariate forecasting. It would help many learners. Thank you.
Hi @Pratik325,
Basically, the data preparation is the same as for all other models. The only difference is that the target field becomes a 2D array. So instead of target=[1,2,3,4,5] you would have target=[[1,2,3,4,5],[6,7,8,9,10]]. Does this help?
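To make the 2D layout concrete, here is a small sketch in plain Python. It assumes the data starts out as rows of a table with one row per time step and one column per variable (for example, the rows of a dataframe); the column meanings, values and start timestamp are invented. The resulting target has one inner list per series, i.e. the layout (dim, time).

```python
# One row per time step, columns = (pollution, wind_speed). Values are made up.
rows = [
    [1, 6],
    [2, 7],
    [3, 8],
    [4, 9],
    [5, 10],
]

# Transpose (time, dim) -> (dim, time): one inner list per variable.
target = [list(series) for series in zip(*rows)]

entry = {"start": "2020-01-01 00:00", "target": target}

num_series = len(entry["target"])
num_steps = len(entry["target"][0])
print(num_series, num_steps)  # 2 5
```

When such entries are passed to ListDataset, one_dim_target=False probably needs to be set so that the 2D target is accepted.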
No sir, @mbohlkeschneider
Hi @mbohlkeschneider , can you please help me..?
A complete example can be found here: https://github.com/awslabs/gluon-ts/issues/382. I am not sure it's entirely up to date, but it demonstrates the basic setup.
@jaschau it's outdated!!!
My apologies in advance for the ignorant questions. While I’m not necessarily new to deep learning, I’m fairly new to time series forecasting, especially when using deep learning techniques for it.
Since gluon-ts makes use of DL-based approaches, dealing with non-stationarity in training datasets is not necessary, unlike when using AR/MA and VAR based models, correct? This appears to be outlined here.
Also, I am working with a multivariate time series dataset in which the target/dependent variable is related to and/or dependent on other features/independent variables. So, while I’m only trying to predict one target variable, the relationship between this target variable and the other features is important; consequently, this leads to two questions.
First, since the relationship between the target variable and other features is important, are the most applicable models deepvar and gpvar, or will other models in gluon-ts work and I’m just thinking too much in terms of classical time series forecasting?
Second, if I’m using deepvar or gpvar, I’m assuming that when making the dataset, the target should be a vector of vectors which includes my target variable and the other features, right? However, if I’m thinking too much in terms of classical time series forecasting, target should be a vector of the target variable and I should store the other features as vectors of vectors in either dynamic_feat or cat, right?
Again, I’m sorry for my ignorance. Thanks in advance for any assistance you provide.