awslabs / gluonts

Probabilistic time series modeling in Python
https://ts.gluon.ai
Apache License 2.0

DeepVAR target_dim = 1 #2768

Open baniasbaabe opened 1 year ago

baniasbaabe commented 1 year ago

Description

I tried to use DeepVAREstimator with target_dim = 1, but I get the error shown below.

Is it still somehow possible to use DeepVAR? I know about DeepAR, but DeepVAR has some nice properties.

Error message or code output

`gluonts.exceptions.GluonTSDataError: Input for field "target" does not have the requireddimension (field: target, ndim observed: 1, expected ndim: 2)`.

lostella commented 1 year ago

Hi, what is the layout of the target you're providing? With DeepVAREstimator and target_dim=1 you should have something like

{
    "target": [[1.0, 2.0, 3.0, ...]],
}

Note the double square brackets, since target needs to be a 2-dimensional array with the first dimension being 1.
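
As a quick way to check the layout before building a dataset, the sketch below compares a flat list with the nested one. It uses NumPy, which is an assumption here (the reply above only shows the dictionary form), and the variable names and values are purely illustrative.

```python
import numpy as np

# Hypothetical univariate series; the values are illustrative only.
values = [1.0, 2.0, 3.0, 4.0]

flat = np.asarray(values)      # single brackets: 1-dimensional
nested = np.asarray([values])  # double brackets: 2-dimensional

print(flat.ndim, flat.shape)      # 1 (4,)
print(nested.ndim, nested.shape)  # 2 (1, 4)

# Reshaping the flat array adds the same leading axis of size 1.
reshaped = flat.reshape(1, -1)
print(np.array_equal(nested, reshaped))  # True
```

Either form gives a target whose first dimension is 1 and whose second dimension is the series length.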

baniasbaabe commented 1 year ago

Hi, thanks for your kind answer. Now I get another error:

gluonts.exceptions.GluonTSDataError: Array 'target' has bad shape - expected 1 dimensions, got 2.

Here is a minimal example to reproduce it:

import pandas as pd
from gluonts.dataset.common import ListDataset
from gluonts.mx import Trainer
from gluonts.mx.model.deepvar import DeepVAREstimator

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/00374/energydata_complete.csv')

df_input = df[['date','Appliances','T_out','Press_mm_hg','RH_out','Windspeed','Tdewpoint','Visibility']]
df_input = df_input.set_index('date')

train_time = "2016-05-10 00:00:00"
prediction_length = 100

estimator = DeepVAREstimator(
    freq="10min",
    context_length=720,
    prediction_length=prediction_length,
    target_dim=1,
    num_layers=2,
    num_cells=128,
    cell_type='lstm',
    trainer=Trainer(epochs=3),
)

# Wrap the series in an outer list so that the target is 2-dimensional.
target = [df_input.Appliances[:train_time].tolist()]
print(target)

training_data = ListDataset(
    [{"start": df_input.index[0], "target": target}],
    freq="10min",
)

predictor = estimator.train(training_data=training_data)

lostella commented 1 year ago

Setting one_dim_target=False in ListDataset solves it, but then another issue occurs:

ValueError: Deferred initialization failed because shape cannot be inferred. MXNetError: Error in operator deepvartrainingnetwork0_broadcast_minimum1: [11:15:20] ../src/operator/numpy/../tensor/elemwise_binary_broadcast_op.h:67: Check failed: l == 1 || r == 1: operands could not be broadcast together with shapes [32,768,1,1] [32,768,1]

Debugging points to past_is_pad not having the right shape here: https://github.com/awslabs/gluonts/blob/160a127f64002322586359e2729c7b586acd15fb/src/gluonts/mx/model/deepvar/_network.py#L313

Will look into it
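
The shape mismatch in the error can be reproduced outside MXNet. The sketch below uses NumPy, on the assumption that its broadcasting rules are close enough to illustrate the problem. The arrays `a` and `b` are placeholders that only copy the two shapes from the error message; they are not taken from `_network.py`. Dropping the trailing axis is just one way to make the shapes agree and is not necessarily the right fix for DeepVAR.

```python
import numpy as np

# Placeholder arrays with the two shapes reported in the error message.
a = np.zeros((32, 768, 1, 1))
b = np.ones((32, 768, 1))

# Broadcasting aligns shapes from the right, so (32, 768, 1) is treated
# as (1, 32, 768, 1). The axis pair 768 vs 32 is then incompatible.
try:
    np.minimum(a, b)
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print("broadcast failed:", err)

# Removing the extra trailing singleton axis makes both shapes identical.
result = np.minimum(a.squeeze(axis=-1), b)
print(result.shape)  # (32, 768, 1)
```

The extra size-1 axis on one operand is what separates the failing case from the working one, which fits with the error appearing only when the target dimension is 1.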

lostella commented 1 year ago

@baniasbaabe an easier way to set up the data is to use PandasDataset:

import pandas as pd
from gluonts.dataset.pandas import PandasDataset

df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/00374/energydata_complete.csv",
    index_col="date",
    parse_dates=True,
)

train_time = "2016-05-10 00:00:00"

training_data = PandasDataset(df[:train_time], target=["Appliances"])

for entry in training_data:
    print(entry)

I think this is much nicer than ListDataset.
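
To see why the target column is passed as a list, the following pandas-only sketch compares selecting a column by name with selecting it through a one-element list. It uses a small synthetic frame rather than the energy dataset. The final transpose to a `(1, T)` layout reflects the layout described earlier in this thread and is an assumption for illustration, not a statement of what PandasDataset does internally.

```python
import pandas as pd

# Synthetic stand-in for the energy data; the values are illustrative only.
index = pd.date_range("2016-01-11 17:00:00", periods=6, freq="10min")
df = pd.DataFrame(
    {"Appliances": [60.0, 60.0, 50.0, 50.0, 60.0, 50.0]},
    index=index,
)

# Selecting by name returns a Series, hence a 1-dimensional array.
as_series = df["Appliances"].to_numpy()

# Selecting with a one-element list returns a DataFrame, hence a 2-dimensional array.
as_frame = df[["Appliances"]].to_numpy()

# Transposing gives the (target_dim, time) layout with target_dim = 1.
target_2d = as_frame.T

print(as_series.shape)  # (6,)
print(as_frame.shape)   # (6, 1)
print(target_2d.shape)  # (1, 6)
```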

lostella commented 1 year ago

I reduced the issue to the following minimal example:

import pandas as pd
import numpy as np
from gluonts.mx.model.deepvar import DeepVAREstimator
from gluonts.mx import Trainer

TARGET_DIM = 1

training_data = [
    {
        "start": pd.Period("2012-03-04", freq="D"),
        "target": np.ones(shape=(TARGET_DIM, 400)),
    }
]

estimator = DeepVAREstimator(
    freq="D",
    context_length=123,
    prediction_length=45,
    target_dim=TARGET_DIM,
    num_layers=3,
    num_cells=10,
    cell_type="lstm",
    trainer=Trainer(epochs=3, hybridize=False),
)

predictor = estimator.train(training_data=training_data)

When setting TARGET_DIM = 2 the issue doesn't show up.

baniasbaabe commented 1 year ago

> Setting one_dim_target=False in ListDataset solves it, but then another issue occurs:
>
> ValueError: Deferred initialization failed because shape cannot be inferred. MXNetError: Error in operator deepvartrainingnetwork0_broadcast_minimum1: [11:15:20] ../src/operator/numpy/../tensor/elemwise_binary_broadcast_op.h:67: Check failed: l == 1 || r == 1: operands could not be broadcast together with shapes [32,768,1,1] [32,768,1]
>
> Debugging points to past_is_pad not having the right shape here: https://github.com/awslabs/gluonts/blob/160a127f64002322586359e2729c7b586acd15fb/src/gluonts/mx/model/deepvar/_network.py#L313
>
> Will look into it

Do you know what the problem exactly is? I can also look into it

lostella commented 1 year ago

> Do you know what the problem exactly is? I can also look into it

Not really, I haven't been able to look into it yet. My guess is that some conditional behaviour kicks in when the target dimension is 1 and suppresses the axes. Some debugging still needs to be done, so any help is appreciated :-) I think the minimal example in my previous comment is a good starting point.

baniasbaabe commented 1 year ago

It's weird that GPVAR works fine with TARGET_DIM = 1. A small snippet:

import pandas as pd
import numpy as np
from gluonts.mx import Trainer
from gluonts.mx.model.gpvar import GPVAREstimator

TARGET_DIM = 1

training_data = [
    {
        "start": pd.Period("2012-03-04", freq="D"),
        "target": np.ones(shape=(TARGET_DIM, 400)),
    }
]

estimator = GPVAREstimator(
    freq="D",
    context_length=123,
    prediction_length=45,
    target_dim=TARGET_DIM,
    trainer=Trainer(epochs=3, hybridize=False),
)

predictor = estimator.train(training_data=training_data)