ericjcampbellphd closed this issue 2 years ago.
I should also note that I have 3325 different time series, with the smallest time series containing 4 points (the train version) and the largest time series containing 35 points.
Some debugging I just did showed that if I do not include feat_dynamic_real or feat_dynamic_cat in the time series, I do not get an error.
Hi @ericjcampbellphd, just to understand better: the part of your snippet where the transformation is set up is not used in any computation, right? If I'm not mistaken, setting up the estimator, training it, and getting the predictions should be sufficient to run into the issue.
Hi @lostella.
I train on train_ds, which contains the entries mentioned above, not on train_tf, which would be the explicitly transformed dataset:
transformation = create_transformation('M', context_length, prediction_length)
train_tf = transformation(iter(train_ds), is_train=True)
As you surmised, I create the estimator and train it, but get an error when generating the forecast predictions. The transformation is implicitly run in the estimator if create_transformation is defined, right?
Hey @ericjcampbellphd, to try to reproduce the issue, could you explicitly set the prediction_length and other settings for the estimator?
@lostella
Same issue when explicitly setting the parameters. Also, how do you create a code block? The insert code option I'm using does not look too good.
Does it matter that each time series has a different length? The transformation should unify every time series into a fixed window, right? If I restrict the train set to only series of the same length, I don't get any error.
estimator = SimpleFeedForwardEstimator(
    num_hidden_dimensions=[10],
    prediction_length=4,
    context_length=4,
    freq='M',
    trainer=Trainer(
        ctx="cpu",
        epochs=1,
        learning_rate=1e-3,
        hybridize=False,
        batch_size=50,
        num_batches_per_epoch=250
    )
)
ValueError Traceback (most recent call last)
~\Documents\Projects\XRR\Forecasting\.mxnet\lib\site-packages\mxnet\ndarray\ndarray.py in array(source_array, ctx, dtype)
3358 try:
-> 3359 source_array = np.array(source_array, dtype=dtype)
3360 except:
ValueError: could not broadcast input array from shape (18,22) into shape (18)
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-202-5b3055edfe1b> in <module>
----> 1 list(forecast_it)
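On the question of whether series of different lengths matter, here is a simplified sketch of what cutting a fixed window looks like. It is not GluonTS's actual instance splitter: the hypothetical slice_context helper takes the last context_length values of each target as a fixed-size past_target (left-padding short series with zeros, an assumption of this sketch), so every window ends up the same length, while fields it does not touch keep their original, per-series lengths.

```python
from typing import Dict, List


def slice_context(entry: Dict, context_length: int) -> Dict:
    """Hypothetical helper: cut a fixed-length window from the end of the target.

    Series shorter than context_length are left-padded with zeros (an assumption
    of this sketch). All other fields are passed through unchanged.
    """
    window = list(entry["target"])[-context_length:]
    padding = [0.0] * (context_length - len(window))
    out = {k: v for k, v in entry.items() if k != "target"}
    out["past_target"] = padding + window
    return out


entries: List[Dict] = [
    {"target": [1, 2, 3, 4, 5, 6, 7, 8], "feat_dynamic_real": [[1] * 8, [1] * 8]},
    {"target": [8, 12, 13, 14, 15, 16], "feat_dynamic_real": [[1] * 6, [1] * 6]},
]

sliced = [slice_context(e, context_length=4) for e in entries]
for s in sliced:
    # past_target always has length 4; feat_dynamic_real keeps its original length
    print(len(s["past_target"]), len(s["feat_dynamic_real"][0]))
```

The untouched field is the kind of value that can still differ in shape from one series to the next after the window has been cut.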
Also how do you create a code block? The insert code option I'm using does not look too good.
You need to use triple backticks to open/close a block
``` like this ```
@lostella
I created a minimal reproducible example below, but I think I figured out the reason for my error. Correct me if I am wrong, but do feat_dynamic_real and feat_dynamic_cat need to have the same shape as the largest time series in the ListDataset, rather than the same shape as their corresponding target?
Here is the code to produce the error:
import mxnet
import gluonts
import pandas as pd
import numpy as np
from gluonts.dataset.common import ListDataset
from gluonts.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.mx.trainer import Trainer
from gluonts.evaluation import make_evaluation_predictions
context_length=4
prediction_length=4
target = []
start = []
feat_static_cat = []
feat_dynamic_real = []
target.append([1, 2, 3, 4, 5, 6, 7, 8]) # 8 elements
start.append(pd.Timestamp(year=2021, month=6, day=18, freq='M'))
feat_static_cat.append([1])
feat_dynamic_real.append(np.array([[1, 1, 1, 1, 1, 1, 1, 1], # 2 by 8 array
[1, 1, 1, 1, 1, 1, 1, 1]]))
target.append([8, 12, 13, 14, 15, 16]) # 6 elements
start.append(pd.Timestamp(year=2021, month=7, day=18, freq='M'))
feat_static_cat.append([0])
feat_dynamic_real.append(np.array([[1, 1, 1, 1, 1, 1], # 2 by 6 array
[1, 1, 1, 1, 1, 1]]))
train_ds = ListDataset(
    [
        {'target': t[:-prediction_length], 'start': s, 'feat_static_cat': fsc,
         'feat_dynamic_real': fdr[:, :-prediction_length]}
        for t, s, fsc, fdr in zip(target, start, feat_static_cat, feat_dynamic_real)
    ],
    freq='M'
)
test_ds = ListDataset(
    [
        {'target': t, 'start': s, 'feat_static_cat': fsc, 'feat_dynamic_real': fdr}
        for t, s, fsc, fdr in zip(target, start, feat_static_cat, feat_dynamic_real)
    ],
    freq='M'
)
estimator = SimpleFeedForwardEstimator(
    num_hidden_dimensions=[10],
    prediction_length=prediction_length,
    context_length=context_length,
    freq='M',
    trainer=Trainer(
        ctx="cpu",
        epochs=1,
        learning_rate=1e-3,
        hybridize=False,
        batch_size=2,
        num_batches_per_epoch=1
    )
)
predictor = estimator.train(train_ds)
forecast_it, ts_it = make_evaluation_predictions(
    dataset=test_ds,
    predictor=predictor,
    num_samples=100,
)
list(forecast_it)
This generates the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\Documents\Projects\XRR\Forecasting\.mxnet\lib\site-packages\mxnet\ndarray\ndarray.py in array(source_array, ctx, dtype)
3358 try:
-> 3359 source_array = np.array(source_array, dtype=dtype)
3360 except:
ValueError: could not broadcast input array from shape (2,8) into shape (2)
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-122-5b3055edfe1b> in <module>
----> 1 list(forecast_it)
~\Documents\Projects\XRR\Forecasting\.mxnet\lib\site-packages\gluonts\mx\model\predictor.py in predict(self, dataset, num_samples, num_workers, num_prefetch, **kwargs)
163 )
164 with mx.Context(self.ctx):
--> 165 yield from self.forecast_generator(
166 inference_data_loader=inference_data_loader,
167 prediction_net=self.prediction_net,
~\Documents\Projects\XRR\Forecasting\.mxnet\lib\site-packages\gluonts\model\forecast_generator.py in __call__(self, inference_data_loader, prediction_net, input_names, freq, output_transform, num_samples, **kwargs)
170 **kwargs
171 ) -> Iterator[Forecast]:
--> 172 for batch in inference_data_loader:
173 inputs = [batch[k] for k in input_names]
174 outputs = predict_to_numpy(prediction_net, inputs)
~\Documents\Projects\XRR\Forecasting\.mxnet\lib\site-packages\gluonts\mx\batchify.py in batchify(data, ctx, dtype, variable_length)
85 variable_length: bool = False,
86 ) -> DataBatch:
---> 87 return {
88 key: stack(
89 data=[item[key] for item in data],
~\Documents\Projects\XRR\Forecasting\.mxnet\lib\site-packages\gluonts\mx\batchify.py in <dictcomp>(.0)
86 ) -> DataBatch:
87 return {
---> 88 key: stack(
89 data=[item[key] for item in data],
90 ctx=ctx,
~\Documents\Projects\XRR\Forecasting\.mxnet\lib\site-packages\gluonts\mx\batchify.py in stack(data, ctx, dtype, variable_length)
73 return mx.nd.stack(*data)
74 if isinstance(data[0], np.ndarray):
---> 75 data = mx.nd.array(data, dtype=dtype, ctx=ctx)
76 elif isinstance(data[0], (list, tuple)):
77 return list(stack(t, ctx=ctx) for t in zip(*data))
~\Documents\Projects\XRR\Forecasting\.mxnet\lib\site-packages\mxnet\ndarray\utils.py in array(source_array, ctx, dtype)
144 return _sparse_array(source_array, ctx=ctx, dtype=dtype)
145 else:
--> 146 return _array(source_array, ctx=ctx, dtype=dtype)
147
148
~\Documents\Projects\XRR\Forecasting\.mxnet\lib\site-packages\mxnet\ndarray\ndarray.py in array(source_array, ctx, dtype)
3359 source_array = np.array(source_array, dtype=dtype)
3360 except:
-> 3361 raise TypeError('source_array must be array like object')
3362
3363 if source_array.shape == ():
TypeError: source_array must be array like object
But if I edit the code to create the dataset as below, no error occurs:
target = []
start = []
feat_static_cat = []
feat_dynamic_real = []
target.append([1, 2, 3, 4, 5, 6, 7, 8]) #8 elements
start.append(pd.Timestamp(year=2021, month=6, day=18, freq='M'))
feat_static_cat.append([1])
feat_dynamic_real.append(np.array([[1, 1, 1, 1, 1, 1, 1, 1], #2 by 8 array
[1, 1, 1, 1, 1, 1, 1, 1]]))
target.append([8, 12, 13, 14, 15, 16]) # 6 elements
start.append(pd.Timestamp(year=2021, month=7, day=18, freq='M'))
feat_static_cat.append([0])
feat_dynamic_real.append(np.array([[1, 1, 1, 1, 1, 1, 1, 1], # 2 by 8 array
[1, 1, 1, 1, 1, 1, 1, 1]]))
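The innermost failure in the traceback above is NumPy refusing to build one dense array from arrays of different shapes (the traceback shows mx.nd.array calling np.array(source_array, dtype=dtype)). A minimal sketch with NumPy alone, using the (2, 8) and (2, 6) test-set feat_dynamic_real arrays from the first example, reproduces it. The exact message varies by NumPy version (older releases say "could not broadcast input array ...", newer ones report an "inhomogeneous shape"), so the sketch relies only on the ValueError type.

```python
import numpy as np

# Test-set feat_dynamic_real arrays from the first example: shapes (2, 8) and (2, 6)
ragged = [np.ones((2, 8)), np.ones((2, 6))]

try:
    # Mirrors the np.array(source_array, dtype=dtype) call inside mx.nd.array
    np.array(ragged, dtype=np.float32)
    ragged_ok = True
except ValueError as err:
    ragged_ok = False
    print("ValueError:", err)

# With equal shapes, as in the edited example, the same call succeeds
equal = np.array([np.ones((2, 8)), np.ones((2, 8))], dtype=np.float32)
print(equal.shape)  # (2, 2, 8)
```

This is consistent with the edited example running: once both feature arrays are (2, 8), they stack into a single array.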
@ericjcampbellphd for training data, dynamic features should have the same time length as the corresponding target, so your example should be fine in that respect. I'm looking into it and will let you know what I find.
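A hypothetical pre-flight check along the lines of that rule (for training data, each dynamic feature row should span the same time length as its target) could flag offending entries before training. The helper name and the list-of-dicts layout are assumptions for illustration. The third entry mimics a 2-point training target paired with 4-column features, which is what slicing the 8-column padded arrays above by prediction_length would produce.

```python
from typing import Dict, Iterable, List


def mismatched_entries(entries: Iterable[Dict], field: str = "feat_dynamic_real") -> List[int]:
    """Hypothetical check: return the indices of entries whose dynamic-feature
    rows do not have the same length as the entry's target."""
    bad = []
    for i, entry in enumerate(entries):
        if field not in entry:
            continue  # nothing to check for this entry
        target_len = len(entry["target"])
        # Dynamic features are laid out as (num_features, time_length)
        if any(len(row) != target_len for row in entry[field]):
            bad.append(i)
    return bad


train_entries = [
    {"target": [1, 2, 3, 4], "feat_dynamic_real": [[1] * 4, [1] * 4]},
    {"target": [8, 12], "feat_dynamic_real": [[1] * 2, [1] * 2]},
    {"target": [8, 12], "feat_dynamic_real": [[1] * 4, [1] * 4]},  # too long
]
print(mismatched_entries(train_entries))  # [2]
```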
I refactored the minimal working example a bit to make it easier to play with data fields and shapes. The SimpleFeedForwardEstimator doesn't use additional features internally (only the target), and in fact everything works fine when the additional fields are removed from the data. So the problem is that, when making predictions, additional fields are not filtered out before the data going into the network is batched (training works fine).
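The fix described here amounts to batching only the fields the network actually consumes. Below is a sketch of that idea with a hypothetical stack_inputs helper and an assumed input name, past_target; it is not the GluonTS batchify code, which, judging from the traceback, builds one stacked array per key of each entry.

```python
from typing import Dict, List, Sequence

import numpy as np


def stack_inputs(items: List[Dict], input_names: Sequence[str]) -> Dict[str, np.ndarray]:
    """Hypothetical batching step: stack only the fields listed in input_names,
    so unused fields with per-series lengths never reach np.stack."""
    return {name: np.stack([item[name] for item in items]) for name in input_names}


items = [
    {"past_target": np.ones(4), "feat_dynamic_real": np.ones((2, 8))},
    {"past_target": np.ones(4), "feat_dynamic_real": np.ones((2, 6))},
]

batch = stack_inputs(items, input_names=["past_target"])
print(batch["past_target"].shape)  # (2, 4)

# Stacking every key instead fails on the ragged feat_dynamic_real field
try:
    stack_inputs(items, input_names=list(items[0]))
    all_keys_failed = False
except ValueError:
    all_keys_failed = True
```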
@ericjcampbellphd if you intend to keep experimenting with SimpleFeedForwardEstimator, you can simply discard the additional fields other than target. Other models that do use additional features, like DeepAREstimator, appear to work fine with your example.
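A minimal sketch of that workaround, assuming the entries are held as plain dicts before being wrapped in ListDataset: the hypothetical target_only helper keeps just start and target. Start dates are plain strings here for brevity; in the examples in this thread they are pd.Timestamp objects.

```python
from typing import Dict, Iterable, List, Sequence


def target_only(entries: Iterable[Dict], keep: Sequence[str] = ("start", "target")) -> List[Dict]:
    """Hypothetical helper: drop every field except those in `keep`
    (by default 'start' and 'target')."""
    return [{k: entry[k] for k in keep if k in entry} for entry in entries]


entries = [
    {"start": "2021-06-30", "target": [1, 2, 3, 4, 5, 6, 7, 8],
     "feat_static_cat": [1], "feat_dynamic_real": [[1] * 8, [1] * 8]},
    {"start": "2021-07-31", "target": [8, 12, 13, 14, 15, 16],
     "feat_static_cat": [0], "feat_dynamic_real": [[1] * 6, [1] * 6]},
]

cleaned = target_only(entries)
print(cleaned[0])  # {'start': '2021-06-30', 'target': [1, 2, 3, 4, 5, 6, 7, 8]}
```

The cleaned list could then be passed to ListDataset(cleaned, freq='M') for both the training and the test set.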
Thanks for submitting the issue! Let's keep this open until there is a fix for SimpleFeedForwardEstimator.
import pandas as pd
import numpy as np
from gluonts.dataset.common import ListDataset
from gluonts.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.mx.trainer import Trainer
from gluonts.evaluation import make_evaluation_predictions
context_length=4
prediction_length=4
data_length_0 = 20
data_length_1 = 18
train_ds = ListDataset(
    [
        {
            'start': pd.Timestamp(year=2021, month=6, day=18, freq='M'),
            'target': np.ones(shape=(data_length_0 - prediction_length,)),
            'feat_static_cat': [0],
            'feat_dynamic_real': np.ones(shape=(2, data_length_0 - prediction_length,))
        },
        {
            'start': pd.Timestamp(year=2021, month=7, day=18, freq='M'),
            'target': np.ones(shape=(data_length_1 - prediction_length,)),
            'feat_static_cat': [1],
            'feat_dynamic_real': np.ones(shape=(2, data_length_1 - prediction_length,))
        }
    ],
    freq='M'
)
estimator = SimpleFeedForwardEstimator(
    num_hidden_dimensions=[10],
    prediction_length=prediction_length,
    context_length=context_length,
    freq='M',
    trainer=Trainer(
        ctx="cpu",
        epochs=1,
        learning_rate=1e-3,
        hybridize=False,
        batch_size=2,
        num_batches_per_epoch=1
    )
)
predictor = estimator.train(train_ds)
test_ds = ListDataset(
    [
        {
            'start': pd.Timestamp(year=2021, month=6, day=18, freq='M'),
            'target': np.ones(shape=(data_length_0,)),
            'feat_static_cat': [0],
            'feat_dynamic_real': np.ones(shape=(2, data_length_0,))
        },
        {
            'start': pd.Timestamp(year=2021, month=7, day=18, freq='M'),
            'target': np.ones(shape=(data_length_1,)),
            'feat_static_cat': [1],
            'feat_dynamic_real': np.ones(shape=(2, data_length_1,))
        }
    ],
    freq='M'
)
forecast_it, ts_it = make_evaluation_predictions(
    dataset=test_ds,
    predictor=predictor,
    num_samples=100,
)
list(forecast_it)
Thanks for the help!
Hi. Apologies if this has been encountered before or if I am making some silly mistake, since I am new to GluonTS. I adapted the extended forecasting tutorial code to fit my purpose, but the make_evaluation_predictions function call produces a generator object that cannot be cast to a list as in the tutorial.
MXNet version: 1.7.0, GluonTS version: 0.7.0
Here is my code:
Context and prediction length are both set to 4. I can't paste my data because it is confidential, but here is what the training dataset contains:
Using list(ts_it) works fine, but when I use list(forecast_it) I get this error:
If I create a variable sample = iter(forecast_it) and then run next(sample) repeatedly, on the 33rd call I get this error:
I include this because the error mentions "could not broadcast input array from shape (18,22) into shape (18)", and my feat_dynamic_cat array has that shape. Any ideas?
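A possible explanation for why next(sample) succeeds many times before failing: forecast_it is a generator, so work happens only when a forecast is requested, and the later tracebacks in this thread show the forecast generator looping over the inference data loader one batch at a time. The toy sketch below uses only Python and NumPy, no GluonTS; the batch size of 32 and the position of the single ragged series are assumptions chosen to mirror the report. The first 32 next() calls are served from the first batch, and the 33rd forces the second batch to be stacked, which raises.

```python
from typing import Iterator, List

import numpy as np


def batches(lengths: List[int], batch_size: int) -> Iterator[np.ndarray]:
    """Toy data loader: build and stack one batch of (2, n) feature arrays at a time."""
    for i in range(0, len(lengths), batch_size):
        chunk = [np.ones((2, n)) for n in lengths[i:i + batch_size]]
        # np.stack raises ValueError if the arrays in this chunk differ in shape
        yield np.stack(chunk)


def forecasts(lengths: List[int], batch_size: int) -> Iterator[int]:
    """Toy forecast generator: lazily yields one result per series."""
    for batch in batches(lengths, batch_size):
        for row in batch:
            yield row.shape[-1]


# Hypothetical data: 64 series of length 18, except series 40 (length 22)
lengths = [18] * 64
lengths[40] = 22

sample = forecasts(lengths, batch_size=32)
served = 0
try:
    for _ in range(len(lengths)):
        next(sample)
        served += 1
except ValueError as err:
    print(f"failed on call {served + 1}: {err}")

print(served)  # 32
```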