Closed AaronSpieler closed 4 years ago
The zeros you are seeing in the target and the error you are getting might be unrelated.
By default, DeepAR also samples training instances from the beginning of the sequence, where the initial part of the sampled window can start before observations are available. This is controlled by the `pick_incomplete` parameter of the `InstanceSplitter`, which is `True` by default:
In this case, the initial portion (of the target and all time-varying features) is padded with zeros, which seems to be what you are observing here -- this is the intended behavior. However, while the loss function is evaluated on these padded zero values, they are removed from the final loss calculation:
The `observed_values` feature is also zero-padded by the `InstanceSplitter`, marking the padded target values as unobserved.
You could try modifying DeepAR to use `pick_incomplete=False` to see if this fixes the problem (in which case there is a bug in the masking code somewhere).
Setting `pick_incomplete=False` fixes the problem. So as we can see from how the `weighted_loss` is computed, the calculated loss is weighted with `0` for unobserved values. If the loss is `nan`, however (because of `log(0)` for example), then `0 * nan = nan` and the weighted loss becomes `nan`.
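The failure mode is easy to reproduce in plain NumPy (a minimal sketch; `loss` and `observed` here are illustrative stand-ins for the actual tensors in DeepAR's loss computation, which uses MXNet ndarrays):

```python
import numpy as np

# Per-timestep negative log-likelihood; the padded first step produced
# log(0) -> nan, because 0 is outside the support of Beta/Gamma.
loss = np.array([np.nan, 0.7, 0.3, 0.5])

# observed_values mask: 0 for the zero-padded step, 1 for real data.
observed = np.array([0.0, 1.0, 1.0, 1.0])

# Masking by multiplication does NOT drop the nan: 0 * nan == nan,
# so the aggregated loss is poisoned.
weighted = loss * observed
print(np.isnan(weighted.sum()))  # True: the nan propagates
```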
We should fix this by setting the value of the padding dependent on the distribution: e.g. padding with `0.5` for Beta and Gamma instead of with `0`.
It’s maybe better to try improving the masking logic first: one way could be to use `where` instead of multiplication, see https://mxnet.apache.org/api/python/docs/api/ndarray/ndarray.html#mxnet.ndarray.where
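A NumPy sketch of that suggestion (GluonTS itself operates on MXNet ndarrays, where the analogue is `mx.nd.where`; the variable names here are illustrative): selecting a finite value for masked positions keeps the `nan` out of the sum entirely, unlike multiplication by the mask.

```python
import numpy as np

loss = np.array([np.nan, 0.7, 0.3, 0.5])   # nan from log(0) on the padding
observed = np.array([0.0, 1.0, 1.0, 1.0])  # mask of real observations

# where() never reads the nan entry, so it cannot poison the aggregate,
# whereas `loss * observed` would yield 0 * nan == nan.
masked = np.where(observed == 1.0, loss, np.zeros_like(loss))
mean_loss = masked.sum() / observed.sum()
print(mean_loss)  # approximately 0.5, and crucially not nan
```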
Yeah, I absolutely agree, I just thought of that too on my way home.
This is fixed by https://github.com/awslabs/gluon-ts/pull/534.
@AaronSpieler I was just going through this issue since I am faced with the following problem: I have a lot of time series with different lengths in the train set, so I was wondering if I need to pad them with 0 for them to be of the same length. Is that something that I need to do manually using `FieldName.IS_PAD` and/or `FieldName.OBSERVED_VALUES`, or is that being taken care of automatically? Also, I don't know how to use `pick_incomplete`.
Would greatly appreciate your help.
@StatMixedML no need to pad your time series: if `pick_incomplete = True`, the model will be trained by occasionally sampling training windows that partially fall outside (to the left) of your time series, and the initial missing data will be automatically padded; if `pick_incomplete = False`, then only training windows with genuine data will be sampled to form training batches, and no padding happens.
Hope that clarifies!
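A NumPy sketch of the padding behavior described above (illustrative only, not GluonTS's actual splitter code): with `pick_incomplete = True`, a sampled window that starts before the series begins is left-padded with zeros, and `observed_values` marks which entries are real.

```python
import numpy as np

target = np.array([1.1, 1.2, 1.3, 1.4])  # a short series of length 4
window_len = 6                            # sampled window longer than the series

# Left-pad the target with zeros, and build the matching mask:
# 0 where the value is padding, 1 where it is a genuine observation.
pad = window_len - len(target)
padded_target = np.concatenate([np.zeros(pad), target])
observed_values = np.concatenate([np.zeros(pad), np.ones(len(target))])

print(padded_target)    # two padded zeros, then the real values
print(observed_values)  # zeros over the padding, ones over the data
```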
@lostella Thanks for clarifying. I am still not sure I entirely understand it. Assume that the prediction length = 18 months. However, most of the time series in the training set have a length of around 6-8 months, some longer, but all are irregularly spaced. For the DeepAR model I set `prediction_length = context_length = 24`. I padded the time series that have fewer than 24 months of observations to meet the `prediction_length` and `context_length`. All of them also have different starting times.
So you'd suggest NOT TO PAD the training data and to set `context_length = min(train_length)`? Or how would you tackle the problem of short and irregularly spaced time series? Sorry for the confusion. Also, how can I set `pick_incomplete` in the code?
Many thanks!
@AaronSpieler, your pull request was merged, but not released yet, right? I'm having possibly the same problem with DeepAR and Negative binomial distribution.
That's correct. It will be in the v0.5 release.
The irregular spacing of the time series could be a problem. You could try some pre-processing steps, like filling the missing values with the average of the neighboring ones, or other values that make sense, to end up with regularly spaced time series. Aggregation is another alternative.
An even bigger problem is if you want to have `prediction_length=context_length=24` and your time series is not even 24 long, because you don't even have a proper target at that point. Or do you mean that after removing the 24 target values you don't have 24 anymore? In that case you don't need to pad manually, just set `pick_incomplete = True` as suggested by @lostella.
I checked, and in the case of DeepAR it has `pick_incomplete = True`, so you don't need to do anything in that regard.
Description
When training a `DeepAREstimator`, some of the input is randomly set to 0. This is especially apparent for distributions where `0` is not in the support of the distribution, like the Beta and Gamma distributions, whose support is (0, 1) and (0, inf) respectively. Even though my training and test data is in the interval [0.4, 0.6], the log probability gets calculated on data like this (printed for the Student-t distribution):
As we can see, most data points can be found in the generated sinusoidal (if one checks), except for the `0.`s. For distributions like Gamma or Beta this leads to numeric issues when calculating the `log_prob`; HOWEVER, for other models this could potentially significantly impact performance.
To Reproduce
One can see this by setting `hybridize=False` in the `Trainer` and then printing `x` with `print(x.asnumpy())` in the `log_prob` function of the corresponding probability distribution. I tested this, and the error occurs for at least the Student-t, Gamma and Beta distributions.
First, creating a dataset without any values even close to `0`:
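Such a dataset might look like the following (a hypothetical NumPy reconstruction, since the original snippet is not shown; the number of series, length, frequency, and phase randomization are all assumptions): sinusoids rescaled into [0.4, 0.6], so no value is anywhere near 0.

```python
import numpy as np

np.random.seed(0)
num_series, length = 10, 200

# Sinusoids with random phases, scaled from [-1, 1] into [0.4, 0.6].
t = np.arange(length)
phases = np.random.uniform(0, 2 * np.pi, size=num_series)
data = 0.5 + 0.1 * np.sin(0.1 * t[None, :] + phases[:, None])

# Every value stays well away from 0, outside the danger zone for
# Beta/Gamma log-probabilities.
assert data.min() >= 0.4 - 1e-12 and data.max() <= 0.6 + 1e-12
```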
Then we try to train our `DeepAREstimator`:
Now we modify the `log_prob()` function to print the matrices:
Error Message
Environment