Closed pjebs closed 3 years ago
If I have sales data for 2 different stores:
Store 1
date | sales | Public Holiday |
---|---|---|
2016-01-01 | 1200 | N |
2016-01-02 | 1300 | Y |
2016-01-03 | 1400 | N |
Store 2
date | sales | Public Holiday |
---|---|---|
2016-01-02 | 1300 | Y |
2016-01-03 | 1400 | Y |
2016-01-04 | 1200 | N |
The documentation doesn't put into simple language how I am meant to apply it:
{"start": "2016-01-01", "cat": [1], "target": [1200, 1300, 1400], "dynamic_feat": [[1.1, 1.2, 0.5, ..]]}
{"start": "2016-01-02", "cat": [2], "target": [1300, 1400, 1200], "dynamic_feat": [[1.1, 1.2, 0.5, ..]]}
Should it be?:
... "dynamic_feat": [[0, 1, 0]] } // A 1 representing a public holiday
... "dynamic_feat": [[1, 1, 0]] }
The implication being if a location has no public holidays, then I need to provide an array of all 0 with the same number of entries as the number of datapoints.
If that is the case, for the missing value scenario where some datapoints are "NaN", then I assume I can pick a 0 or 1 and it will just act as a placeholder and have no effect?
Thank you in advance @djarpin
@pjebs - Yes, your coding of the Public holiday dynamic_feature is correct above and yes, use an array of all zeros if a location doesn't have any public holidays. Let me get back to you on the "NaN" issue. The model is used to interpolate NaNs, so there may be benefit in accurately codifying dynamic_feat even if you have a NaN target value for that data point.
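As a rough illustration of the coding confirmed above, the two store tables can be turned into JSON Lines records like this. This is only a sketch: `to_record` is a hypothetical helper, the `cat` values simply mirror the ones used in the question, and the rows are assumed to be contiguous daily observations.

```python
import json

# Rows copied from the two store tables in the question:
# (date, sales, public holiday flag).
store_1 = [("2016-01-01", 1200, "N"), ("2016-01-02", 1300, "Y"), ("2016-01-03", 1400, "N")]
store_2 = [("2016-01-02", 1300, "Y"), ("2016-01-03", 1400, "Y"), ("2016-01-04", 1200, "N")]


def to_record(rows, store_id):
    """Build one record per store (hypothetical helper, not part of any SDK)."""
    return {
        "start": rows[0][0],
        "cat": [store_id],
        "target": [sales for _, sales, _ in rows],
        # One inner list per feature, one value per time step: 1 = public holiday.
        "dynamic_feat": [[1 if flag == "Y" else 0 for _, _, flag in rows]],
    }


records = [to_record(store_1, 1), to_record(store_2, 2)]
# One JSON object per line, as in the examples in the question.
json_lines = "\n".join(json.dumps(record) for record in records)
print(json_lines)
```

A store without any public holidays would get an inner list of all zeros of the same length as its `target`.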
I can see that in my example public holidays are a boolean scenario, but dynamic features allow real values. What's a scenario where a dynamic feature would be something other than a boolean?
@pjebs - One example might be if you were forecasting sales for a product you could include the price of that product. Then if you had a planned promotion coming up, your forecast could include that effect.
Alternatively, if you were forecasting electricity consumption of an individual home, knowing the temperature may be beneficial. As temperature rises, people are more likely to use air conditioning so we might expect electricity consumption to go up as well.
Here's where it gets a bit tricky though with forecasting. Presumably, we'd train the model on actual temperatures and then include forecasted temperature (from some external source) to get future electricity consumption predictions. But if there's as much noise in the forecast for temperature as the noise we'd get in our typical forecast for electricity consumption without including temperature as a feature, then we don't really need to include it.
@djarpin - thank you for the clarification.
Follow-up question. Let's say I need to include the price of the product as a dynamic feature. Is this the correct approach to provide new features? Also, why the name "dynamic features" instead of simply calling them "features" (in DeepAR)?
... "dynamic_feat": [[0, 1, 0], [10.25, 11.50, 8.95]] } // public holiday and price features
... "dynamic_feat": [[1, 1, 0], [2000, 2000, 1750]] } // public holiday and price features
@pjebs in the case of NaN in the target, you should still provide the correct corresponding dynamic_feat value.
@ChandraLingam your example is correct. The name "dynamic_feat" comes from the fact that they are time-dependent, and to avoid confusion with possibly static real features (i.e. non-categorical features associated with the whole time series, and not with each time step).
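To make the two points above concrete, here is a small sanity check one could run on a training record before uploading it. It is a sketch under assumptions: `check_record` is a hypothetical helper, and the rules it enforces (each feature as long as `target`, finite feature values) are the ones stated in this thread, not an exhaustive list of what DeepAR validates.

```python
import math


def check_record(record):
    """Hypothetical sanity check for a training record with dynamic features."""
    n = len(record["target"])
    for i, feat in enumerate(record["dynamic_feat"]):
        # Every feature needs one value per time step of the target.
        if len(feat) != n:
            raise ValueError(f"feature {i} has {len(feat)} values, target has {n}")
        # Feature values must be finite, even where the target is missing.
        if not all(math.isfinite(v) for v in feat):
            raise ValueError(f"feature {i} contains a non-finite value")
    return True


record = {
    "start": "2016-01-01",
    "cat": [1],
    # The second target value is missing; the features for that day are still given.
    "target": [1200, "NaN", 1400],
    "dynamic_feat": [[0, 1, 0], [10.25, 11.50, 8.95]],  # holiday flag, price
}
ok = check_record(record)
```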
@lostella - I ran into unexpected issues when using dynamic features. Some future time steps did not have dynamic features, and I don't have a reasonable way to impute the missing values. DeepAR expects finite values in dynamic features, so I cannot pass NaN for missing features.
Prediction length = 200
Missing dynamic features = around 10 in 200 (randomly distributed)
What is a reasonable way to handle this situation?
You can pass nan.
@ChandraLingam One way that I can think of to cope with that situation is: fill-in missing values of feature A with some finite value (via imputation if you can, or otherwise using some other placeholder), and then add an additional indicator feature B that indicates which values of A are genuine and which are imputed/placeholders.
However, my guess is that for this to work you also need "missing" values for feature A in the training data (otherwise the model won't be able to learn how to handle that case). If that is not the case, you can still artificially hide some values of A, chosen at random from the training data (5% of them, given what you said), to recreate at training time the same conditions as at inference time.
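The suggestion above could be sketched roughly as follows. Everything here is illustrative: `impute_with_indicator` and `hide_at_random` are hypothetical helper names, the placeholder value of 0.0 is an arbitrary choice, and missing values are assumed to be represented as `None` or NaN.

```python
import math
import random


def impute_with_indicator(values, placeholder=0.0):
    """Fill missing values of feature A and build indicator feature B.

    Returns (filled, observed), where observed is 1.0 for genuine values
    and 0.0 for placeholders.
    """
    filled, observed = [], []
    for v in values:
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        filled.append(placeholder if missing else float(v))
        observed.append(0.0 if missing else 1.0)
    return filled, observed


def hide_at_random(values, fraction, seed=0):
    """Return a copy in which each value is hidden with probability `fraction`."""
    rng = random.Random(seed)
    return [None if rng.random() < fraction else v for v in values]


price = [10.25, None, 8.95, 9.10]  # made-up values with one gap
feature_a, feature_b = impute_with_indicator(price)
dynamic_feat = [feature_a, feature_b]
```

For training data without natural gaps, something like `hide_at_random(price, 0.05)` followed by `impute_with_indicator` would recreate roughly the 5% missing rate described above.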
@lostella Thank you so much for your suggestions. Yes, the training data also had missing time steps where both dynamic features and targets were missing. I was able to fill in the missing values for the features (leaving the target as missing), train a DeepAR model, and run predictions on it.
Is there any way to provide custom holidays or special days to a DeepAR model?
@engineeryashsaxena you can provide additional time-dependent features in your data using the `dynamic_feat` field: you can use this as a binary indicator of holidays or special events of any sort. As explained in the documentation (see the "Input/Output Interface for the DeepAR Algorithm" section), this array should have the same length as the `target` field in the training data. Note, however, that at inference time the `dynamic_feat` should also include the value of the feature in the prediction time range (see here).
So for example, if at inference time you are providing 500 data points in `target` and you set `prediction_length` to 24 for the model you trained, then each array in `dynamic_feat` should contain 524 values.
I hope this helps!
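The length rule from the example above (500 target points plus a prediction length of 24 gives 524 feature values) can be checked with a few lines before sending a request. This is just a sketch; `check_inference_instance` is a hypothetical helper and the instance contents are dummy values.

```python
def check_inference_instance(instance, prediction_length):
    """Check that each dynamic feature covers the target plus the forecast range."""
    expected = len(instance["target"]) + prediction_length
    for i, feat in enumerate(instance["dynamic_feat"]):
        if len(feat) != expected:
            raise ValueError(
                f"feature {i} has {len(feat)} values, expected {expected}"
            )
    return expected


instance = {
    "start": "2016-01-01",
    "target": [0.0] * 500,        # 500 observed data points (dummy values)
    "dynamic_feat": [[0] * 524],  # 500 past + 24 future feature values
}
n = check_inference_instance(instance, prediction_length=24)
print(n)
```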
> You can pass nan.
Can you pass NaN to dynamic feature?
No, you can't
Hello, I am studying the DeepAR algorithm and would like to understand the output of the model. Are the percentile values confidence intervals or credibility intervals? I ask because the first is a frequentist approach and the second a Bayesian approach. My second question is whether I can get the value of the variable being predicted, or whether I can only see information about the probability distribution.
@nicolasignaciopinocea:
DeepAR isn't a Bayesian approach (there aren't any priors around, for example), but it does provide you with uncertainty estimates via prediction intervals (confidence intervals have a stricter meaning in statistical terminology which doesn't apply here).
You can get the value of the forecast out if this means a point forecast. For example, if you ask for the P50 (= median) of the distribution, you get a point forecast.
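To illustrate how percentiles relate to a point forecast and a prediction interval, here is a sketch that summarizes a handful of made-up sample paths. The numbers are invented, `percentile` uses a simple nearest-rank definition chosen for this example, and DeepAR's own quantile computation may differ.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, at least 1.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def summarize(paths, percentiles=(10, 50, 90)):
    """Compute the requested percentiles per time step across sample paths."""
    steps = list(zip(*paths))  # transpose: one tuple of samples per time step
    return {p: [percentile(step, p) for step in steps] for p in percentiles}


# Five hypothetical sample paths over two future time steps.
paths = [[100, 200], [110, 190], [120, 210], [130, 220], [90, 180]]
summary = summarize(paths)
point_forecast = summary[50]  # P50 (median) used as the point forecast
interval = list(zip(summary[10], summary[90]))  # P10 to P90 prediction interval
```

Here `point_forecast` is the per-step median, and `interval` pairs the P10 and P90 values for each step.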
What is a good way to handle a dataset where each date has multiple different parameters? In the following example, I build multiple time series based on action, but I would like to use region (and possibly others) as a dynamic feature to forecast count for each time series.
| | date | action | region | count |
|---|---|---|---|---|
| 0 | 2019-05-31 | x | a | 1 |
| 1 | 2019-05-31 | x | b | 28 |
| 2 | 2019-05-31 | y | c | 3 |
| 3 | 2019-05-31 | z | d | 57 |
| 4 | 2019-05-31 | z | e | 1 |
Dear SageMaker Community, In our attempts to provide this repository with a better level of support going forward, we’re closing issues that were opened prior to the v2 release of the SDK. This is because we believe that over time many of the issues posted were solved with the latest release or other recent changes to the repo. This will help us reallocate resources towards issues that are more likely to still be relevant today. Some of the issues experienced now can be resolved by referencing the v2 guide: https://sagemaker.readthedocs.io/en/stable/v2.html In this guide, you can find simple solutions to common notebook errors, like the renaming of parameters and classes. If you believe your issue is still ongoing and you have updated error messaging or other info, please re-open it, and we will investigate the issue. Best Regards, AWS SageMaker Team
I have time series with a day-of-week dependency, i.e.:
{"combination":"item_1","start": "2009-11-01 00:00:00", "target": [4.3, "NaN", 5.1, ........], "dynamic_feat": [[1.0,2.0,3.0,4.0,5.0,6.0,7.0,1.0,2.0,3.0.....]],"label":[2.3]}
{"combination":"item_2","start": "2009-11-01 00:00:00", "target": [1.0, -5.0, ......................], "dynamic_feat": [[1.0,2.0,3.0,4.0,5.0,6.0,7.0,1.0,2.0,3.0.....]],"label":[-2.9]}
{"combination":"item_3","start": "2009-11-01 00:00:00", "target": [2.0, 1.0...........................], "dynamic_feat": [[1.0,2.0,3.0,4.0,5.0,6.0,7.0,1.0,2.0,3.0.....]],"label":[2.1]}
Every record has the same dynamic feature values, repeating from 1 to 7 (Monday to Sunday), but for different combinations/items.
In this case I do not get any predictions. Can someone help explain why this is happening?
@djarpin
I am a bit confused by how to apply the `dynamic_feat`:
The documentation has as an example:
It also states: