How to handle high peaks in GluonTS simple Feed Forward or DeepAR

adarwade1 commented 4 years ago

I am currently using the GluonTS Simple Feed Forward (SFF) and DeepAR models for a forecasting problem.

GluonTS SFF works better and more consistently than DeepAR.

We need your advice on how to handle high peaks with Simple Feed Forward. To give you an idea, peaks differ from normal values by 50,000-80,000.

Normal values (the mean) are in the range of 100-1000, while peaks are in the range of 30,000-80,000.

Note: Facebook Prophet has a changepoint detection hyperparameter. Is there any such hyperparameter available in GluonTS Simple Feed Forward or DeepAR?

kaijennissen commented 4 years ago

@adarwade1 What exactly do you mean by peaks? It sounds like you are talking about outliers, i.e. a few observations which are far away from the usual level. In this case there are two options:

Handle outliers by using the DeepAR model with a robust distribution (StudentT - which is the default).
Remove outliers during preprocessing.
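
A minimal sketch of the second option, assuming outliers are defined as points far from the median in units of the median absolute deviation (MAD). The function name `clip_outliers` and the threshold of 5 are illustrative choices, not part of GluonTS. Note that clipping like this would also flatten peaks that are real signal.

```python
from statistics import median

def clip_outliers(values, threshold=5.0):
    """Clip points lying more than `threshold` MADs from the median."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # More than half the points equal the median: no robust spread to clip against.
        return list(values)
    lower = center - threshold * mad
    upper = center + threshold * mad
    return [min(max(v, lower), upper) for v in values]

# Hypothetical daily volumes with one extreme day.
daily_volume = [120, 300, 250, 180, 50000, 220, 310]
clipped = clip_outliers(daily_volume)
print(clipped)
```

Only the extreme day is pulled back toward the typical range; the other values are unchanged.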

adarwade1 commented 4 years ago

Please refer to the details below.

Objective: predict high peaks.

We are currently working for a postal services customer. There are multiple post offices, and each post office has multiple sorting processes for different kinds of letters.

For one category of letters, the volume arrives in a single chunk on one day (in the third or fourth week of each month). The requirement is that, instead of ignoring such peaks (the chunk size), the model should forecast them.

We used "week of month" as one of the features, but that did not help to predict values closer to the peak.

Can you please advise on how we can predict high peaks correctly using GluonTS?

adarwade1 commented 4 years ago

Please let me know if you need any further details. The GluonTS Simple Feed Forward model forecasts the other values with better accuracy; however, there is a big difference between actual and forecasted values at the peaks.
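
One way to quantify the gap described here is to score peak days and normal days separately. The sketch below is an illustration only: the helper names, the sample numbers, and the threshold of 10,000 (chosen to sit between the stated normal range of 100-1000 and the peak range of 30,000-80,000) are assumptions.

```python
def mape(pairs):
    """Mean absolute percentage error over (actual, forecast) pairs; skips actual == 0."""
    usable = [(a, f) for a, f in pairs if a != 0]
    if not usable:
        return None
    return sum(abs(a - f) / abs(a) for a, f in usable) / len(usable)

def error_by_regime(actual, forecast, peak_threshold):
    """Report the error on peak days and on normal days separately."""
    pairs = list(zip(actual, forecast))
    return {
        "peak": mape([p for p in pairs if p[0] >= peak_threshold]),
        "normal": mape([p for p in pairs if p[0] < peak_threshold]),
    }

# Hypothetical values: the forecast tracks normal days but misses the peak.
actual = [400, 500, 60000, 450]
forecast = [420, 480, 3000, 450]
errors = error_by_regime(actual, forecast, peak_threshold=10_000)
print(errors)
```

A single aggregate metric can look good while the peak-day error stays large, which matters when the peaks drive the planning decision.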

kaijennissen commented 4 years ago

@adarwade1 My first guess is that you need additional features. From your description, I suggest features like weekdays, public holidays, and others that are informative about the peaks.

lostella commented 4 years ago

@adarwade1 is DeepAREstimator able to understand the presence of peaks in any way (even if not to a great accuracy)? Currently the SimpleFeedForwardEstimator does not use any date-related feature, so if those are in any way informative of peaks, maybe adding them would be beneficial as @kaijennissen suggests

adarwade1 commented 4 years ago

We used most of the date features to handle seasonality and trends.

time_feat:
- day of week
- day of month
- week of month
- week of year
- season_number

feat_dynamic_real:
- is weekend
- is holiday
- is non-production day

feat_static_cat:
- list of time series IDs

The main challenge is that the peak values are far higher than the mean.

GluonTS Simple Feed Forward is able to process 1240 time series at a time. The algorithm is consistent across different data patterns, including the latest data.
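
For reference, a minimal sketch of how the date features listed in this comment could be derived with the standard library. The definition of "week of month" (days 1-7 are week 1) is an assumption, since the thread does not define it; `season_number`, holidays, and non-production days are omitted because they depend on calendars not given here.

```python
from datetime import date

def week_of_month(d):
    """1-based week index within the month, counting days 1-7 as week 1."""
    return (d.day - 1) // 7 + 1

def date_features(d):
    """Calendar features for a single day."""
    return {
        "day_of_week": d.weekday(),          # Monday = 0
        "day_of_month": d.day,
        "week_of_month": week_of_month(d),
        "week_of_year": d.isocalendar()[1],  # ISO week number
        "is_weekend": int(d.weekday() >= 5),
    }

features = date_features(date(2020, 10, 22))
print(features)
```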

adarwade1 commented 4 years ago

The output is used for resource planning. Days with higher volume require more resources, which is why we need to forecast the peaks with better accuracy.

kaijennissen commented 4 years ago

@adarwade1 The open question is whether it is possible to forecast the peaks (they follow some learnable pattern) or not (the peaks are random). As @lostella pointed out, it would be helpful to know whether the DeepAREstimator detects the peaks and struggles only with their magnitude, or whether the peaks are ignored altogether. Could you share a plot of one example?
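
A rough way to probe whether the peak timing follows a pattern is to count in which week of the month the peaks fall. This is only a sketch: the threshold, the daily frequency, and the synthetic series are assumptions for illustration.

```python
from collections import Counter
from datetime import date, timedelta

def peak_weeks(start, values, peak_threshold):
    """Count how often peaks fall in each week of the month (1-based)."""
    counts = Counter()
    for offset, value in enumerate(values):
        if value >= peak_threshold:
            day = start + timedelta(days=offset)
            counts[(day.day - 1) // 7 + 1] += 1
    return counts

# Synthetic daily series: 91 days from 2020-01-01 with three injected peaks.
start = date(2020, 1, 1)
values = [500] * 91
for peak_day in (date(2020, 1, 20), date(2020, 2, 18), date(2020, 3, 24)):
    values[(peak_day - start).days] = 60000

counts = peak_weeks(start, values, peak_threshold=10_000)
print(counts)
```

If the counts concentrate in one or two weeks, the timing is at least partly predictable from calendar features; a flat distribution would suggest the opposite.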

adarwade1 commented 4 years ago

Sure. Let me train DeepAR and share a summary with you.

adarwade1 commented 4 years ago

Hi All,

Apologies for the delay in responding; we were busy with a deliverable. Please refer to the sample data below.

5-year data trends

- High peaks: [image IMG_20201030_192236]
- SFF: [image IMG_20201030_192247]

SFF actual vs forecast: [image IMG_20201030_192256]

DeepAR actual vs forecast results: [image IMG_20201030_195105]

adarwade1 commented 4 years ago

The last one shows the DeepAR results. Neither DeepAR nor SFF handles the magnitude of the peaks.

@kaijennissen / @lostella, would you be able to join an MS Teams meeting, if you are comfortable with that, based on your availability?

kaijennissen commented 4 years ago

@adarwade1 Plots look like I expected. DeepAR/SFF do pick up the weekly seasonality but there is currently no feature which explains the peaks. I think your problem is to correctly predict the timing of these peaks. Maybe the methods described in this paper and the corresponding code are helpful.

adarwade1 commented 4 years ago

Many thanks for the guidance. We will look into it and update you.

adarwade1 commented 4 years ago

We used the DeepAR hyperparameters below. The NegativeBinomialOutput distribution throws NaN errors in the first or second epoch. The data does not contain NaN values; we also tried reducing the learning rate, but that did not help with the NegativeBinomialOutput distribution.

We are also using the latest MXNet version, as suggested in articles.

It works fine with the default distribution.

Can you please suggest suitable hyperparameters? From what we have read, DeepAR works well in most scenarios, with the best accuracy. We feel that there might be some mistake on our side.
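
A negative binomial likelihood is defined on non-negative integer counts, so one thing worth ruling out is target values outside that support, for example negatives or fractional values left over from preprocessing. This is an assumption about a possible cause, not a confirmed diagnosis of the NaN errors above; the helper name is hypothetical.

```python
import math

def count_data_problems(target):
    """Return indices of values unsuitable for a count likelihood."""
    problems = {"nan_or_inf": [], "negative": [], "non_integer": []}
    for i, v in enumerate(target):
        if math.isnan(v) or math.isinf(v):
            problems["nan_or_inf"].append(i)
        elif v < 0:
            problems["negative"].append(i)
        elif v != int(v):
            problems["non_integer"].append(i)
    return problems

# Hypothetical target with one value of each problematic kind.
report = count_data_problems([120.0, 0.0, -3.0, 250.5, float("nan"), 80000.0])
print(report)
```

An empty report does not rule out other causes such as numerical overflow during training, but it removes one common source of invalid likelihood values.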

adarwade1 commented 4 years ago

> @adarwade1 Plots look like I expected. DeepAR/SFF do pick up the weekly seasonality but there is currently no feature which explains the peaks. I think your problem is to correctly predict the timing of these peaks. Maybe the methods described in this paper and the corresponding code are helpful.

Ans: These high-magnitude peaks occur in the third or fourth week of the month, so we introduced a week-of-month feature. SFF showed some improvement in accuracy with the inclusion of this feature.
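
As an illustration of passing such a signal to a model that accepts dynamic features, the sketch below builds one dataset entry as a plain dictionary with a binary "peak window" flag in `feat_dynamic_real`. The helper names, the daily frequency, and the choice of weeks 3 and 4 are assumptions. To my understanding, dynamic features need to cover the forecast horizon at prediction time, hence the `future_steps` argument; please verify that against your GluonTS version.

```python
from datetime import date, timedelta

def peak_window_flags(start, length, weeks=(3, 4)):
    """1.0 for days in the given weeks of the month, else 0.0."""
    flags = []
    for offset in range(length):
        day = start + timedelta(days=offset)
        week_of_month = (day.day - 1) // 7 + 1
        flags.append(1.0 if week_of_month in weeks else 0.0)
    return flags

def make_entry(item_id, start, target, future_steps=0):
    """Build one dataset entry; `future_steps` extends the feature past the target."""
    length = len(target) + future_steps
    return {
        "item_id": item_id,
        "start": start.isoformat(),
        "target": list(target),
        "feat_dynamic_real": [peak_window_flags(start, length)],  # shape: (1, length)
    }

# 14 observed days starting 2020-10-01, plus a 7-day horizon.
entry = make_entry("office_a", date(2020, 10, 1), [400] * 14, future_steps=7)
print(entry["feat_dynamic_real"][0])
```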

adarwade1 commented 4 years ago

> @adarwade1 is DeepAREstimator able to understand the presence of peaks in any way (even if not to a great accuracy)? Currently the SimpleFeedForwardEstimator does not use any date-related feature, so if those are in any way informative of peaks, maybe adding them would be beneficial as @kaijennissen suggests

Can you please share any documents or links that give recommendations or best-practice guidelines for DeepAR hyperparameters, if you have them handy?

We went through many articles but could not find any detailed documentation.

lostella commented 4 years ago

I see that some time series in your plots exhibit this spiky behaviour, while others do not. @adarwade1 one thing to try is then to train an estimator only on data that has spikes, and verify that the model can learn that behaviour: that should be the case, if the spikes follow some predictable pattern.
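
A minimal sketch of selecting only the spiky series for such an experiment, assuming "spiky" means the maximum exceeds some multiple of the median. The ratio of 20 and the helper names are illustrative assumptions.

```python
from statistics import median

def is_spiky(values, ratio=20.0):
    """True when the largest value exceeds `ratio` times the median."""
    typical = median(values)
    return typical > 0 and max(values) > ratio * typical

def spiky_subset(series_by_id, ratio=20.0):
    """Keep only the series that show spikes."""
    return {k: v for k, v in series_by_id.items() if is_spiky(v, ratio)}

# Hypothetical series: one with a peak, one without.
series = {
    "office_a": [300, 450, 500, 62000, 410],
    "office_b": [300, 320, 310, 330, 305],
}
selected = sorted(spiky_subset(series))
print(selected)
```

The selected subset can then be turned into a training dataset on its own, to see whether the model learns the spike pattern when it is not diluted by non-spiky series.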

adarwade1 commented 4 years ago

@lostella /@kaijennissen ,

We train and validate the model with the same data and seed values; however, we observe a different accuracy every time.

We used the seed values mentioned in the documentation: mx.random.seed(0) and np.random.seed(0).

Can you please advise on this?

Thanks and Regards, Abhijit

I found a similar GitHub issue: "Not able to achieve reproducible training and results" #1040.

I am using CPU on a single Azure Machine Learning compute instance (a single VM with a good configuration).
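
The seeding described in this comment can be wrapped in one helper so it is applied in the same way before every run. This is a sketch under assumptions: the helper name is hypothetical, it also seeds Python's own `random` module, and it skips NumPy or MXNet when they are not installed. Seeding alone may not guarantee identical training results.

```python
import importlib
import random

def seed_everything(seed=0):
    """Seed Python's RNG plus NumPy and MXNet when they are importable."""
    random.seed(seed)
    seeded = ["random"]
    for name in ("numpy", "mxnet"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # library not installed in this environment
        module.random.seed(seed)  # np.random.seed / mx.random.seed
        seeded.append(name)
    return seeded

print(seed_everything(0))
```

Calling it immediately before creating the estimator and before each training run keeps the random state consistent across repetitions.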

lostella commented 4 years ago

@adarwade1 in #1040 I posted a snippet with which I'm getting consistently the same exact results every time, if I'm not wrong. Could you try that?

Also, to avoid confusion, feel free to open a separate issue about this, given that it is unrelated to the original post here.

adarwade1 commented 4 years ago

@lostella, regarding the handling of high peaks: please let us know your availability so that we can connect.

Thanks and Regards, Abhijit

adarwade1 commented 3 years ago

@lostella, we tried our best; however, DeepAR is not able to handle high peaks. Can you please suggest any hyperparameters that could help?

kaijennissen commented 3 years ago

@adarwade1 Have you considered that this may not be a limitation of the DeepAR model but a property of your specific problem?

awslabs / gluonts

How to handle high peaks in GluonTS simple Feed Forward or DeepAR #1112