Closed kaionwong closed 1 year ago
Hello @kaionwong,
Without seeing the code, it is hard to picture the problem and answer your questions. The spikes can be caused by seasonal patterns in the past behavior of the series, but this is not always the reason.
If you provide more information, we may be able to help.
Javier
I am also interested in the answer to this question. I sometimes have similar issues.
@JavierEscobarOrtiz I have updated my OP with more details. Thanks.
It has been 60 days since the last activity on this GitHub issue. Since we have not received any updates or progress reports, we will be closing this issue.
If this issue is still relevant and requires attention, please feel free to reopen it and provide an update. We appreciate your contributions and would love to see this issue resolved if it is still relevant.
Thank you for your participation and cooperation!
Can someone help explain, or point me to resources on how to better interpret, these graphs?
I am using `XGBRegressor` to make predictions over the test period, with each step producing a 4-week forecast. What can explain the large spikes between Aug and Nov? Similarly, for the prediction interval (I set `interval=[5, 95]`), what accounts for the large spikes, and why does the interval only extend above the predicted values?
Edits
The task is to predict the case count (the number of occurrences of an event) in a particular location during the following period. The case count time series is weekly and is split into training, validation, and test sets.
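For readers following along, a chronological split of a weekly series could be sketched as below. This is only an illustration: the series start date, the validation cutoff, and the placeholder values are hypothetical, and `chronological_split` is not a function from any library. Only the 2020-07-01 test start is taken from this issue.

```python
from datetime import date, timedelta


def chronological_split(dates, values, validation_start, test_start):
    """Split (date, value) pairs into train/validation/test by date, without shuffling."""
    train, validation, test = [], [], []
    for d, v in zip(dates, values):
        if d < validation_start:
            train.append((d, v))
        elif d < test_start:
            validation.append((d, v))
        else:
            test.append((d, v))
    return train, validation, test


# Hypothetical weekly series: 104 weeks starting 2019-01-05.
start = date(2019, 1, 5)
dates = [start + timedelta(weeks=i) for i in range(104)]
values = [float(i % 7) for i in range(104)]  # placeholder counts, not real data

train, validation, test = chronological_split(
    dates,
    values,
    validation_start=date(2020, 1, 1),  # assumed cutoff, not from the issue
    test_start=date(2020, 7, 1),        # test start stated in the issue
)
```

Splitting by date rather than at random keeps every validation and test observation strictly later than the training data, which is what a forecasting evaluation needs.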
Sample data is as follows:
The test period is from 2020-07-01 to 2021-08-21. If I make a 4-week rolling prediction (`future_prediction_n_step = 4`), it looks like this:
If I make an 8-week rolling prediction (`future_prediction_n_step = 8`), it looks like this:
My questions are:
1) Why are there spikes (large rises and large drops)? They should not be explained by seasonality, since the seasonality is clearly annual and does not occur within a year.
2) When the trained model produces the 4-week forecast, does it never "see" the test data, or is it exposed to the actual test data in a rolling fashion? In other words, does the model use actual values from the test set (2020-07-01 to 2021-08-21) when predicting within the test period, or does it rely only on actual data prior to 2020-07-01 plus its own predicted values during the test period?
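The two possibilities in question 2 can be made concrete with a small sketch. It assumes a common rolling-origin scheme in which predictions are fed back as inputs within each n-step window, and the history is extended between windows either with actual test values or with the predictions themselves. The functions `recursive_forecast`, `rolling_backtest`, and `last_value_model` are hypothetical stand-ins written for illustration; they are not the API of the library used here, and which behavior applies depends on the code and settings in use.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    history: Sequence[float],
    steps: int,
    one_step_model: Callable[[Sequence[float]], float],
) -> List[float]:
    """Predict `steps` values ahead, feeding each prediction back as input."""
    window = list(history)
    preds = []
    for _ in range(steps):
        y_hat = one_step_model(window)
        preds.append(y_hat)
        window.append(y_hat)  # later steps see the prediction, not the actual value
    return preds


def rolling_backtest(train, test, steps, one_step_model, use_actuals_between_folds=True):
    """Forecast the test period in consecutive windows of `steps` observations."""
    history = list(train)
    all_preds = []
    for start in range(0, len(test), steps):
        fold = test[start:start + steps]
        preds = recursive_forecast(history, len(fold), one_step_model)
        all_preds.extend(preds)
        # Between windows, extend the history with actuals or with predictions.
        history.extend(fold if use_actuals_between_folds else preds)
    return all_preds


def last_value_model(window):
    """Hypothetical stand-in model: predict the most recent value."""
    return window[-1]


train = [10.0, 12.0, 11.0]
test = [20.0, 21.0, 22.0, 23.0, 30.0, 31.0, 32.0, 33.0]

with_actuals = rolling_backtest(train, test, 4, last_value_model, True)
without_actuals = rolling_backtest(train, test, 4, last_value_model, False)
```

With actual values added between windows, the second window starts from the last observed test value (23.0), so the forecast jumps at the window boundary. Without them, the forecast never leaves the last training value (11.0). In a scheme like the first one, abrupt changes at window boundaries would be expected whenever the actual series has drifted away from the previous window's predictions.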
My code