Unsatisfactory forecast results

lesmesrafa commented 2 years ago

Hi, I was testing this tool to forecast 10 time series, but none of the predictions I obtained can be considered satisfactory, even though almost all time series in the examples (notebook_examples/tspDB Example-Multiple Time Series (Real-world Data).ipynb) achieve good results. Below you can see some screenshots of the results I have obtained for each time series. The screenshots have been obtained from the following datasets:

The following images have been obtained from tmpseries.csv and all of them have been obtained predicting between 8 and 14 days in the future:

Time Series A:
Time Series B:
Time Series C:
Time Series D:
Time Series E:
Time Series F:
Time Series G:
Time Series H:
Time Series I:

These results were obtained as shown in the following .ipynb: codemultipleTS.zip

The following images have been obtained from tmpseries2.csv. The first image corresponds to a one-month forecast and the second image to a 10-day forecast.

Time Series one month ahead:
Time Series 10 days ahead:

These results were obtained using the same code as the previous predictions (changing only the dataset path, the prediction dates, and the split of the data into training data (80% of the dataset) and test data (20% of the dataset)).

Does anyone else get the same results/or have the same problem with other time series? Or am I using this tool wrong?

I obtained these results with the following software versions:

Ubuntu 20.04.1
PostgreSQL 12.10
Python 3.9

AbdullahO commented 2 years ago

Hey @lesmesrafa,

Thanks for the feedback. It is possible that the tool may not provide satisfactory results for some time series. Though I think one thing that would help a lot is playing around with the hyperparameter k (see API reference here: https://tspdb.mit.edu/API/ )

This hyperparameter controls the complexity of your model; the higher it is, the higher the complexity. By default, it is selected in a data-driven way that often works well, but for some time series it is important to tune it. For example, you can set k to 3 using the following query (an adjusted form of the query from your notebook):

select create_pindex('ts_pred', 'Date','"""+columns+"""','pindex1', k =>3);
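To try several values of k, the query above can be generated from Python. The following is a minimal sketch, not part of tspDB: the helper name `build_create_pindex_query`, the index names and the placeholder column list are assumptions, and only the `k => ...` named-argument form is taken from the query quoted above. Running each generated query still requires your own database connection.

```python
from typing import Optional


def build_create_pindex_query(table: str, time_col: str, value_cols: str,
                              index_name: str, k: Optional[int] = None) -> str:
    """Return a create_pindex query string; k is appended only when given.

    Hypothetical helper: identifiers are inserted as-is (no escaping),
    so pass trusted names only.
    """
    query = (f"select create_pindex('{table}', '{time_col}', "
             f"'{value_cols}', '{index_name}'")
    if k is not None:
        # Same named-argument form as the query quoted in the thread.
        query += f", k => {k}"
    return query + ");"


# Placeholder column list (its format is an assumption); substitute the
# `columns` string that your notebook already builds.
columns = '{"ts_a","ts_b"}'

# One index name per k so the candidate models do not collide.
queries = [
    build_create_pindex_query("ts_pred", "Date", columns, f"pindex_k{k}", k)
    for k in (2, 3, 5)
]
for q in queries:
    print(q)
```

Each resulting index can then be evaluated on the same held-out period to see which k gives the lowest error for a given series.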

Another potential problem, which I think is probable from looking at the second dataset, is that your time series has a change point, i.e., its underlying process changes, and hence you are fitting data from different "regimes". See our work here and the associated paper here.
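As a rough illustration of what a change point looks like in data, the sketch below finds the single split that best separates a series into two segments with different means. This is a brute-force least-squares heuristic chosen for illustration only; it is not the method from the work referenced above, and the function name and sample values are made up. Note that it always returns some split, even for a series with no real change.

```python
def best_mean_shift_split(values):
    """Return the index i (1 <= i < len(values)) that minimizes the total
    squared deviation when values[:i] and values[i:] each get their own mean."""
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    if len(values) < 2:
        raise ValueError("need at least two points")
    return min(range(1, len(values)),
               key=lambda i: sse(values[:i]) + sse(values[i:]))


# Made-up series whose level jumps from about 10 to about 20.
series = [10.0, 10.5, 9.8, 10.2, 20.1, 19.7, 20.4, 20.0]
split = best_mean_shift_split(series)
recent_regime = series[split:]  # data after the detected shift
print(split, recent_regime)
```

If a series does contain such a shift, fitting only on the most recent regime is one option worth comparing against fitting on the full history.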

suhanovs commented 2 years ago

@lesmesrafa I was getting results similar to your "time series one month ahead" chart, until I realized that you are supposed to add newly observed data points to the dataset you use to make predictions. So the procedure is as follows:

1. Build an index on the dataset up to time T. Use the setting to prevent automatic pindex update.
2. Predict T+1 and decide if you want to act on it.
3. When T+1 is observed, write it to the dataset.
4. Predict T+2 (with T+1 in the dataset) and decide if you want to act on the T+2 prediction.
5. When T+2 is observed, write it... and so on.
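The loop described above can be sketched in plain Python. This is only an illustration of the evaluation procedure: `naive_forecast` is a stand-in that repeats the last observed value, not tspDB's model, and in practice the two marked lines would be a predict query and an insert into the database table.

```python
def naive_forecast(history):
    """Stand-in forecaster (assumption): repeat the last observed value."""
    return history[-1]


def walk_forward_one_step(train, test, forecast=naive_forecast):
    """One-step-ahead predictions over `test`, adding each actual value to
    the history only after the prediction for that step has been made."""
    history = list(train)
    predictions = []
    for actual in test:
        predictions.append(forecast(history))  # predict the next step
        history.append(actual)                 # the step is observed: store it
    return predictions


train = [1.0, 2.0, 3.0]
test = [4.0, 6.0, 5.0]
preds = walk_forward_one_step(train, test)
print(preds)  # [3.0, 4.0, 6.0]
```

With this stand-in, the predictions are simply the actual series shifted by one step, which shows how a one-step-ahead plot can follow the data closely while still lagging behind it.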

This is apparent if you look at scenario III in this notebook: https://colab.research.google.com/drive/1yA3gMVB3XxKYgnSKx0O5dTWMElfX8J2S?usp=sharing#scrollTo=X4n3Eho-zStu

Then you get something more along the lines of this:

But as I start to review my test results, I am also starting to wonder how to apply this, whether it could work better on other time periods or a different instrument, etc. I am finding that the predictions lag, similar to (pick your favourite lagging indicator).

AbdullahO commented 2 years ago

The tool will give you both 1-step-ahead and multistep-ahead forecasts, but as one would expect, the multistep-ahead forecasts will be less accurate.

The tool will automatically incorporate new data points into the model as you insert them into the DB. So the procedure described above is a way of evaluating 1-step-ahead forecasts.

suhanovs commented 2 years ago

@AbdullahO Thank you for confirming this.

arpieb commented 2 years ago

@AbdullahO confirming I'm reading the above correctly - if I say predict multistep ahead for 10 days, day t+10 doesn't incorporate t+1..t+9 forecasts in its predictions unless they are written to the database t+1 at a time?

AbdullahO commented 2 years ago

It absolutely does. What @suhanovs is referring to is how you can do 1-step-ahead forecasts over a long period, which basically goes as he described.

lesmesrafa commented 2 years ago

@AbdullahO @suhanovs in the algorithm you describe (https://colab.research.google.com/drive/1yA3gMVB3XxKYgnSKx0O5dTWMElfX8J2S?usp=sharing#scrollTo=X4n3Eho-zStu scenario III), what do you insert into the dataset: the test samples (i in the loop of the code you mention) or the predictions made with the predict query?

I ask because, if I insert the samples of the test dataset (I) into the dataset, I get results similar to yours. However, in real life you don't have the test dataset (for example, if I wanted to predict time series A for May 4, 2022, I couldn't, because I don't have data up to that date). On the other hand, if I insert the predictions themselves (which would make sense in a real forecasting application), I don't obtain good results.

AbdullahO commented 2 years ago

If you need to forecast one month in advance, then what you did in the first post is the right thing to do; the forecast performance may be improved by changing the hyperparameter k.
If you only want to do k-step forecasts for some k (here k is the forecast horizon, not the hyperparameter) and you want to evaluate the tool, then you should follow scenario 3 in the colab. Assume you want to forecast the period T+1 to T+Nk for some integer N. Then you do the following:
1. Forecast the time series from T+1 to T+k.
2. Insert the actual test samples at times T+1 to T+k into the DB.
3. Forecast the time series from T+k+1 to T+2k.
4. Insert the actual test samples at times T+k+1 to T+2k into the DB, and so on.
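The block-wise evaluation in these steps can be sketched as follows. The forecast horizon is called `horizon` to keep it apart from the hyperparameter k; `mean_forecast` is a stand-in multistep forecaster (an assumption, not tspDB's model), and the list `history` plays the role of the database table.

```python
def mean_forecast(history, steps):
    """Stand-in multistep forecaster (assumption): repeat the history mean."""
    mean = sum(history) / len(history)
    return [mean] * steps


def block_walk_forward(train, test, horizon, forecast=mean_forecast):
    """Forecast `test` in blocks of `horizon` steps, inserting the actual
    values of each block into the history before forecasting the next one."""
    history = list(train)
    predictions = []
    for start in range(0, len(test), horizon):
        block = test[start:start + horizon]
        predictions.extend(forecast(history, len(block)))  # forecast the block
        history.extend(block)                              # insert the actuals
    return predictions


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


train = [2.0, 4.0]
test = [6.0, 8.0, 10.0, 12.0]
preds = block_walk_forward(train, test, horizon=2)
print(preds)                             # [3.0, 3.0, 5.0, 5.0]
print(mean_absolute_error(test, preds))  # 5.0
```

Every prediction in a block is made before any actual value from that block is inserted, so the resulting error reflects genuine forecasts up to `horizon` steps ahead.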

suhanovs commented 2 years ago

@lesmesrafa I don't know if I agree that in real life you don't have the dataset that we refer to as the "test dataset" here. To use your example of May 4, you can make a prediction for May 4 using the data that you have today, on April 30. When you get the May 1 data, you add it to your dataset and revise the prediction you made for May 4. The closer you get to May 4, the more data you will have on hand, and the more accurate your prediction becomes (per @AbdullahO, predictions made on predictions are less accurate).

@AbdullahO Thanks for the 'k' hint.
