Closed lesmesrafa closed 2 years ago
Hey @lesmesrafa,
Thanks for the feedback. The tool may not provide satisfactory results for some time series, but I think one thing that would help a lot is playing around with the hyperparameter k (see the API reference here: https://tspdb.mit.edu/API/).
This hyperparameter controls the complexity of your model: the higher it is, the higher the complexity. By default, it is selected in a data-driven way that often works well, but for some time series it is important to tune it. For example, you can set k to 3 using the following query (an adjusted form of the query from your notebook):
select create_pindex('ts_pred', 'Date','"""+columns+"""','pindex1', k =>3);
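If you want to try several values of k, the query above can be generated from Python. This is only a sketch: `build_create_pindex_query` is a hypothetical helper, not part of tspDB. It just formats the same `create_pindex(..., k => ...)` call shown above and does not connect to the database. The `columns` placeholder and the per-k index names are assumptions for illustration.

```python
def build_create_pindex_query(table, time_column, columns, index_name, k=None):
    """Format the create_pindex query from this thread, optionally fixing k.

    Hypothetical helper: it only builds the SQL string. When k is None the
    argument is omitted, so the default data-driven choice is used.
    """
    args = "'{}', '{}', '{}', '{}'".format(table, time_column, columns, index_name)
    if k is not None:
        args += ", k => {}".format(k)
    return "select create_pindex({});".format(args)


# Placeholder for the column string the notebook already builds (assumption).
columns = "my_value_column"

# One prediction index per candidate k, so the forecasts can be compared.
for k in (2, 3, 5):
    index_name = "pindex_k{}".format(k)  # hypothetical index names
    print(build_create_pindex_query("ts_pred", "Date", columns, index_name, k=k))
```

Each printed string can then be executed the same way the notebook executes the original query.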
Another potential problem, which I think is probable from looking at the second dataset, is that your time series has a change-point, i.e., its underlying process changes, and hence you are fitting data from different "regimes". See our work here and the associated paper here.
@lesmesrafa I was getting results similar to your one-month-ahead time series chart, until I realized that the intended procedure is to add newly observed data points to the dataset you use to make predictions. So the algorithm is as follows:
This is apparent if you look at scenario III in this notebook: https://colab.research.google.com/drive/1yA3gMVB3XxKYgnSKx0O5dTWMElfX8J2S?usp=sharing#scrollTo=X4n3Eho-zStu
Then you get something more along the lines of this:
But as I start to review my test results, I am also starting to wonder how to apply this, and whether it could work better on some other time periods or a different instrument, etc. I am finding the predictions to lag, similar to (pick your favourite lagging indicator).
The tool will give you both 1-step-ahead and multistep-ahead forecasts, but as one would expect, the multistep-ahead forecasts will be less accurate.
The tool will automatically incorporate new data points into the model as you insert them into the DB. So the way the forecasts are carried out above is for evaluating 1-step-ahead forecasts.
@AbdullahO Thank you for confirming this.
@AbdullahO confirming I'm reading the above correctly - if I say predict multistep ahead for 10 days, day t+10 doesn't incorporate t+1..t+9 forecasts in its predictions unless they are written to the database t+1 at a time?
It absolutely does. What @suhanovs is referring to is how you can do 1-step-ahead forecasts over a long period, which basically goes as he described.
@AbdullahO @suhanovs in the algorithm you describe (https://colab.research.google.com/drive/1yA3gMVB3XxKYgnSKx0O5dTWMElfX8J2S?usp=sharing#scrollTo=X4n3Eho-zStu scenario III), what do you insert in the dataset: the test samples (i in the loop of the code you mention) or the predictions made with the predict query?
I ask because, if I insert the samples of the test dataset (i) in the dataset, I get results similar to yours. However, in real life, you don't have the test dataset (for example, if I wanted to predict time series A for May 4, 2022, I couldn't, because I don't have data up to that date). On the other hand, if I insert the predictions made (something that would make sense in a real forecasting application), I don't obtain good results.
@lesmesrafa I don't know if I agree that in real life you don't have the dataset that we refer to as the "test dataset" here. To use your example of May 4, you can make a prediction for May 4 using the data that you have today, on April 30. When you get the May 1 data, you add it to your dataset and revise the prediction you made for May 4. The closer you get to May 4, the more data you will have on hand, and the more accurate your prediction becomes (per @AbdullahO, predictions made on predictions are less accurate).
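The difference between the two procedures discussed here can be sketched in plain Python. This is not tspDB code: `naive_forecast` is a hypothetical stand-in model (the mean of the last few values) and the series is made up. The only point is the loop structure. In the first function each true observation is appended to the history once it arrives; in the second, the model's own predictions are appended instead.

```python
def naive_forecast(history, window=3):
    """Stand-in model (assumption, not tspDB): mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def one_step_walk_forward(train, test):
    """Predict each test point one step ahead, then append the TRUE observation."""
    history = list(train)
    predictions = []
    for observed in test:
        predictions.append(naive_forecast(history))
        history.append(observed)  # the new data point arrives and is inserted
    return predictions


def recursive_multistep(train, steps):
    """Predict `steps` points ahead, appending the model's OWN predictions."""
    history = list(train)
    predictions = []
    for _ in range(steps):
        prediction = naive_forecast(history)
        predictions.append(prediction)
        history.append(prediction)  # no new observation is available yet
    return predictions


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


series = [float(i) for i in range(1, 21)]  # made-up upward trend: 1, 2, ..., 20
train, test = series[:16], series[16:]     # chronological 80/20 split

walk_forward = one_step_walk_forward(train, test)
multistep = recursive_multistep(train, len(test))

print("1-step walk-forward MAE:", mean_absolute_error(test, walk_forward))
print("recursive multistep MAE:", mean_absolute_error(test, multistep))
```

On this toy trend the recursive forecasts drift further from the series with each step, while the walk-forward forecasts stay a fixed distance behind it. A real model and real data will behave differently, but the two loops correspond to the two ways of feeding the dataset described in this thread.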
@AbdullahO Thanks for the 'k' hint.
Hi, I was testing this tool to forecast 10 time series; however, I can't consider any of the predictions I obtained satisfactory (considering that in the examples [notebook_examples/tspDB Example-Multiple Time Series (Real-world Data).ipynb] almost all time series achieve good results). Below you can see some screenshots of the results I have obtained for each time series. The screenshots have been obtained from the following datasets:
The following images have been obtained from tmpseries.csv and all of them have been obtained predicting between 8 and 14 days in the future:
These results were obtained as shown in the following .ipynb: codemultipleTS.zip
The following images have been obtained from tmpseries2.csv. The first image corresponds to a one-month forecast and the second image to a 10-day forecast.
These results have been obtained using the same code as the previous predictions (changing, of course, the dataset path, the prediction dates and the split of the data into training data (80% of the dataset) and test data (20% of the dataset)).
Does anyone else get the same results or have the same problem with other time series? Or am I using this tool wrong?
I obtained these results with the following software versions: