PYFTS / notebooks

Code examples for pyFTS

Forecasting real future #1

Open georgevarelas opened 5 years ago

georgevarelas commented 5 years ago

Hi Petronio. Thank you for your quick response to my problem. I'm coming back to ask about predictions in the real future. Your predict method needs a dataset. What if I want to forecast "n" days ahead from today, where there is no data? What dataset should I pass to predict when I use all the data (until today) for training, with no test set at all? Let's assume we have the model below:

fs = Grid.GridPartitioner(data=all_data, npart=190)
model = hofts.HighOrderFTS(partitioner=fs, order=4, alpha_cut=.4)
model.fit(all_data.values.flatten())

Let's say I keep the last 4 points from "all_data" to accommodate the order of the model. How can I call the model.predict method to forecast future days (until the end of August)?

I forgot to mention that when I use steps_ahead = n I get the same number n times. In case you ask about my dataset, it is public: it is the closing price of Bitcoin.

Best regards George

petroniocandido commented 5 years ago

Hi George,

I think you are talking about many-steps-ahead forecasting. For this kind of forecast, just pass the parameter steps_ahead=k (where k is the forecasting horizon) to the predict function. You also have to pass a sample with at least as many instances as the order of the model. If you have a larger sample and want to start at a specific instance inside the sample, you can also pass the parameter start_at. Both parameters are explained here: https://pyfts.github.io/pyFTS/build/html/pyFTS.common.html#pyFTS.common.fts.FTS.predict

You can also see an explanation on the section "Expanding the forecasting horizon" in this post: https://towardsdatascience.com/a-short-tutorial-on-fuzzy-time-series-part-ii-with-an-case-study-on-solar-energy-bda362ecca6d
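As a rough illustration of the idea behind many-steps-ahead forecasting, here is a minimal sketch in plain Python rather than pyFTS. A one-step forecaster is applied repeatedly, and each forecast is fed back into the lag window for the next step. `forecast_ahead` and `one_step_mean` are hypothetical names invented for this sketch, and the toy values are made up; how pyFTS implements `steps_ahead` internally is an assumption here, not something confirmed in this thread.

```python
from typing import Callable, List, Sequence


def forecast_ahead(one_step: Callable[[Sequence[float]], float],
                   sample: Sequence[float],
                   order: int,
                   steps_ahead: int) -> List[float]:
    """Iterate a one-step forecaster `steps_ahead` times.

    Only the last `order` values of `sample` are needed to start; every
    forecast is fed back as if it were an observation.
    """
    if len(sample) < order:
        raise ValueError("sample must contain at least `order` values")
    window = list(sample[-order:])
    forecasts = []
    for _ in range(steps_ahead):
        nxt = one_step(window)
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # slide the lag window forward
    return forecasts


def one_step_mean(window: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model: mean of the lag window."""
    return sum(window) / len(window)


# Toy values, not real data.
history = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02]
future = forecast_ahead(one_step_mean, history, order=4, steps_ahead=8)
print(future)
```

With this kind of recursion, if the forecaster ever maps a window to a value that reproduces itself on the next step, every later forecast repeats that value. That is one possible explanation for repeated outputs, not a diagnosis of any particular model.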

Well, I believe this is what you want. But if I did not understand your question correctly, please let me know. For more details check this

georgevarelas commented 5 years ago

Hi Petronio. I looked very carefully at the links you gave me.

First I want to make things a little clearer, although I think you understood well. Let's assume that my data stops at 13/7/2019 (d/m/y format). I want to forecast prices for the interval 20/7/2019 - 30/7/2019, where there is no data. I tried the following commands (assuming that 'Bitcoin.txt' contains the closing prices of Bitcoin):

initial_data = pd.read_csv('Bitcoin.txt', header=0, sep=",")
initial_data.drop(initial_data.columns.values[0], axis=1, inplace=True)
initial_data['vari'] = pd.to_datetime(initial_data["Date"])
initial_data.set_index('Date', inplace=True)
initial_data.columns.values[0] = 'Price'
in_dt = np.log(initial_data).copy()
data = in_dt.diff().copy()
data.dropna(inplace=True)
. . .
mp_data = data.copy()

out = 60
out_data = mp_data[-out:].values.flatten()
in_data = mp_data[:-out].copy()
order = 4

fs = Grid.GridPartitioner(data=in_data, npart=190)

model = hofts.HighOrderFTS(partitioner=fs, order=order, alpha_cut=.4)
model.fit(in_data.values.flatten())

out_data = mp_data[-out:].values.flatten()
in_data = mp_data[:-out].copy()

forecasts_out=model.predict(out_data, start_at=len(out_data)-4, type='point', steps_ahead=8)

The result is always the same number:

Index  Type     Size  Value
0      float64  1     -0.01480162960180964
1      float64  1     -0.01480162960180964
2      float64  1     -0.01480162960180964
3      float64  1     -0.01480162960180964
4      float64  1     -0.01480162960180964
5      float64  1     -0.01480162960180964
6      float64  1     -0.01480162960180964
7      float64  1     -0.01480162960180964

It is like repeating the first prediction. If I use "forecasts_out=model.predict(out_data)", the model makes predictions for this dataset and it behaves well. But in order to be sure that the predictions are correct, I need to apply the method to a date interval where there is no data, starting of course with limited data (the last 3 points in my case, to accommodate the order of the model). I am attaching the data in case you need to test something.
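Two bookkeeping steps are implied by this setup, independent of which model is used: working out how many steps ahead are needed to reach the target dates, and turning forecasts of log-differenced prices back into price levels. The sketch below is plain Python under stated assumptions: one observation per calendar day, with the last date and target interval taken from the post. The last price and the forecast log-returns are made-up placeholders, and `returns_to_prices` is a hypothetical helper name.

```python
import math
from datetime import date, timedelta

last_date = date(2019, 7, 13)      # last observation, from the post
target_start = date(2019, 7, 20)
target_end = date(2019, 7, 30)

# Horizon needed to reach the end of the target interval (daily data).
steps_ahead = (target_end - last_date).days
# First step that falls inside the target interval.
first_step = (target_start - last_date).days


def returns_to_prices(last_price, log_returns):
    """Undo log + diff: p[t+h] = p[t] * exp(sum of the first h log-returns)."""
    prices, level = [], math.log(last_price)
    for r in log_returns:
        level += r
        prices.append(math.exp(level))
    return prices


# Placeholder values, NOT real data or real model output.
last_price = 10000.0
log_return_forecasts = [0.001] * steps_ahead

prices = returns_to_prices(last_price, log_return_forecasts)
dates = [last_date + timedelta(days=h) for h in range(1, steps_ahead + 1)]

# Keep only the forecasts that fall inside the requested interval.
window = [(d, p) for d, p in zip(dates, prices)
          if target_start <= d <= target_end]
print("steps needed:", steps_ahead, "first step in interval:", first_step)
for d, p in window:
    print(d.isoformat(), round(p, 2))
```

Under these assumptions the forecast has to run from the day after the last observation through the end of the interval, and the earlier steps are discarded afterwards.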

Bitcoin - Copy.txt

Best regards George

georgevarelas commented 5 years ago

Even if I use the command "forecasts_out=model.predict(out_data, start_at=59)", it returns 56 forecasted points instead of 1. And this happens regardless of the number I choose as the starting point. Am I doing something wrong here? Or maybe I have misunderstood something about the package's functionality.

petroniocandido commented 5 years ago

George,

How did you analyze this time series? Did you check the ACF and the PACF? How long is its memory? Why did you decide to use 190 partitions? Did you perform a hyper-parameter optimization? Why are you using the HOFTS method and not WHOFTS or PWFTS, etc.?

What do you think about creating a Google Colab (or similar) notebook with your code and sharing it with me, so that I can check these and other questions?
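For readers unfamiliar with the ACF mentioned here, a minimal plain-Python sketch of the sample autocorrelation function follows. The function name `sample_acf` and the toy series are illustrative only; in practice a library routine such as `acf` from statsmodels would normally be used instead.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelation r_k = c_k / c_0 for k = 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    acf = []
    for k in range(max_lag + 1):
        # Lag-k autocovariance, normalised by n (the usual biased estimator).
        ck = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf


# Toy series, not real data.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]
for lag, r in enumerate(sample_acf(series, max_lag=4)):
    print(lag, round(r, 3))
```

Autocorrelations that stay large over many lags suggest long memory; values near zero at all lags beyond 0 suggest there is little linear structure to exploit.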

Best regards!

georgevarelas commented 5 years ago

Petronio, why are you asking me analysis questions when I am asking a purely technical question? Anyway, you can consider the first two questions answered. But do the ACF and PACF have anything to do with the fact that predict returns the same number?

As for the rest of the questions, I can say that I barely managed to find any information on HOFTS. It is difficult to draw information from examples alone. The package reference you have on MINDS does not explain the function arguments or what each function does. So, after a lot of research on the internet, I found enough information to implement HOFTS (which is a model that serves my goal) with my data. If you follow the Box-Jenkins methodology to find the AR and MA terms and the order of the model in general with Bitcoin, you will not get reliable information, but this is up to you. I will not go into much detail on this.

Having said that, could you please advise me how to use optimization with pyFTS? And, most critically: what should I do in order to get different values when I use steps_ahead with predict? How do I use start_at and get predictions only after that point (as I wrote above, when I used it, it returned predictions for the full dataset)?

I am attaching 2 files that might help you check everything. Because GitHub supports only txt in our case, I have included the code in a txt file. The only thing you need to do is copy the code into your Python IDE, or rename my file to .py, and run it with the data file in the same directory.

Bitcoin.txt code.txt

My regards George

petroniocandido commented 5 years ago

George,

No, this is not a purely technical issue. It is also a theoretical issue.

My point is: the behavior of the time series may lead to early convergence to the local mean. It is well known in time series analysis that the predictability of a series depends on its type (stationarity, trend, seasonality, etc.), and the ACF and PACF are good indicators of that type. Time series like Bitcoin, like the majority of financial time series, are IN THEORY martingales. Given a forecasting horizon H (the number of steps ahead you want to forecast), as H tends to infinity the predicted value tends to the mean of the time series. How quickly the forecast converges to the mean as H grows depends on the predictability of the data.
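The dependence of the convergence speed on predictability can be illustrated with a stationary AR(1) process, used here purely as a stand-in (it is not a fuzzy time series model and not a claim about Bitcoin). Its h-steps-ahead point forecast is mu + phi^h (x_t - mu), which approaches the mean mu faster when |phi| is smaller. All parameter values below are made up, and `ar1_forecast` is a hypothetical helper name.

```python
def ar1_forecast(last_value, mu, phi, horizon):
    """h-steps-ahead point forecasts for x[t+1] = mu + phi * (x[t] - mu) + noise."""
    return [mu + (phi ** h) * (last_value - mu) for h in range(1, horizon + 1)]


# Illustrative parameters, not estimated from any data.
mu, last_value = 0.0, 1.0
strong_memory = ar1_forecast(last_value, mu, phi=0.9, horizon=10)
weak_memory = ar1_forecast(last_value, mu, phi=0.2, horizon=10)

# Columns: horizon, forecast with phi=0.9, forecast with phi=0.2.
for h, (s, w) in enumerate(zip(strong_memory, weak_memory), start=1):
    print(h, round(s, 4), round(w, 4))
```

With weak memory the forecasts become nearly indistinguishable from the mean after a few steps, so a long-horizon forecast looks almost flat.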

Then, as your forecasting horizon H increases, your error and uncertainty also increase. I strongly advise you to use interval or probabilistic forecasting together with point forecasting. For this purpose, PWFTS is the state-of-the-art method (https://doi.org/10.1109/TFUZZ.2019.2922152).

The technical concerns are related to the pyFTS implementation and documentation. I agree with you about the HOFTS information; we are working to fix these documentation issues. For now, you will need to resort to my Medium posts about FTS to get a more comprehensive overview of these methods (for this specific problem I suggest you read parts II and III).

I created a notebook to explore the best hyperparameters for the Bitcoin data: https://colab.research.google.com/drive/1FIKRzmEmSVyuN3DvplyCWtSqFz5rVCv8 . There I'll be researching the best model for many-steps-ahead forecasting of this time series.
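To make the growth of uncertainty concrete, here is a sketch for a Gaussian random walk, an assumption chosen for simplicity; it is not how PWFTS builds its intervals. For such a process the variance of the h-steps-ahead forecast error is h times the one-step variance, so an approximate 95% interval widens in proportion to the square root of h. The function name and the numbers are illustrative only.

```python
import math


def random_walk_intervals(last_value, sigma, horizon, z=1.96):
    """Approximate 95% intervals for x[t+h] = x[t] + sum of h iid N(0, sigma^2) steps."""
    intervals = []
    for h in range(1, horizon + 1):
        half_width = z * sigma * math.sqrt(h)
        intervals.append((last_value - half_width, last_value + half_width))
    return intervals


# Made-up starting value and step standard deviation.
intervals = random_walk_intervals(100.0, sigma=2.0, horizon=5)
for h, (lower, upper) in enumerate(intervals, start=1):
    print(h, round(lower, 2), round(upper, 2))
```

The point forecast stays at the last value while the band around it keeps widening, which is why a point forecast alone says little at long horizons.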

Sincerely yours Petrônio C. L. Silva

georgevarelas commented 5 years ago

Thank you again for the prompt answer Petronio.

Give me some time to examine your code and I will come back. We may differ on the usefulness of the ACF and PACF for Bitcoin (whose fluctuations are not like those of an ordinary stock time series), but this is not the issue here. I agree with you that the interval before 2017 is useless to me, since I need to move forward, and I will keep only the last 810 days. My goal is to create an out-of-sample forecast set (after the last point of the "out_sample" dataset in my code). If I need to use recurrence I will do it. But if steps_ahead works well, that is better.

In the meantime, if there is any function that deals with classification problems, drop me a line so I can look into it.

Thank you in advance George Varelas

ramdhan1989 commented 3 years ago

Hi everyone, @georgevarelas did you solve this problem? I have the same confusion as you. @petroniocandido, I would like to clarify the predict method. Simply put, the data is as in the image below.

How do I get a forecast for December 2016? In other words, how do I get a one-step-ahead prediction from November 2016?

thank you

regards

petroniocandido commented 3 years ago

Hi @ramdhan1989 !!

Give me a little bit more information about your model. Is it a univariate or a multivariate model? I am asking because the predict method works differently in each case.

Usually the predict method takes the parameters below:

In this example, assuming that you trained a univariate model, you will have to pass steps_ahead=1, since the time series is sampled monthly and your last instance is from November.

Best regards.

ramdhan1989 commented 3 years ago

OK, well noted, thanks. I am starting to understand how the predict function works after doing some experiments based on your advice. Basically, I have 100 series and need to forecast one step ahead. There are additional series that can be used as external information for each of those 100 series. For now, I am starting with a simple approach, treating the problem as univariate without any external information. I now have a bunch of questions regarding my problem. I will ask them in a new issue.

thank you regards