PYFTS / pyFTS

An open source library for Fuzzy Time Series in Python
http://pyfts.github.io/pyFTS/
GNU General Public License v3.0

One Step Forecasting PWFTS #16

Closed: ghost closed this issue 5 years ago

ghost commented 5 years ago

Hey Petronio, when I try to do step-by-step forecasting while re-fitting the model at each step, I get very different results than when I simply call model.predict(all_data). The step predictor output is much smoother (almost a linear slope), while plain model.predict() works quite well. However, I am afraid there might be some data leakage: when I convert the regression results to binary classes with yhat = [0 if preds[n] > preds[n+1] else 1 for n in range(len(preds)-1)], the accuracy score is 1.0 on both the test and the train set. Would you have any sanity checklist for working with FTS?

The model I am using is PWFTS with the standard TAIEX data. Is the model supposed to be used this way? This is my first encounter with FTS outside of an academic book.

import numpy as np
from tqdm import tqdm

def step_predictor(train_data, test_data):
    # `model` is a PWFTS model created elsewhere (a global in this post)
    history = np.array(train_data)
    preds = []
    preds.append(model.forecast_ahead(history, steps=1))
    for i in tqdm(range(len(test_data))):
        # Append the next observed value and refit on the extended history
        history = np.append(history, test_data[i])
        model.fit(history)
        preds.append(model.forecast_ahead(history, steps=1))
    return preds
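One item for the sanity checklist asked about above: the yhat line compares two forecasts with each other (preds[n] vs preds[n+1]), so its labels depend only on the model output. A safer check, sketched here under the assumption that forecast[i] is the one-step-ahead forecast of actual[i + 1] made when only actual[:i + 1] was known, scores the predicted direction against the last observed value and takes the true direction from the actual series alone. direction_accuracy is a hypothetical helper, not part of pyFTS.

```python
from typing import Sequence


def direction_accuracy(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Fraction of steps where the forecast gets the up/down move right.

    Assumed alignment (hypothetical): forecast[i] predicts actual[i + 1]
    and was produced when only actual[:i + 1] was known.
    """
    n = len(actual) - 1
    if n < 1 or len(forecast) < n:
        raise ValueError("need at least 2 actual values and len(actual) - 1 forecasts")
    hits = 0
    for i in range(n):
        true_up = actual[i + 1] > actual[i]
        # Compare the forecast with the last *observed* value, never with
        # another forecast, so the predicted label cannot peek at the future.
        pred_up = forecast[i] > actual[i]
        hits += true_up == pred_up
    return hits / n


actual = [1, 2, 1, 3]
print(direction_accuracy(actual, [1.5, 2.5, 0.5]))  # → 0.3333333333333333
# Persistence baseline: each forecast equals the last observation.
print(direction_accuracy(actual, actual[:-1]))
```

Comparing a model's score with the persistence baseline gives a quick reference point; a near-perfect score on noisy data such as TAIEX may be worth double-checking for leakage.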

*Additional info

While playing around with it, I noticed that the predictions also depend on the size of the array being predicted. Why could that be?

After 20 minutes :) It might be due to the way the smaller datasets get partitioned.
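To illustrate the partitioning hypothesis: if the fuzzy sets are laid out over the range of whatever data the model is fitted on, then fitting on a different slice moves every set, and the forecasts move with them. The toy sketch below assumes evenly spaced set centers between min(data) and max(data); grid_centers is a hypothetical helper, not pyFTS code, and real partitioners may also pad the range.

```python
from typing import List, Sequence


def grid_centers(data: Sequence[float], npart: int) -> List[float]:
    """Evenly spaced fuzzy-set centers over [min(data), max(data)].

    Toy illustration only: it shows that the sets are derived from the
    data they are fitted on, not how any specific library pads the range.
    """
    if npart < 2:
        raise ValueError("npart must be at least 2")
    lo, hi = min(data), max(data)
    width = (hi - lo) / (npart - 1)
    return [lo + i * width for i in range(npart)]


full = [0, 10, 20, 30, 40]
print(grid_centers(full, 5))      # → [0.0, 10.0, 20.0, 30.0, 40.0]
print(grid_centers(full[-2:], 5))  # → [30.0, 32.5, 35.0, 37.5, 40.0]
```

With the same npart, the shorter slice yields much narrower sets, so a model refitted on each new history window can behave quite differently from one fitted once on the full training set.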

petroniocandido commented 5 years ago

I am still analyzing your code but, a priori, you are making some conceptual mistakes.

You can't use the "forecast_ahead" function to perform one-step-ahead forecasting; the correct way to do this is the "forecast" function. The "forecast_ahead" function makes some assumptions that "forecast" does not (for example, it considers only the last data lags as input and ignores the first ones), and these specificities are handled by the "predict" function.

In general, the "forecast" and "forecast_ahead" functions are meant for internal use only; the end user should always call the "predict" function and let it choose the best internal function to call given the parameters supplied. The same idea applies to the "train" (internal use only) and "fit" (the interface for end users) functions.
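Following this advice (call "fit" and "predict", not the internal "forecast"/"forecast_ahead"), the step-by-step loop from the original post could be restructured roughly as follows. This is a sketch, not pyFTS code: walk_forward and the Persistence stand-in are hypothetical names, and it assumes that "predict" returns one forecast per window of order lags (as Petronio describes later in this thread) and that the model exposes its order as model.order.

```python
from typing import List, Protocol, Sequence


class Forecaster(Protocol):
    """Interface assumed here: the user-facing fit/predict pair plus order."""
    order: int

    def fit(self, data: Sequence[float]) -> None: ...
    def predict(self, data: Sequence[float]) -> List[float]: ...


def walk_forward(model: Forecaster, train: Sequence[float],
                 test: Sequence[float], refit: bool = True) -> List[float]:
    """One-step-ahead walk-forward forecasting.

    preds[i] is the forecast for test[i], made using only train + test[:i].
    """
    history = list(train)
    model.fit(history)
    preds = []
    for actual in test:
        # predict() yields one forecast per window of `order` lags; the last
        # one corresponds to the window ending at the newest observation.
        window = history[-model.order:]
        preds.append(model.predict(window)[-1])
        # Only now reveal the true value, then optionally refit.
        history.append(actual)
        if refit:
            model.fit(history)
    return preds


class Persistence:
    """Hypothetical toy model: forecasts the last observed value."""
    order = 1

    def fit(self, data):
        pass

    def predict(self, data):
        return list(data)


print(walk_forward(Persistence(), [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # → [3.0, 4.0, 5.0]
```

With pyFTS, model would be a model such as pwfts.ProbabilisticWeightedFTS built on a partitioner. Refitting at every step is costly, so refit=False (fit once on the training data) is a reasonable first comparison against the refitted version.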

I now realize that this information is neither well nor explicitly explained in the documentation and tutorials, and I must fix this to avoid this kind of mistake.

Did this information help you better understand how to use pyFTS models?

ghost commented 5 years ago

Yes, thank you! If you need any help writing documentation, I would be glad to help :)

P.S. model.predict(data, steps_ahead=1) returns a prediction for every input point, while model.predict(data, steps_ahead=2) returns only 2 predictions. Is this on purpose?

petroniocandido commented 5 years ago

Yes, it is! You don't need to specify steps_ahead=1; this is the default behavior. In this case, a one-step-ahead forecast is generated for each data point in the input list (or array), using the number of past lags specified by the model's order. An input list generates an output list (one data point for each group of lags). For many steps ahead, however, it always takes the last n lags (order=n) and forecasts the next m data points (steps_ahead=m).

Suppose a model with order=1, an input sample with 10 data points, and you want to forecast two steps ahead (steps_ahead=2). It would be tricky if the model generated two data points for each input data point: the result would be a list of lists, which may be a little messy and also a little unusual. So the default behavior of pyFTS is to take the last lags and forecast ahead.
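To make the two behaviors concrete, here is a toy re-implementation of the dispatch described above, using the same example (order=1, 10 data points, steps_ahead=2). It is not pyFTS code: toy_predict and the drift rule are hypothetical, and a real FTS model fuzzifies the lags and defuzzifies its output instead of applying a plain function to them.

```python
from typing import Callable, List, Sequence


def toy_predict(data: Sequence[float], order: int,
                step: Callable[[Sequence[float]], float],
                steps_ahead: int = 1) -> List[float]:
    """Toy stand-in for predict's dispatch between the two modes."""
    if steps_ahead == 1:
        # One forecast per window of `order` consecutive lags.
        return [step(data[k - order:k]) for k in range(order, len(data) + 1)]
    # Multi-step: start from the last `order` lags and feed each forecast
    # back in as the newest lag for the next step.
    window = list(data[-order:])
    out = []
    for _ in range(steps_ahead):
        nxt = step(window)
        out.append(nxt)
        window = window[1:] + [nxt]
    return out


def drift(lags: Sequence[float]) -> float:
    """Hypothetical rule: the next value is the last lag plus one."""
    return lags[-1] + 1


data = list(range(10))                           # 10 data points
print(len(toy_predict(data, 1, drift)))          # → 10 (one per window)
print(toy_predict(data, 1, drift, steps_ahead=2))  # → [10, 11]
```

The first call returns one forecast per window, while the second returns only steps_ahead values computed from the last lags, which matches the shapes asked about in the P.S.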

ghost commented 5 years ago

Awesome :) Thanks again!