PYFTS / pyFTS

An open source library for Fuzzy Time Series in Python
http://pyfts.github.io/pyFTS/
GNU General Public License v3.0

Question on forecasting method #6

Closed jermaine1ronquillo closed 5 years ago

jermaine1ronquillo commented 5 years ago

I'm new to fuzzy logic and I'd like to know why the predict method requires the test set of the data. Another question: does the method predict only one step ahead?

Regards

petroniocandido commented 5 years ago

It is not exactly the "test" data, but the lags needed to forecast. For a first-order model (such as pyFTS.models.chen.ConventionalFTS), all you need to forecast t+1 is the last lag, i.e., a list with the last value of the time series. If you are using a high-order model (pyFTS.models.hofts.HighOrderFTS), you will need more past lags.

Not all methods are designed to work with multi-step-ahead forecasting, but the FTS.predict method has a parameter, steps_ahead, where you can indicate the forecasting horizon you need. In this case the method feeds its outputs back as inputs for each step ahead.
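The feedback idea described above can be sketched in plain Python. This is a minimal illustration, not pyFTS code: forecast_ahead and mean_of_lags are hypothetical names, and the "model" is just a stand-in function that maps the last `order` values to one forecast.

```python
from typing import Callable, List, Sequence


def forecast_ahead(one_step: Callable[[Sequence[float]], float],
                   lags: Sequence[float],
                   order: int,
                   steps: int) -> List[float]:
    """Recursive multi-step forecast: each output is fed back as the newest lag."""
    if order < 1:
        raise ValueError("order must be at least 1")
    if len(lags) < order:
        raise ValueError(f"need at least {order} lags, got {len(lags)}")
    window = list(lags[-order:])        # only the last `order` values are used
    forecasts = []
    for _ in range(steps):
        y_hat = one_step(window)        # forecast the next value from the window
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]   # drop the oldest lag, append the forecast
    return forecasts


# Hypothetical stand-in for a fitted model: the mean of the lags.
def mean_of_lags(window: Sequence[float]) -> float:
    return sum(window) / len(window)


# Order 1: a single past value is enough to start.
print(forecast_ahead(mean_of_lags, [3.0], order=1, steps=3))
# Order 2: two past values are required.
print(forecast_ahead(mean_of_lags, [2.0, 4.0], order=2, steps=3))
```

Note that errors compound in this scheme, because later steps are computed from earlier forecasts rather than from observed values.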

Don't hesitate to contact us if you have more questions!

Best regards.

jermaine1ronquillo commented 5 years ago

Thank you for the response, although I'm still a bit confused. I will try pyFTS to forecast water quality parameters. Our objectives are to forecast at least 3 days ahead and to predict whether the parameters will exceed a certain threshold. I will definitely ask more questions about using pyFTS. Again, thank you.

petroniocandido commented 5 years ago

Hi!

Check this out: https://colab.research.google.com/drive/1yeaYrgasByD12JI-nEIlE_buQft3YR3I

The minimal input length for the predict method is the order of the model! To forecast multiple steps ahead you just need to use the parameter steps_ahead, indicating how many steps to forecast.

About the water quality time series: is it seasonal? Univariate or multivariate? What does its ACF look like?

jermaine1ronquillo commented 5 years ago

Thank you for this. Based on my initial exploration of the data (using statsmodels), the time series shows small seasonality. Below are the graphs: [image] [image]

petroniocandido commented 5 years ago

Is it a public dataset?

jermaine1ronquillo commented 5 years ago

Unfortunately it is not; it was recorded at our treatment facility. Are you interested in the data? Please provide me with your email address and I may be able to get approval from my superiors.

Regards

jermaine1ronquillo commented 5 years ago

Thank you for the Colab link; I used it as a reference to model my data (one-step forecasting). The result is interesting, see the graph below: [image]

However, I tried changing one extreme event to 10 just to check how the model predicts, and this is the result: [image]

Then I modified the data again, and below is the result: [image]

My question is: does the model really perform this well, or am I doing something wrong? How does the model predict my data almost exactly (again, I'm new to this)? Thanks in advance!

petroniocandido commented 5 years ago

Can you share your code for verification? I'm working on a pyFTS tutorial for solar forecasting (text in Portuguese) and the results are very good, around 5% error (MAPE): https://colab.research.google.com/drive/1xfonrM853rtWTsVet7oJsFO-OoHWRgk6
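For readers unfamiliar with MAPE, here is a minimal sketch of how it can be computed and compared against a naive baseline. The series and the forecast values are invented for illustration, and the one-position shift between inputs and targets is an assumption about how the one-step forecasts are arranged, so check it against your own output before trusting a plot or a score.

```python
from typing import Sequence


def mape(targets: Sequence[float], forecasts: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Targets must be non-zero."""
    if len(targets) != len(forecasts):
        raise ValueError("targets and forecasts must have the same length")
    errors = [abs((t - f) / t) for t, f in zip(targets, forecasts)]
    return 100.0 * sum(errors) / len(errors)


# Toy series, invented for illustration only.
series = [10.0, 12.0, 11.0, 13.0, 12.0]

# Assumption: forecasts[i] was produced from series[i] and is a forecast of
# series[i + 1], so the targets are the series shifted by one position.
forecasts = [11.0, 11.5, 12.0, 12.5]   # hypothetical one-step model outputs
targets = series[1:]

# Persistence baseline: "the next value equals the current value".
naive = series[:-1]

print(f"model MAPE: {mape(targets, forecasts):.2f}%")
print(f"naive MAPE: {mape(targets, naive):.2f}%")
```

A one-step forecast that is plotted without this shift can appear to track the series almost exactly, so comparing against the persistence baseline is a cheap sanity check.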

The quality of an FTS model depends on several factors:
a) Data quality (in general, FTS models are very sensitive to outliers);
b) Method (different methods for different demands);
c) Transformations (pre- and post-processing operations);
d) Partitioning (too few partitions will underfit the model, too many partitions will overfit it);
e) Order (the minimal number of lags used by the model);
f) Lag indexes (which past lags produce better generalization);
g) alpha_cut (the minimal membership grade considered in the fuzzification step; it helps reduce overfitting by cutting useless rules).
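To make the partitioning and alpha_cut items concrete, here is a simplified sketch with evenly spaced triangular fuzzy sets. It is not the pyFTS implementation: grid_partition, membership and fuzzify are hypothetical helper names, and the exact set boundaries that pyFTS uses may differ.

```python
from typing import Dict, List, Tuple

FuzzySet = Tuple[float, float, float]   # (left, peak, right) of a triangle


def grid_partition(lo: float, hi: float, npart: int) -> List[FuzzySet]:
    """Cover [lo, hi] with `npart` evenly spaced, overlapping triangular sets."""
    if npart < 2:
        raise ValueError("npart must be at least 2")
    step = (hi - lo) / (npart - 1)
    return [(lo + (i - 1) * step, lo + i * step, lo + (i + 1) * step)
            for i in range(npart)]


def membership(x: float, fuzzy_set: FuzzySet) -> float:
    """Triangular membership grade of x in the given set."""
    left, peak, right = fuzzy_set
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(x: float, sets: List[FuzzySet], alpha_cut: float = 0.0) -> Dict[int, float]:
    """Map x to {set index: grade}, keeping only grades at or above alpha_cut."""
    grades = {i: membership(x, s) for i, s in enumerate(sets)}
    return {i: g for i, g in grades.items() if g > 0.0 and g >= alpha_cut}


sets = grid_partition(0.0, 10.0, npart=6)   # peaks at 0, 2, 4, 6, 8, 10
print(fuzzify(4.5, sets))                   # belongs to two neighbouring sets
print(fuzzify(4.5, sets, alpha_cut=0.5))    # the weaker membership is cut
```

With more partitions each set becomes narrower, so the rules become more specific to the training data; a higher alpha_cut discards weak memberships and therefore the rules they would have generated.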

The default values of the FTS methods generally fit the data well. But depending on your application domain, it may be necessary to fine-tune the parameters. This hyperparameter optimization can be performed using a genetic algorithm (I like to use the DEAP library for evolutionary optimization: https://github.com/DEAP/deap) or a dedicated hyperparameter optimization library such as hyperopt (https://github.com/hyperopt/hyperopt).
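The tuning loop itself can be sketched without any of those libraries. The example below uses a plain exhaustive grid search (not a genetic algorithm and not hyperopt) over the number of partitions, scored on a hold-out validation split. The model is a deliberately crude first-order stand-in with crisp intervals, not a pyFTS model, and all function names and the toy series are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Model = Tuple[Callable[[float], int], Dict[int, Set[int]], List[float]]


def fit_rules(train: Sequence[float], lo: float, hi: float, npart: int) -> Model:
    """Toy first-order model: crisp intervals plus 'interval -> next intervals' rules."""
    if hi <= lo:
        raise ValueError("the training data must not be constant")
    width = (hi - lo) / npart

    def index(x: float) -> int:
        # Clamp so values outside [lo, hi] fall into the edge intervals.
        return min(npart - 1, max(0, int((x - lo) / width)))

    rules: Dict[int, Set[int]] = {}
    for prev, nxt in zip(train[:-1], train[1:]):
        rules.setdefault(index(prev), set()).add(index(nxt))
    midpoints = [lo + (i + 0.5) * width for i in range(npart)]
    return index, rules, midpoints


def predict_next(x: float, model: Model) -> float:
    """Forecast the next value as the mean midpoint of the matching rule's consequents."""
    index, rules, midpoints = model
    i = index(x)
    consequents = rules.get(i, {i})   # unseen interval: fall back to its own midpoint
    return sum(midpoints[j] for j in consequents) / len(consequents)


def rmse(model: Model, data: Sequence[float]) -> float:
    """One-step-ahead root mean squared error over consecutive pairs of `data`."""
    errs = [(predict_next(p, model) - n) ** 2 for p, n in zip(data[:-1], data[1:])]
    return (sum(errs) / len(errs)) ** 0.5


def grid_search(train: Sequence[float], valid: Sequence[float],
                candidates: Sequence[int]) -> Tuple[int, Dict[int, float]]:
    """Fit one model per candidate npart and keep the lowest validation RMSE."""
    lo, hi = min(train), max(train)
    scores = {n: rmse(fit_rules(train, lo, hi, n), valid) for n in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Toy periodic series, invented for illustration only.
series = [((i * 7) % 10) + 1.0 for i in range(40)]
train, valid = series[:30], series[30:]

best, scores = grid_search(train, valid, candidates=[2, 3, 5, 10, 20])
for n, score in scores.items():
    print(f"npart={n:2d}  validation RMSE={score:.3f}")
print("best npart:", best)
```

The same structure carries over to a real setup: replace fit_rules and rmse with fitting and scoring an actual model, and replace the exhaustive loop with the search strategy of your optimization library.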

I hope this helps, and feel free to get in touch if you have any questions!

jermaine1ronquillo commented 5 years ago

Thanks for your time and effort! Here is my code; basically, I copied your example.

Regards

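import matplotlib.pyplot as plt
from pyFTS.partitioners import Grid
from pyFTS.models import hofts

# data_mod is a pandas DataFrame with a DatetimeIndex and a 'Turbidity' column (loaded elsewhere).
# .loc is used for the row selection because data_mod['2017'] no longer selects rows in recent pandas.
train = data_mod.loc['2012':'2016', 'Turbidity'].values
test = data_mod.loc['2017', 'Turbidity'].values

# Partition the universe of discourse into 35 fuzzy sets and plot them
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[25, 5])
fs = Grid.GridPartitioner(data=train, npart=35)
fs.plot(ax)

# Fit a first-order model on the training data
model1 = hofts.HighOrderFTS(order=1, partitioner=fs)
model1.fit(train)
print(model1)

# One-step forecasts on the unmodified test data
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[15, 5])
forecasts = model1.predict(test)
ax.plot(test[80:100], label='test')
ax.plot(forecasts[80:100], label='forecast')
ax.legend()

# Set one day to an extreme value (10) and forecast again
test_mod = data_mod.loc['2017', 'Turbidity'].copy()
test_mod['2017-04-04'] = 10
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[15, 5])
forecasts = model1.predict(test_mod)
ax.plot(test_mod.values[80:100], label='test')
ax.plot(forecasts[80:100], label='forecast')
ax.legend()

# Set the following day to 10 as well and forecast again
test_mod['2017-04-05'] = 10
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[15, 5])
forecasts = model1.predict(test_mod)
ax.plot(test_mod.values[80:100], label='test')
ax.plot(forecasts[80:100], label='forecast')
ax.legend()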

petroniocandido commented 5 years ago

Looks fine to me!

Try higher order models to improve the accuracy.

Best regards!

ramdhan1989 commented 3 years ago

Do you have example code that uses a hyperparameter optimization technique with the pyFTS package?

petroniocandido commented 3 years ago

Hi @ramdhan1989 !

Please check the issue https://github.com/PYFTS/pyFTS/issues/30