PYFTS / pyFTS

An open source library for Fuzzy Time Series in Python
http://pyfts.github.io/pyFTS/
GNU General Public License v3.0

how to use hyperparams #30

Open ramdhan1989 opened 4 years ago

ramdhan1989 commented 4 years ago

I am struggling to find guidance on how to use the hyperparam modules, such as grid search or evolutionary. Can anyone share?

thank you

petroniocandido commented 4 years ago

Hi @ramdhan1989

Thanks for your interest in our tool, and forgive me for the long delay.

First of all, before hyperparameter optimization (hereafter called hyperopt), you should perform time series analysis (ACF/PACF plots, tests of stationarity and heteroscedasticity, etc.). Hyperopt is not a substitute for knowing how your time series data behaves.
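For example, a quick sanity check of the autocorrelation structure can be done with plain NumPy before reaching for any hyperopt. This is only a minimal sketch (the toy series and lag range are arbitrary choices, not part of pyFTS):

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation of a 1d series for lags 1..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return [np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)]

# A strongly autocorrelated toy series (a random walk is non-stationary)
rng = np.random.default_rng(42)
series = np.cumsum(rng.normal(size=500))

acf = sample_acf(series, nlags=5)
print(acf)  # values near 1 suggest the series is far from stationary
```

If the ACF decays very slowly like this, a differencing transformation is usually worth considering before (or alongside) hyperopt.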

The hyperparameter optimization of FTS is described here, and is called DEHO - Distributed Evolutionary Hyperparameter Optimization, but the library also provides other methods besides the evolutionary one. The method returns a dictionary with the best parameters found for forecasting the dataset using the selected FTS method (given in the fts_method parameter).

Below is a list of the implemented methods:

import numpy as np

from pyFTS.hyperparam import GridSearch
from pyFTS.models import hofts
from pyFTS.data import TAIEX

datasetname = 'TAIEX'
dataset = TAIEX.get_data()

# The search space for each hyperparameter
hyperparams = {
    'order': [1, 2, 3],
    'partitions': np.arange(10, 100, 3),
    'partitioner': [1, 2],           # 1 = Grid, 2 = Entropy partitioner
    'mf': [1, 2, 3],                 # 1 = Triangular, 2 = Trapezoidal, 3 = Gaussian
    'lags': np.arange(2, 7, 1),      # The lag indexes
    'alpha': np.arange(.0, .5, .05)  # Alpha-cut
}

GridSearch.execute(
        hyperparams,          # A dictionary containing the search space for each hyperparameter
        datasetname,          # Just the name of your dataset
        dataset,              # Your time series data (list or np.ndarray 1d)
        fts_method=hofts.WeightedHighOrderFTS,   # The FTS method you want to optimize [only univariate methods]
        window_size=10000,    # The length of the data window for the Sliding Window Cross Validation method
        train_rate=.9,        # The proportion of the data window used for training; the remainder is used for testing
        increment_rate=.3,    # The sliding increment of the Sliding Window Cross Validation method
        database_file='hyperopt.db'   # An sqlite database that will log the hyperopt process
)

There is no GridSearch implementation yet for multivariate methods, but a **Random Search (RS)** is available:

import pandas as pd

from pyFTS.hyperparam import mvfts as deho_mv
from pyFTS.models.multivariate import mvfts, wmvfts
from pyFTS.models.seasonal.common import DateTime
from pyFTS.data import Malaysia

datasetname = 'Malaysia'
dataset = Malaysia.get_dataframe()
dataset['time'] = pd.to_datetime(dataset["time"], format='%m/%d/%y %I:%M %p')

explanatory_variables = [
    {'name': 'Temperature', 'data_label': 'temperature', 'type': 'common'},
    {'name': 'Daily', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.minute_of_day, 'npart': 24},
    {'name': 'Weekly', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.day_of_week, 'npart': 7},
    {'name': 'Monthly', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.day_of_month, 'npart': 4},
    {'name': 'Yearly', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.day_of_year, 'npart': 12}
]

target_variable = {'name': 'Load', 'data_label': 'load', 'type': 'common'}

deho_mv.random_search(
    datasetname,          # Just the name of your dataset
    dataset,              # Your time series data (pd.DataFrame)
    npop=200,             # Population size of the RS
    mgen=70,              # Number of iterations of the RS
    fts_method=wmvfts.WeightedMVFTS,    # The multivariate FTS method to optimize
    variables=explanatory_variables,    # The list of exogenous/explanatory variables
    target_variable=target_variable,    # The endogenous/target variable
    window_size=10000,    # The length of the data window for the Sliding Window Cross Validation method
    train_rate=.9,        # The proportion of the data window used for training; the remainder is used for testing
    increment_rate=.3,    # The sliding increment of the Sliding Window Cross Validation method
)


- **Genetic Algorithm (GA)** is between GS and RS, both in accuracy and computational cost.

from pyFTS.hyperparam import Evolutionary
from pyFTS.models import hofts
from pyFTS.data import TAIEX

datasetname = 'TAIEX'
dataset = TAIEX.get_data()

ret = Evolutionary.execute(
    datasetname,          # Just the name of your dataset
    dataset,              # Your time series data (list or np.ndarray 1d)
    fts_method=hofts.WeightedHighOrderFTS,   # The FTS method you want to optimize [only univariate methods]
    ngen=30,              # Number of generations (iterations) of the GA
    npop=20,              # Population size of the GA
    psel=0.6,             # Selection probability of the GA
    pcross=.5,            # Crossover probability of the GA
    pmut=.3,              # Mutation probability of the GA
    window_size=10000,    # The length of the data window for the Sliding Window Cross Validation method
    train_rate=.9,        # The proportion of the data window used for training; the remainder is used for testing
    increment_rate=.3,    # The sliding increment of the Sliding Window Cross Validation method
    experiments=1,        # Number of hyperopt experiments to perform
    database_file='hyperopt.db'   # An sqlite database that will log the hyperopt process
)



Please, do not hesitate to get in touch if you have any questions.

Best regards   
ramdhan1989 commented 4 years ago

Thanks, all three methods work! After executing the hyperparameter optimization, is the model automatically fitted using the best params, or do we need to take the values from the output dict and fit the model ourselves? Would you mind elaborating on the dict? I am confused about which values belong to which parameter. From your code using the GA:

Experiment 0
Evaluating initial population 1600098526.9596627
GENERATION 0 1600098526.9596627
WITHOUT IMPROVEMENT 1
GENERATION 1 1600098526.9606583
WITHOUT IMPROVEMENT 2
GENERATION 2 1600098526.9626496
WITHOUT IMPROVEMENT 3
GENERATION 3 1600098526.963645
WITHOUT IMPROVEMENT 4
GENERATION 4 1600098526.9656367
WITHOUT IMPROVEMENT 5
GENERATION 5 1600098526.9666321
WITHOUT IMPROVEMENT 6
GENERATION 6 1600098526.9686234
WITHOUT IMPROVEMENT 7
('TAIEX', 'Evolutive', 'hofts', None, 1, 3, 2, 40, 0.5, '[2, 6, 7]', 'rmse', inf)
('TAIEX', 'Evolutive', 'hofts', None, 1, 3, 2, 40, 0.5, '[2, 6, 7]', 'size', inf)
('TAIEX', 'Evolutive', 'hofts', None, 1, 3, 2, 40, 0.5, '[2, 6, 7]', 'time', 0.010952949523925781)

Below is the returned dict:

{'alpha': 0.5, 'f1': inf, 'f2': inf, 'lags': [2, 6, 7], 'mf': 1, 'npart': 40,
 'order': 3, 'partitioner': 2, 'rmse': inf, 'size': inf, 'time': 0.010952949523925781}
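Reading the integer-coded entries against the search-space comments in the GridSearch example above (this decoding is my interpretation of those comments, so treat it as an assumption), a small helper can render the dict in a readable form:

```python
# Assumed encodings, per the search-space comments:
# partitioner: 1 = Grid, 2 = Entropy
# mf: 1 = Triangular, 2 = Trapezoidal, 3 = Gaussian
PARTITIONERS = {1: 'Grid', 2: 'Entropy'}
MEMBERSHIP_FUNCS = {1: 'Triangular', 2: 'Trapezoidal', 3: 'Gaussian'}

def describe(result):
    """Render a hyperopt result dict with human-readable labels."""
    return {
        'order': result['order'],            # model order
        'lags': result['lags'],              # selected lag indexes
        'partitions': result['npart'],       # number of fuzzy sets
        'partitioner': PARTITIONERS[result['partitioner']],
        'mf': MEMBERSHIP_FUNCS[result['mf']],
        'alpha_cut': result['alpha'],
        'rmse': result['rmse'],              # fitness on the test windows
    }

best = {'alpha': 0.5, 'f1': float('inf'), 'f2': float('inf'), 'lags': [2, 6, 7],
        'mf': 1, 'npart': 40, 'order': 3, 'partitioner': 2, 'rmse': float('inf'),
        'size': float('inf'), 'time': 0.010952949523925781}

print(describe(best))
```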

petroniocandido commented 4 years ago

Hi @ramdhan1989

Using this dictionary you can build a model with this code:

from pyFTS.hyperparam import Evolutionary

model = Evolutionary.phenotype(
     dictionary,   # the result dict from the hyperopt method
     train,        # the training dataset
     fts_method    # the FTS method
)

Best regards

ramdhan1989 commented 4 years ago

Well, thanks a lot @petroniocandido. Does the hyperparameter optimization also search for the best data transformation, such as how many lags for differencing, or perhaps which kind of transformation is best for the problem?

thank you

ramdhan1989 commented 4 years ago

Hi @petroniocandido, how can I get a stable prediction using the GA? Every time I run it, it produces different values. Do you have a suggestion?
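One general way to make a stochastic search repeatable (this is standard Python practice, not a documented pyFTS feature, and it assumes the GA draws from the global `random` and NumPy RNGs) is to fix the seeds before each run:

```python
import random
import numpy as np

def reseed(seed=0):
    """Fix the global RNG state so a stochastic search is repeatable."""
    random.seed(seed)
    np.random.seed(seed)

# Two runs seeded identically produce identical random draws
reseed(42)
a = [random.random() for _ in range(3)]
reseed(42)
b = [random.random() for _ in range(3)]
print(a == b)  # True
```

Calling a helper like this immediately before `Evolutionary.execute` should make successive runs deterministic, provided the library has no other sources of randomness (e.g. multiprocessing).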

ramdhan1989 commented 3 years ago

Hi @petroniocandido, I have come back to try this package. I just want to clarify several things:

  1. How do I use the differential Transformation in the hyperparameter optimization?

  2. Using the evolutionary method, I got an RMSE of "nan". Is that good?

  3. Is it possible to use another evaluation metric, such as RMSLE (root mean squared log error)?

I appreciate your answers.

thank you