PYFTS / pyFTS

An open source library for Fuzzy Time Series in Python
http://pyfts.github.io/pyFTS/
GNU General Public License v3.0

how to use hyperparams #30

Open ramdhan1989 opened 4 years ago

ramdhan1989 commented 4 years ago

I am struggling to find guidance on how to use the hyperparam modules, such as grid search or evolutionary. Can anyone share?

thank you

petroniocandido commented 4 years ago

Hi @ramdhan1989

Thanks for your interest in our tool, and forgive me for the long delay.

First of all, before hyperparameter optimization (hereafter called hyperopt), you should perform time series analysis (ACF/PACF plots, tests of stationarity and heteroscedasticity, etc.). Hyperopt is not a substitute for knowing how your time series data behaves.
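For example, a quick sanity check of the autocorrelation structure can be done with plain NumPy before reaching for any hyperopt. This is only a minimal sketch (the toy series and lag range are arbitrary choices, not part of pyFTS):

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation of a 1d series for lags 1..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return [np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)]

# A strongly autocorrelated toy series (a random walk is non-stationary)
rng = np.random.default_rng(42)
series = np.cumsum(rng.normal(size=500))

acf = sample_acf(series, nlags=5)
print(acf)  # values near 1 suggest the series is far from stationary
```

If the ACF decays very slowly like this, a differencing transformation is usually worth considering before (or alongside) hyperopt.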

The hyperparameter optimization of FTS is described here, and is called DEHO - Distributed Evolutionary Hyperparameter Optimization, but the library also provides other methods besides the evolutionary one. The method returns a dictionary with the best parameters found for forecasting the dataset using the selected FTS method (given in the fts_method parameter).

Below is a list of the implemented methods:

import numpy as np

from pyFTS.hyperparam import GridSearch
from pyFTS.models import hofts
from pyFTS.data import TAIEX

datasetname = 'TAIEX'
dataset = TAIEX.get_data()

# The search space for each hyperparameter
hyperparams = {
    'order': [1, 2, 3],
    'partitions': np.arange(10, 100, 3),
    'partitioner': [1, 2],           # 1 = Grid, 2 = Entropy partitioner
    'mf': [1, 2, 3],                 # 1 = Triangular, 2 = Trapezoidal, 3 = Gaussian
    'lags': np.arange(2, 7, 1),      # The lag indexes
    'alpha': np.arange(.0, .5, .05)  # Alpha-cut
}

GridSearch.execute(
        hyperparams,          # A dictionary containing the search space for each hyperparameter
        datasetname,          # Just the name of your dataset
        dataset,              # Your time series data (list or np.ndarray 1d)
        fts_method=hofts.WeightedHighOrderFTS,   # The FTS method you want to optimize [only univariate methods]
        window_size=10000,    # The length of the data window for the Sliding Window Cross Validation method
        train_rate=.9,        # The proportion of the data window used for training; the remainder is used for testing
        increment_rate=.3,    # The sliding increment of the Sliding Window Cross Validation method
        database_file='hyperopt.db'   # An sqlite database that will log the hyperopt process
)

There is no GridSearch implementation yet for multivariate methods, but a **Random Search (RS)** is available:

import pandas as pd

from pyFTS.hyperparam import mvfts as deho_mv
from pyFTS.models.multivariate import mvfts, wmvfts
from pyFTS.models.seasonal.common import DateTime
from pyFTS.data import Malaysia

datasetname = 'Malaysia'
dataset = Malaysia.get_dataframe()
dataset['time'] = pd.to_datetime(dataset["time"], format='%m/%d/%y %I:%M %p')

explanatory_variables = [
    {'name': 'Temperature', 'data_label': 'temperature', 'type': 'common'},
    {'name': 'Daily', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.minute_of_day, 'npart': 24},
    {'name': 'Weekly', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.day_of_week, 'npart': 7},
    {'name': 'Monthly', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.day_of_month, 'npart': 4},
    {'name': 'Yearly', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.day_of_year, 'npart': 12}
]

target_variable = {'name': 'Load', 'data_label': 'load', 'type': 'common'}

deho_mv.random_search(
    datasetname,          # Just the name of your dataset
    dataset,              # Your time series data (pd.DataFrame)
    npop=200,             # Population size of the RS
    mgen=70,              # Number of iterations of the RS
    fts_method=wmvfts.WeightedMVFTS,    # The multivariate FTS method to optimize
    variables=explanatory_variables,    # The list of exogenous/explanatory variables
    target_variable=target_variable,    # The endogenous/target variable
    window_size=10000,    # The length of the data window for the Sliding Window Cross Validation method
    train_rate=.9,        # The proportion of the data window used for training; the remainder is used for testing
    increment_rate=.3,    # The sliding increment of the Sliding Window Cross Validation method
)


- **Genetic Algorithm (GA)** is between GS and RS, both in accuracy and computational cost.

from pyFTS.hyperparam import Evolutionary
from pyFTS.models import hofts
from pyFTS.data import TAIEX

datasetname = 'TAIEX'
dataset = TAIEX.get_data()

ret = Evolutionary.execute(
    datasetname,          # Just the name of your dataset
    dataset,              # Your time series data (list or np.ndarray 1d)
    fts_method=hofts.WeightedHighOrderFTS,   # The FTS method you want to optimize [only univariate methods]
    ngen=30,              # Number of generations (iterations) of the GA
    npop=20,              # Population size of the GA
    psel=0.6,             # Selection probability of the GA
    pcross=.5,            # Crossover probability of the GA
    pmut=.3,              # Mutation probability of the GA
    window_size=10000,    # The length of the data window for the Sliding Window Cross Validation method
    train_rate=.9,        # The proportion of the data window used for training; the remainder is used for testing
    increment_rate=.3,    # The sliding increment of the Sliding Window Cross Validation method
    experiments=1,        # Number of hyperopt experiments to perform
    database_file='hyperopt.db'   # An sqlite database that will log the hyperopt process
)



Please, do not hesitate to get in touch if you have any questions.

Best regards   
ramdhan1989 commented 4 years ago

Thanks, all three methods work! After executing the hyperparameter optimization, is the model automatically fitted using the best params, or do we need to take the values from the output dict and fit the model ourselves? Would you mind elaborating on the dict? I am confused about which values belong to which parameter. From your code using the GA:

Experiment 0
Evaluating initial population 1600098526.9596627
GENERATION 0 1600098526.9596627
WITHOUT IMPROVEMENT 1
GENERATION 1 1600098526.9606583
WITHOUT IMPROVEMENT 2
GENERATION 2 1600098526.9626496
WITHOUT IMPROVEMENT 3
GENERATION 3 1600098526.963645
WITHOUT IMPROVEMENT 4
GENERATION 4 1600098526.9656367
WITHOUT IMPROVEMENT 5
GENERATION 5 1600098526.9666321
WITHOUT IMPROVEMENT 6
GENERATION 6 1600098526.9686234
WITHOUT IMPROVEMENT 7
('TAIEX', 'Evolutive', 'hofts', None, 1, 3, 2, 40, 0.5, '[2, 6, 7]', 'rmse', inf)
('TAIEX', 'Evolutive', 'hofts', None, 1, 3, 2, 40, 0.5, '[2, 6, 7]', 'size', inf)
('TAIEX', 'Evolutive', 'hofts', None, 1, 3, 2, 40, 0.5, '[2, 6, 7]', 'time', 0.010952949523925781)

Below is the returned dict:

{'alpha': 0.5, 'f1': inf, 'f2': inf, 'lags': [2, 6, 7], 'mf': 1, 'npart': 40,
 'order': 3, 'partitioner': 2, 'rmse': inf, 'size': inf, 'time': 0.010952949523925781}
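Reading the integer-coded entries against the search-space comments in the GridSearch example above (this decoding is my interpretation of those comments, so treat it as an assumption), a small helper can render the dict in a readable form:

```python
# Assumed encodings, per the search-space comments:
# partitioner: 1 = Grid, 2 = Entropy
# mf: 1 = Triangular, 2 = Trapezoidal, 3 = Gaussian
PARTITIONERS = {1: 'Grid', 2: 'Entropy'}
MEMBERSHIP_FUNCS = {1: 'Triangular', 2: 'Trapezoidal', 3: 'Gaussian'}

def describe(result):
    """Render a hyperopt result dict with human-readable labels."""
    return {
        'order': result['order'],            # model order
        'lags': result['lags'],              # selected lag indexes
        'partitions': result['npart'],       # number of fuzzy sets
        'partitioner': PARTITIONERS[result['partitioner']],
        'mf': MEMBERSHIP_FUNCS[result['mf']],
        'alpha_cut': result['alpha'],
        'rmse': result['rmse'],              # fitness on the test windows
    }

best = {'alpha': 0.5, 'f1': float('inf'), 'f2': float('inf'), 'lags': [2, 6, 7],
        'mf': 1, 'npart': 40, 'order': 3, 'partitioner': 2, 'rmse': float('inf'),
        'size': float('inf'), 'time': 0.010952949523925781}

print(describe(best))
```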

petroniocandido commented 4 years ago

Hi @ramdhan1989

Using this dictionary you can build a model with this code:

from pyFTS.hyperparam import Evolutionary

model = Evolutionary.phenotype(
     dictionary,   # the result dict from the hyperopt method
     train,        # the training dataset
     fts_method    # the FTS method
)

Best regards

ramdhan1989 commented 4 years ago

Well, thanks a lot @petroniocandido. Does the hyperparameter optimization also search for the best data transformation, such as how many lags for differencing, or perhaps which kind of transformation is best for the problem?

thank you

ramdhan1989 commented 4 years ago

Hi @petroniocandido, how can I get a stable prediction using the GA? Every time I run it, it produces different values. Do you have a suggestion?
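One general way to make a stochastic search repeatable (this is standard Python practice, not a documented pyFTS feature, and it assumes the GA draws from the global `random` and NumPy RNGs) is to fix the seeds before each run:

```python
import random
import numpy as np

def reseed(seed=0):
    """Fix the global RNG state so a stochastic search is repeatable."""
    random.seed(seed)
    np.random.seed(seed)

# Two runs seeded identically produce identical random draws
reseed(42)
a = [random.random() for _ in range(3)]
reseed(42)
b = [random.random() for _ in range(3)]
print(a == b)  # True
```

Calling a helper like this immediately before `Evolutionary.execute` should make successive runs deterministic, provided the library has no other sources of randomness (e.g. multiprocessing).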

ramdhan1989 commented 3 years ago

Hi @petroniocandido, I have come back to try this package. I just want to clarify several things:

  1. How do I use the differential Transformation in the hyperparameter optimization?

  2. Using the evolutionary method, I got an RMSE of "nan". Is that good?

  3. Is it possible to use another evaluation metric, such as RMSLE (root mean squared log error)?

I appreciate your answers.

thank you