autonomio / talos

Hyperparameter Experiments with TensorFlow and Keras
https://autonom.io
MIT License

The order of the output CSV seems to be wrong #439

Closed hep-beginner closed 4 years ago

hep-beginner commented 4 years ago

Hello, I would like to report an issue I found with the CSV output of the talos package, together with my current solution. My Python version is 3.5.2 (default, Oct 8 2019, 13:06:37) on WSL, and Keras is the one bundled with TF 2.0, in which the model itself works. talos.__version__ prints '0.6.3', and it was installed via pip. I ran the following code, adapted from the example:

import tensorflow as tf
from tensorflow import keras
import talos

x, y = talos.templates.datasets.iris()
train_num = int(len(x)*0.8)
train_fea = x[:train_num]
train_label = y[:train_num]
val_fea = x[train_num:]
val_label = y[train_num:]

def iris_model(x_train, y_train, x_val, y_val, params):
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Dense(params['first_neuron'], input_dim=x_train.shape[1], activation='relu'))

    model.add(tf.keras.layers.Dropout(params['dropout']))
    model.add(tf.keras.layers.Dense(y_train.shape[1], activation=params['last_activation']))

    model.compile(optimizer=params['optimizer'](lr=lr_normalizer(params['lr'], params['optimizer'])), loss=params['loss'],
                  metrics=['acc'])

    out = model.fit(x_train, y_train,
                    batch_size=params['batch_size'],
                    epochs=params['epochs'],
                    verbose=0,
                    validation_data=(x_val, y_val))

    return out, model

from talos.utils import lr_normalizer
from tensorflow.keras.optimizers import Adam, Nadam
from tensorflow.keras.activations import softmax
from tensorflow.keras.losses import categorical_crossentropy, logcosh

p = {
     'lr': (0.1, 10, 10),
     'first_neuron':[4, 8, 16, 32, 64, 128],
     'batch_size': [2, 3, 4],
     'epochs': [200],
     'dropout': (0, 0.40, 10),
     'optimizer': [Adam, Nadam],
     'loss': ['categorical_crossentropy'],
     'last_activation': ['softmax'],
     'weight_regulizer': [None]
     }

scan_object = talos.Scan(x=train_fea,
                         y=train_label,
                         x_val=val_fea,
                         y_val=val_label,
                         params=p,
                         model=iris_model,
                         fraction_limit=.001,
                         experiment_name='iris',
                         print_params=True)

scan_object.details

The output CSV looks like

round_epochs,loss,val_loss,acc,val_acc,batch_size,dropout,epochs,first_neuron,last_activation,loss,lr,optimizer,weight_regulizer
200,1.916683,2.028089,0.35,0.266667,7.03,categorical_crossentropy,<class 'tensorflow.python.keras.optimizer_v2.nadam.Nadam'>,200,None,32,softmax,0.28,3
200,3.511631,1.842899,0.291667,0.266667,6.04,categorical_crossentropy,<class 'tensorflow.python.keras.optimizer_v2.adam.Adam'>,200,None,16,softmax,0.04,2
200,1.131929,1.112004,0.308333,0.266667,0.1,categorical_crossentropy,<class 'tensorflow.python.keras.optimizer_v2.adam.Adam'>,200,None,16,softmax,0.36,2

for example. Across different trials, the column titles always stay the same, while the values in the parameter columns can appear in an arbitrary order. After investigating the code, I found that in talos/logging/results.py L21 the loop iterates over the keys of self.round_params, which is a dictionary defined in talos/parameters/ParamSpace.py L187. Changing the loop to iterate over the sorted, fixed-order list self._param_dict_keys solves the problem. From some brief searching, it seems that preserving the insertion order of dictionaries is a feature added in Python 3.6, and I confirmed on my Python 3.5.2 that the order of dictionary entries is indeed not fixed. I hope this change does not introduce any risks to the package. Thanks a lot.
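The idea behind the proposed fix can be shown with a small sketch. This is not the talos source: round_params, param_keys, header and row_from_fixed_order are hypothetical stand-ins, and the only point is that the header and every row are built from the same fixed key list instead of from whatever order the dictionary happens to yield.

```python
# Hypothetical sketch, not the talos implementation.

# Parameters for one round; on Python <= 3.5 the iteration order of this
# dict is not guaranteed.
round_params = {'lr': 0.28, 'batch_size': 3, 'dropout': 0.04, 'epochs': 200}

# A fixed, sorted key list playing the role of self._param_dict_keys.
param_keys = sorted(round_params)


def header(keys):
    # The column titles come from the fixed key list.
    return list(keys)


def row_from_fixed_order(params, keys):
    # Look every value up by key, so each value lands under its own title
    # regardless of the dict's internal order.
    return [params[k] for k in keys]


titles = header(param_keys)
values = row_from_fixed_order(round_params, param_keys)
```

Because both the titles and the values are driven by param_keys, reordering the entries of round_params cannot shift a value into the wrong column.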

github-actions[bot] commented 4 years ago

Welcome to Talos community! Thanks so much for creating your first issue :)

mikkokotila commented 4 years ago

Thanks, good sleuthing :) This is not such a straightforward matter, unfortunately. It was handled in this commit by using OrderedDict at the earliest point. I will have to think about it a little, but it might be resolved by bringing that back.
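What the referenced commit did exactly is not shown in this thread. As a rough sketch of the general idea of "OrderedDict at the earliest point", assuming a hypothetical helper named normalize_params that is not part of talos, the parameter dictionary could be converted once, right where it enters the program:

```python
from collections import OrderedDict


def normalize_params(params):
    # Hypothetical helper (not a talos function): freeze the key order once,
    # sorted alphabetically, so later code iterating over the mapping sees
    # the same order on every Python version.
    return OrderedDict(sorted(params.items(), key=lambda item: item[0]))


p = {
    'lr': (0.1, 10, 10),
    'first_neuron': [4, 8, 16, 32, 64, 128],
    'batch_size': [2, 3, 4],
    'epochs': [200],
}

ordered_p = normalize_params(p)
ordered_keys = list(ordered_p)
```

This only helps if every later step keeps using the ordered mapping; any code that copies the values into a plain dict on Python 3.5 loses the order again.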

hep-beginner commented 4 years ago

Dear mikkokotila, many thanks for the reply. Sorry for my overly simple thinking. I have decided to move to Python 3.6, and that version works well with talos. This is a very nice and easy-to-use package.

mikkokotila commented 4 years ago

Closing here as it's resolved.

denis-sumin commented 4 years ago

The issue is still relevant, unfortunately. I tried to wrap my params into an OrderedDict, but the order of columns is still incorrect (python 3.5). It seems that talos doesn't support python3.5 at this point which is sad...

denis-sumin commented 4 years ago

If it helps, there is the following observation:

If I print the params object from my single-round training function, the order of the params matches the order of the columns in the resulting scan object.


The order of columns, on the other hand, is alphabetical.

The root of this problem is, naturally, the fact that in Python <= 3.5 the order of dict entries is arbitrary, while from 3.6 on the insertion order is preserved (in 3.6 as a CPython implementation detail, and from 3.7 as a guarantee of the language).
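As a general illustration, independent of talos, one way to make CSV output immune to dict ordering is to name the columns explicitly and let the writer look up each value by key. The sketch below uses the standard library's csv.DictWriter; the write_rows helper and the sample rows are made up for this example.

```python
import csv
import io


def write_rows(rows, fieldnames):
    # Hypothetical helper: DictWriter places each value under the column
    # named in fieldnames, so the order of keys inside each row dict does
    # not matter.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Two rows whose keys were inserted in different orders.
rows = [
    {'lr': 0.28, 'batch_size': 3, 'epochs': 200},
    {'epochs': 200, 'lr': 0.04, 'batch_size': 2},
]

csv_text = write_rows(rows, fieldnames=sorted(rows[0]))
```

Here the column order is decided once by the sorted fieldnames list, so the same file is produced on Python 3.5 and on later versions.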