autonomio / talos

Hyperparameter Experiments with TensorFlow and Keras
https://autonom.io
MIT License

After 25 parameters, ParamSpace picks the first value of each one #433

Closed pcannons closed 3 years ago

pcannons commented 4 years ago

Thank you very much for reporting a bug on Talos. Before you do, please go through the checklist below carefully and make sure to prepare your bug report in a way that facilitates effective handling of the matter.

1) Confirm the below

2) Include the output of:

talos.__version__ = 0.6.4

3) Explain clearly what you expect to happen

I added 50+ parameters to sweep with round_params=10000 and expected it to work as usual.

4) Explain what actually happened

In ParamSpace.py on line 144:

for i in self.param_index:
    p = []
    for l in reversed(self._params_temp):
        i, s = divmod(int(i), len(l))
        p.insert(0, l[s])
    final_grid.append(tuple(p))

The divmod eventually starts selecting the first element of each parameter because the quotient `i` reaches zero. I fixed this locally by simply doing:

_, s = divmod(int(i), len(l))
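For reference, the loop above is a standard mixed-radix decode of a flat index into one value per parameter list. Here is a minimal standalone sketch of that logic (hypothetical function name, not Talos code), which also shows why dropping the quotient reuse breaks it:

```python
def decode_index(index, param_lists):
    """Map a flat grid index to a tuple with one value per parameter list."""
    combo = []
    for values in reversed(param_lists):
        # Reusing the quotient as the next index is what makes this a
        # mixed-radix decode; replacing it with `_` would apply the same
        # remainder logic to every list of equal length.
        index, pos = divmod(index, len(values))
        combo.insert(0, values[pos])
    return tuple(combo)

params = [[1, 2, 3], ['a', 'b'], [0.1, 0.2]]
# 3 * 2 * 2 = 12 combinations; each in-range index decodes uniquely.
print(decode_index(0, params))   # (1, 'a', 0.1)
print(decode_index(11, params))  # (3, 'b', 0.2)
```

As long as the flat index stays below the product of the list lengths, every index maps to a distinct combination, so the symptom described above suggests the indices being fed in, rather than the decode itself.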
github-actions[bot] commented 4 years ago

Welcome to Talos community! Thanks so much for creating your first issue :)

mikkokotila commented 4 years ago

@pcannons sorry for the delay in getting back to this.

Could you share your parameter dictionary as a reference.

pcannons commented 4 years ago

@mikkokotila No problem!

Yep, here it is:

p = {
    'spm_epochs': [1000],
    'batch_size': [64],
    'truncated_batch_length': [150],
    'variable_length_max_cutoff': [1500],
    'embedding_tokens_to_keep': [4500],
    'spm_learning_rate': [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1],
    'keras_embed_dim': list(range(60,300,10)),
    'action_input_embed_dims': [50],

    'embedding_events_embedding_l1_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'embedding_events_embedding_l2_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'embedding_events_activity_l1_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'embedding_events_activity_l2_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'pre_embedding_events_token_index_dropout': [i/100 for i in list(range(0,100,5))],
    'post_embedding_batch_normalization': [True, False],
    'post_embedding_spatial_dropout': [i/100 for i in list(range(0,100,5))],
    'post_embedding_timestep_dropout': [i/100 for i in list(range(0,100,5))],

    'cnn_hidden_layers': [2], #list(range(1,10,1)),
    'cnn_shape': ['brick', 'slope', 'funnel', 'triangle'],
    'cnn_first_neuron':  list(range(1,300,10)),
    'cnn_last_neuron':  list(range(1,300,10)),
    'cnn_first_hidden_kernel_size':  list(range(1,100,2)),
    'cnn_last_hidden_kernel_size':  list(range(1,100,2)),
    'cnn_l1_kernel_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'cnn_l2_kernel_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'cnn_batch_norm_after_each_layer': [True, False],
    'cnn_spatial_dropout': [i/100 for i in list(range(0,55,5))],
    'cnn_timestep_dropout': [i/100 for i in list(range(0,55,5))],

    'global_average_or_max': ['average', 'max', 'flatten'],
    'output_softmax_dropout': [i/100 for i in list(range(0,55,5))],
    'output_softmax_kernel_l1_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'output_softmax_kernel_l2_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'output_softmax_activity_l1_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'output_softmax_activity_l2_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],

    'stl_moment_layers': list(range(1,100)),
    'STL_ENABLED': [False],
}

Also, my temporary fix above does not work: it causes the same value index to be selected from every list of the same length. I think the solution would have to use a random number for `i` in the divmod call, but I'm not sure how that affects repeatability.
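One way to keep random index selection repeatable is to seed the generator and draw flat indices without ever materializing the grid. A rough sketch under that assumption (illustrative helper, not Talos code):

```python
import random
from math import prod

def sample_param_indices(param_lists, n, seed=42):
    """Draw n repeatable flat grid indices without building the grid."""
    total = prod(len(values) for values in param_lists)
    rng = random.Random(seed)  # fixed seed => identical draws every run
    return [rng.randrange(total) for _ in range(n)]

params = [[1, 2, 3], ['a', 'b'], [0.1, 0.2]]
print(sample_param_indices(params, 5))
```

Re-running with the same seed yields the same indices, so the sweep stays reproducible even though the selection is random.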

mikkokotila commented 4 years ago

I see that you are trying to input a parameter space of approximately 10^31 permutations. This is roughly 10^20 above the supported magnitude. When you run Scan() with this parameter dictionary as an input, it should not run (unless you are running on a very special computer system).
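The grid size is just the product of the list lengths, so it is easy to sanity-check a parameter dictionary before calling Scan(). A quick sketch with a trimmed-down dictionary (abridged from the one above):

```python
from math import prod

def grid_size(params):
    """Total number of permutations in a Talos-style parameter dict."""
    return prod(len(values) for values in params.values())

p = {
    'spm_learning_rate': [1e-5, 5e-5, 1e-4],              # 3 values
    'keras_embed_dim': list(range(60, 300, 10)),          # 24 values
    'post_embedding_batch_normalization': [True, False],  # 2 values
}
print(grid_size(p))  # 3 * 24 * 2 = 144
```

Applying the same product to the full dictionary above is what yields a space on the order of 10^31.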

pcannons commented 4 years ago

I think it's a pretty standard deep learning build:

CPU: AMD Ryzen Threadripper 2950X 16-Core Processor
GPU: 4x RTX 2080 Ti
RAM: 64 GB
mikkokotila commented 3 years ago

Closing this, as supporting such large permutation spaces is unlikely to become a priority.

Feel free to open a new issue if anything.