Closed pcannons closed 3 years ago
Welcome to Talos community! Thanks so much for creating your first issue :)
@pcannons sorry for the delay in getting back to this.
Could you share your parameter dictionary for reference?
@mikkokotila No problem!
Yep, here it is:
p = {
    'spm_epochs': [1000],
    'batch_size': [64],
    'truncated_batch_length': [150],
    'variable_length_max_cutoff': [1500],
    'embedding_tokens_to_keep': [4500],
    'spm_learning_rate': [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1],
    'keras_embed_dim': list(range(60, 300, 10)),
    'action_input_embed_dims': [50],
    'embedding_events_embedding_l1_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'embedding_events_embedding_l2_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'embedding_events_activity_l1_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'embedding_events_activity_l2_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'pre_embedding_events_token_index_dropout': [i/100 for i in range(0, 100, 5)],
    'post_embedding_batch_normalization': [True, False],
    'post_embedding_spatial_dropout': [i/100 for i in range(0, 100, 5)],
    'post_embedding_timestep_dropout': [i/100 for i in range(0, 100, 5)],
    'cnn_hidden_layers': [2],  # list(range(1, 10, 1)),
    'cnn_shape': ['brick', 'slope', 'funnel', 'triangle'],
    'cnn_first_neuron': list(range(1, 300, 10)),
    'cnn_last_neuron': list(range(1, 300, 10)),
    'cnn_first_hidden_kernel_size': list(range(1, 100, 2)),
    'cnn_last_hidden_kernel_size': list(range(1, 100, 2)),
    'cnn_l1_kernel_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'cnn_l2_kernel_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'cnn_batch_norm_after_each_layer': [True, False],
    'cnn_spatial_dropout': [i/100 for i in range(0, 55, 5)],
    'cnn_timestep_dropout': [i/100 for i in range(0, 55, 5)],
    'global_average_or_max': ['average', 'max', 'flatten'],
    'output_softmax_dropout': [i/100 for i in range(0, 55, 5)],
    'output_softmax_kernel_l1_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'output_softmax_kernel_l2_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'output_softmax_activity_l1_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'output_softmax_activity_l2_regularization': [1e-8, 5e-8, 1e-7, 5e-7, 1e-6, 5e-6, 1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3],
    'stl_moment_layers': list(range(1, 100)),
    'STL_ENABLED': [False],
}
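For a grid search, the size of this space is the product of the lengths of the value lists, which is why a dictionary like the one above explodes into an astronomically large number of permutations. A quick sketch (the helper and the toy dictionary are illustrative only, not part of Talos):

```python
from math import prod

def param_space_size(params):
    """Number of grid permutations: the product of each value list's length."""
    return prod(len(v) for v in params.values())

# A toy subset of the dictionary above, for illustration only:
p_small = {
    'spm_learning_rate': [1e-5, 5e-5, 1e-4],              # 3 values
    'keras_embed_dim': list(range(60, 300, 10)),          # 24 values
    'post_embedding_batch_normalization': [True, False],  # 2 values
}
print(param_space_size(p_small))  # 3 * 24 * 2 = 144
```

Applying the same function to the full dictionary is a quick way to sanity-check the search space before handing it to a scan.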
Also, my temporary fix above does not work: it causes the same element to be selected from lists of the same length. I think the solution would have to use a random number for i in the divmod call, but I'm not sure how that affects repeatability.
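For context on where those repeated selections come from, here is a minimal sketch (not the actual ParamSpace.py code) of how divmod can decode a single integer into one element per parameter list. Once i has been divided down to zero, every remainder is zero and the first element of every remaining list is picked:

```python
def pick_permutation(i, params):
    """Mixed-radix decode: map a single index i to one value per list via divmod."""
    chosen = {}
    for name, values in params.items():
        i, r = divmod(i, len(values))  # quotient carries over, remainder selects
        chosen[name] = values[r]
    return chosen

p = {'a': [1, 2, 3], 'b': ['x', 'y']}
print(pick_permutation(0, p))  # {'a': 1, 'b': 'x'} -- all first elements
print(pick_permutation(5, p))  # {'a': 3, 'b': 'y'}
```

With this decoding, two lists of the same length always receive the same remainder for a given i, which matches the symptom described above.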
I see that you are trying to input a parameter space of approximately 10^31 permutations. This is roughly 10^20 above the supported magnitude. When you run Scan() with this parameter dictionary as input, it should not run (unless you are running on a very special computer system).
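One way to explore a space this large without enumerating it is to draw permutation indices from a seeded generator, which also keeps runs repeatable. A minimal sketch (the function name is hypothetical, not a Talos API):

```python
import random

def sample_indices(space_size, n_samples, seed=42):
    """Draw a reproducible sample of permutation indices from a huge space."""
    # A fixed seed makes the draws identical across runs,
    # even when space_size is astronomically large.
    rng = random.Random(seed)
    return [rng.randrange(space_size) for _ in range(n_samples)]

idx = sample_indices(10**31, 5, seed=7)
assert idx == sample_indices(10**31, 5, seed=7)  # same seed -> identical sample
```

Python's random.randrange handles arbitrarily large integer bounds, so the 10^31-sized space is not itself a problem for index sampling, only for exhaustive enumeration.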
I think it's a pretty standard deep learning build:
CPU: AMD Ryzen Threadripper 2950X 16-Core Processor
GPU: 4× NVIDIA RTX 2080 Ti
RAM: 64GB
Closing this, as it's unlikely to become a priority to support such large permutation spaces.
Feel free to open a new issue if anything else comes up.
Thank you very much for reporting a bug on Talos. Before you do, please go through the checklist below carefully and prepare your bug report in a way that facilitates effective handling of the matter.
1) Confirm the below
2) Include the output of:
talos.__version__
0.6.4
3) Explain clearly what you expect to happen
I added 50+ parameters to sweep with round_params=10000 and expected it to work as usual.
4) Explain what actually happened
In ParamSpace.py, on line 144, the divmod eventually starts selecting the first element of every parameter because i goes to zero. I fixed this locally by simply doing: