afonsocastro opened this issue 2 years ago
Searched combinations:
- neurons_in_layer = 16 or 32 or 64 or 128
- dropout = 0 or 0.2
- activation_function = relu or selu

Fixed values:
- n_hidden_layers = 3
- model_optimizer = Adam
- learning_rate = 0.001
- max_epochs = 300
- batch_size = 96
- model_loss = sparse_categorical_crossentropy
- kernel_regularizer_per_layer = L1
- early_stopping = val_loss; patience = 20

Model configurations: 4096
Searched time: 6h52m26s
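For context, here is a minimal sketch of how one trial from this grid could be assembled, assuming a TensorFlow/Keras Sequential setup. The function name, input size, and the specific per-layer choices are illustrative only, not taken from the actual training script:

```python
# Minimal sketch, assuming TensorFlow/Keras; all names here are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_model(n_inputs, n_classes, layer_configs):
    """layer_configs: one (neurons, dropout, activation) tuple per hidden layer."""
    model = tf.keras.Sequential([layers.Input(shape=(n_inputs,))])
    for neurons, dropout, activation in layer_configs:
        model.add(layers.Dense(neurons, activation=activation,
                               kernel_regularizer=regularizers.l1()))
        if dropout > 0:
            model.add(layers.Dropout(dropout))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# One of the 4096 combinations: per-layer (neurons, dropout, activation) choices.
# n_inputs is a hypothetical flattened sample length; n_classes = 4 interaction classes.
model = build_model(n_inputs=650, n_classes=4,
                    layer_configs=[(64, 0.0, "relu"), (128, 0.2, "selu"), (32, 0.0, "relu")])
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=20)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=300, batch_size=96, callbacks=[early_stop])
```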
Epoch 300/300: loss: 0.4050 - accuracy: 0.9834 val_loss: 0.4435 - val_accuracy: 0.9554
Using 784 samples for training and 336 for validation
10 best results (optimization)
The curves seem convincing! Can we conclude that we have reached a good architecture? The val_accuracy is nice (~96%), although I would expect a somewhat better loss (a bit smaller wouldn't hurt). What troubles me is that among the 10 best optimization trials we have similar accuracies for relatively different architectures, both with 2 and 3 hidden layers and with quite disparate numbers of epochs (184 to 300). Perhaps the problem is not that hard, and there are many ways to solve it. I say, let's expand the dataset to other users and test this network on it!
From the first optimization, I can say that the only reliable 2-hidden-layer architecture is the one with 20% dropout on the second layer. The other 2-layer architecture early-stopped at epoch 184, which I do not find trustworthy.
Fact: across the two groups of 10 best results from both optimizations, we never saw a configuration with dropout on more than one layer. Either dropout is applied to a single layer, or there is no dropout at all.
> Perhaps the problem is not that hard, and there are many ways to solve it.
I completely agree, since we obtained several acceptable model configurations for every number of neurons (16, 32, 64 or 128) and for both activation functions (relu and selu), with or without dropout.
However, we are going to create a new dataset in which the joint efforts are no longer included. This means that instead of several groups of 13 values (timestamp + 6 fist force/torque + 6 joint torques), each sample will now contain several groups of only 7 values (timestamp + 6 fist force/torque). This will probably change the optimal neural network architecture.
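As a rough sketch of the intended reduction (the array layout and names are assumptions, not the actual dataset code), each group of 13 values would simply be trimmed to its first 7:

```python
# Minimal sketch, assuming each sample is a flat array of consecutive 13-value
# groups (timestamp + 6 fist force/torque + 6 joint torques); names are assumptions.
import numpy as np

def drop_joint_torques(samples, group_size=13, kept=7):
    """Keep only the first `kept` values of each group (timestamp + fist force/torque)."""
    n_samples, n_values = samples.shape
    n_groups = n_values // group_size
    reshaped = samples.reshape(n_samples, n_groups, group_size)
    return reshaped[:, :, :kept].reshape(n_samples, n_groups * kept)

# Example: 784 samples, 10 groups of 13 values -> 10 groups of 7 values.
old = np.random.rand(784, 10 * 13)
new = drop_joint_torques(old)
print(new.shape)  # (784, 70)
```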
Searched combinations:
- n_hidden_layers = 3 or 4
- neurons_in_layer = 64 or 128
- dropout = 0 or 0.5
- activation_function = relu or selu

Fixed values:
- model_optimizer = Adam
- learning_rate = 0.001
- max_epochs = 300
- batch_size = 96
- model_loss = sparse_categorical_crossentropy
- kernel_regularizer_per_layer = L1
- early_stopping = val_loss; patience = 20

Model configurations: 4608
Searched time: 7h14m17s
Epoch 249/300: loss: 0.5427 - accuracy: 0.9566 val_loss: 0.5808 - val_accuracy: 0.9405
Using 784 samples for training and 336 for validation
10 best results (optimization)
The last optimization (Optimization 3) was the only one in which the best configuration never reached epoch 300.
Based on all these studies, I propose the following conjectures:
I will try a new optimization with this information in mind.
Searched combinations:
- neurons_in_layer = 16 or 32 or 64 or 128
- dropout = 0 or 0.2 or 0.5
- activation_function = relu or selu

Fixed values:
- n_hidden_layers = 3
- model_optimizer = Adam
- learning_rate = 0.001
- max_epochs = 300
- batch_size = 96
- model_loss = sparse_categorical_crossentropy
- kernel_regularizer_per_layer = L1
- early_stopping = val_loss; patience = 20

Model configurations: 1536
Searched time: 2h34m41s
Epoch 300/300: loss: 0.4666 - accuracy: 0.9745 val_loss: 0.5030 - val_accuracy: 0.9524
Using 784 samples for training and 336 for validation
10 best results (optimization)
I will now run the following optimization:
Searched combinations:
- n_hidden_layers = 1 or 2 or 3
- neurons_in_layer = 16 or 32 or 64
- dropout (on last layer) = 0 or 0.2 or 0.5
- learning_rate = 0.01 or 0.001 or 0.0001
- activation_function = relu or selu or softsign or tanh

Fixed values:
- model_optimizer = Adam
- max_epochs = 500
- batch_size = 96
- model_loss = sparse_categorical_crossentropy
- kernel_regularizer_per_layer = L1
- early_stopping = val_loss; patience = 20

Note: I will run each architecture 3 times and take the mean of the 3 values of val_accuracy and last_epoch.

Model configurations: 16956
Number of trained networks (tests): 50868 = 16956 x 3
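A minimal sketch of that repetition protocol, assuming a `build_model` helper like the one sketched earlier (the loop, variable names, and the use of the last recorded metrics are illustrative):

```python
# Minimal sketch: run one configuration 3 times and average val_accuracy and
# the number of epochs actually trained. build_model() is the hypothetical
# builder sketched above; x_train, y_train, x_val, y_val are the dataset splits.
import numpy as np
import tensorflow as tf

def evaluate_config(config, x_train, y_train, x_val, y_val, n_runs=3):
    val_accs, last_epochs = [], []
    for _ in range(n_runs):
        model = build_model(**config)
        early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=20)
        history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                            epochs=500, batch_size=96,
                            callbacks=[early_stop], verbose=0)
        val_accs.append(history.history["val_accuracy"][-1])   # final validation accuracy
        last_epochs.append(len(history.history["loss"]))        # epoch where training stopped
    return float(np.mean(val_accs)), float(np.mean(last_epochs))
```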
With the new dataset collected on 1 August 2022 (see this post), I just optimized some parameters to achieve a proper and valid neural network.
The dataset contains 2558 samples (640 pull, 642 push, 632 shake, 644 twist): 1790 for training (1253 train, 537 validation) and 768 for testing.
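A minimal sketch of that split, assuming sklearn's train_test_split with stratification by gesture class (the array names are placeholders for the 2558 samples and their labels):

```python
# Minimal sketch of the 1253 / 537 / 768 split; X holds the 2558 samples,
# y the gesture labels (pull / push / shake / twist). Names are illustrative.
from sklearn.model_selection import train_test_split

X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=768, stratify=y, random_state=42)           # 768 held out for testing
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=537, stratify=y_trainval,
    random_state=42)                                             # 1253 train / 537 validation
```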
Searched combinations:
- n_hidden_layers = 2 or 3
- neurons_in_layer = 32 or 64
- dropout = 0.2 or 0.5
- activation_function = relu or selu

Fixed values:
- model_optimizer = Adam
- learning_rate = 0.001
- max_epochs = 300
- batch_size = 96
- model_loss = sparse_categorical_crossentropy
- kernel_regularizer_per_layer = L1
- early_stopping = val_loss; patience = 30

Model configurations: 160
Searched time: 16m55s
Epoch 218/300: loss: 0.7502 - accuracy: 0.8859 val_loss: 0.7890 - val_accuracy: 0.8566
Using 1253 samples for training and 537 for validation
Using 768 new samples for predicting
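A minimal sketch of scoring those 768 held-out samples, assuming the trained model and the test split from the sketches above (names are illustrative):

```python
# Minimal sketch: evaluate the trained model on the 768 held-out samples and
# recover the predicted class per sample. `model`, X_test, y_test come from
# the earlier sketches.
import numpy as np

test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
predictions = np.argmax(model.predict(X_test), axis=1)  # predicted class index per sample
print(f"test accuracy: {test_acc:.4f}")
```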
10 best results (optimization)
Optimization 1
Searched combinations:
- n_hidden_layers = 2 or 3
- neurons_in_layer = 32 or 64
- dropout = 0 or 0.2
- activation_function = relu or sigmoid or selu
- model_optimizer = Adam or SGD or RMSprop

Fixed values:
- learning_rate = 0.001
- max_epochs = 300
- batch_size = 96
- model_loss = sparse_categorical_crossentropy
- kernel_regularizer_per_layer = L1
- early_stopping = val_loss; patience = 20

Model configurations: 5616
Searched time: 9h16m36s
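Since this grid also varies the optimizer, here is a minimal sketch of how the optimizer choice could be parameterised (assuming TensorFlow/Keras; the dict-based lookup and function name are illustrative):

```python
# Minimal sketch, assuming TensorFlow/Keras; the helper name is illustrative.
import tensorflow as tf

def make_optimizer(name, learning_rate=0.001):
    """Return one of the searched optimizers with the fixed learning rate."""
    optimizers = {
        "Adam": tf.keras.optimizers.Adam,
        "SGD": tf.keras.optimizers.SGD,
        "RMSprop": tf.keras.optimizers.RMSprop,
    }
    return optimizers[name](learning_rate=learning_rate)

# model.compile(optimizer=make_optimizer("RMSprop"),
#               loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```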
BEST Result
Epoch 300/300: loss: 0.4321 - accuracy: 0.9745 val_loss: 0.4490 - val_accuracy: 0.9583
Using 784 samples for training and 336 for validation
10 best results (optimization)