afonsocastro opened this issue 2 years ago
Searched combinations:
- neurons_in_layer = 16 or 32 or 64 or 128
- dropout = 0 or 0.2
- activation_function = relu or selu

Fixed values:
- n_hidden_layers = 3
- model_optimizer = Adam
- learning_rate = 0.001
- max_epochs = 300
- batch_size = 96
- model_loss = sparse_categorical_crossentropy
- kernel_regularizer_per_layer = L1
- early_stopping = val_loss; patience = 20

Model configurations: 4096
Searched time: 6h52m26s
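For context, here is a minimal sketch of how one trial from this grid could be assembled, assuming a TensorFlow/Keras Sequential setup. The function name, input size, and the specific per-layer choices are illustrative only, not taken from the actual training script:

```python
# Minimal sketch, assuming TensorFlow/Keras; all names here are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_model(n_inputs, n_classes, layer_configs):
    """layer_configs: one (neurons, dropout, activation) tuple per hidden layer."""
    model = tf.keras.Sequential([layers.Input(shape=(n_inputs,))])
    for neurons, dropout, activation in layer_configs:
        model.add(layers.Dense(neurons, activation=activation,
                               kernel_regularizer=regularizers.l1()))
        if dropout > 0:
            model.add(layers.Dropout(dropout))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# One of the 4096 combinations: per-layer (neurons, dropout, activation) choices.
# n_inputs is a hypothetical flattened sample length; n_classes = 4 interaction classes.
model = build_model(n_inputs=650, n_classes=4,
                    layer_configs=[(64, 0.0, "relu"), (128, 0.2, "selu"), (32, 0.0, "relu")])
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=20)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=300, batch_size=96, callbacks=[early_stop])
```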
Epoch 300/300: loss: 0.4050 - accuracy: 0.9834 val_loss: 0.4435 - val_accuracy: 0.9554
Using 784 samples for training and 336 for validation
10 best results (optimization)
The curves seem convincing! Can we conclude that we have reached a good architecture? The val_accuracy is nice (~96%), although I would expect a somewhat better loss (a bit smaller wouldn't hurt). What troubles me is that among the 10 best optimization trials we have similar accuracies for relatively different architectures, both with 2 and 3 hidden layers and with quite disparate numbers of epochs (184 to 300). Perhaps the problem is not that hard, and there are many ways to solve it. I say, let's expand the dataset to other users and test this network on it!
From the first optimization, I can say that the only reliable 2-hidden-layer architecture is the one with 20% dropout on the second layer. The other 2-layer architecture early-stopped at epoch 184, which I do not find trustworthy.
Fact: across the two groups of 10 best results from both optimizations, we never saw a configuration with dropout on more than one layer. Either dropout is applied to a single layer, or there is no dropout at all.
> Perhaps the problem is not that hard, and there are many ways to solve it.
I completely agree, since we obtained several acceptable model configurations for every number of neurons (16, 32, 64 or 128) and for both activation functions (relu and selu), with or without dropout.
However, we are going to create a new dataset in which the joint efforts are no longer included. This means that instead of several groups of 13 values (timestamp + 6 fist force/torque + 6 joint torques), each sample will now contain several groups of only 7 values (timestamp + 6 fist force/torque). This will probably change the optimal neural network architecture.
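As a rough sketch of the intended reduction (the array layout and names are assumptions, not the actual dataset code), each group of 13 values would simply be trimmed to its first 7:

```python
# Minimal sketch, assuming each sample is a flat array of consecutive 13-value
# groups (timestamp + 6 fist force/torque + 6 joint torques); names are assumptions.
import numpy as np

def drop_joint_torques(samples, group_size=13, kept=7):
    """Keep only the first `kept` values of each group (timestamp + fist force/torque)."""
    n_samples, n_values = samples.shape
    n_groups = n_values // group_size
    reshaped = samples.reshape(n_samples, n_groups, group_size)
    return reshaped[:, :, :kept].reshape(n_samples, n_groups * kept)

# Example: 784 samples, 10 groups of 13 values -> 10 groups of 7 values.
old = np.random.rand(784, 10 * 13)
new = drop_joint_torques(old)
print(new.shape)  # (784, 70)
```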
Searched combinations:
- n_hidden_layers = 3 or 4
- neurons_in_layer = 64 or 128
- dropout = 0 or 0.5
- activation_function = relu or selu

Fixed values:
- model_optimizer = Adam
- learning_rate = 0.001
- max_epochs = 300
- batch_size = 96
- model_loss = sparse_categorical_crossentropy
- kernel_regularizer_per_layer = L1
- early_stopping = val_loss; patience = 20

Model configurations: 4608
Searched time: 7h14m17s
Epoch 249/300: loss: 0.5427 - accuracy: 0.9566 val_loss: 0.5808 - val_accuracy: 0.9405
Using 784 samples for training and 336 for validation
10 best results (optimization)
The last optimization (Optimization 3) was the only one in which the best configuration never reached epoch 300.
Based on all these studies, I propose the following conjectures:
I will try a new optimization with this information in mind.
Searched combinations:
- neurons_in_layer = 16 or 32 or 64 or 128
- dropout = 0 or 0.2 or 0.5
- activation_function = relu or selu

Fixed values:
- n_hidden_layers = 3
- model_optimizer = Adam
- learning_rate = 0.001
- max_epochs = 300
- batch_size = 96
- model_loss = sparse_categorical_crossentropy
- kernel_regularizer_per_layer = L1
- early_stopping = val_loss; patience = 20

Model configurations: 1536
Searched time: 2h34m41s
Epoch 300/300: loss: 0.4666 - accuracy: 0.9745 val_loss: 0.5030 - val_accuracy: 0.9524
Using 784 samples for training and 336 for validation
10 best results (optimization)
I will now run the following optimization:
Searched combinations:
- n_hidden_layers = 1 or 2 or 3
- neurons_in_layer = 16 or 32 or 64
- dropout (on last layer) = 0 or 0.2 or 0.5
- learning_rate = 0.01 or 0.001 or 0.0001
- activation_function = relu or selu or softsign or tanh

Fixed values:
- model_optimizer = Adam
- max_epochs = 500
- batch_size = 96
- model_loss = sparse_categorical_crossentropy
- kernel_regularizer_per_layer = L1
- early_stopping = val_loss; patience = 20

Note: I will run each architecture 3 times and take the mean of the 3 values of val_accuracy and last_epoch.

Model configurations: 16956
Number of trained networks (tests): 50868 = 16956 x 3
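A minimal sketch of that repetition protocol, assuming a `build_model` helper like the one sketched earlier (the loop, variable names, and the use of the last recorded metrics are illustrative):

```python
# Minimal sketch: run one configuration 3 times and average val_accuracy and
# the number of epochs actually trained. build_model() is the hypothetical
# builder sketched above; x_train, y_train, x_val, y_val are the dataset splits.
import numpy as np
import tensorflow as tf

def evaluate_config(config, x_train, y_train, x_val, y_val, n_runs=3):
    val_accs, last_epochs = [], []
    for _ in range(n_runs):
        model = build_model(**config)
        early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=20)
        history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                            epochs=500, batch_size=96,
                            callbacks=[early_stop], verbose=0)
        val_accs.append(history.history["val_accuracy"][-1])   # final validation accuracy
        last_epochs.append(len(history.history["loss"]))        # epoch where training stopped
    return float(np.mean(val_accs)), float(np.mean(last_epochs))
```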
With the new dataset collected on 1 August 2022 (see this post), I just optimized some parameters to achieve a proper and valid neural network.
The dataset contains 2558 samples (640 pull, 642 push, 632 shake, 644 twist): 1790 for training (1253 train, 537 validation) and 768 for testing.
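A minimal sketch of that split, assuming sklearn's train_test_split with stratification by gesture class (the array names are placeholders for the 2558 samples and their labels):

```python
# Minimal sketch of the 1253 / 537 / 768 split; X holds the 2558 samples,
# y the gesture labels (pull / push / shake / twist). Names are illustrative.
from sklearn.model_selection import train_test_split

X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=768, stratify=y, random_state=42)           # 768 held out for testing
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=537, stratify=y_trainval,
    random_state=42)                                             # 1253 train / 537 validation
```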
Searched combinations:
- n_hidden_layers = 2 or 3
- neurons_in_layer = 32 or 64
- dropout = 0.2 or 0.5
- activation_function = relu or selu

Fixed values:
- model_optimizer = Adam
- learning_rate = 0.001
- max_epochs = 300
- batch_size = 96
- model_loss = sparse_categorical_crossentropy
- kernel_regularizer_per_layer = L1
- early_stopping = val_loss; patience = 30

Model configurations: 160
Searched time: 16m55s
Epoch 218/300: loss: 0.7502 - accuracy: 0.8859 val_loss: 0.7890 - val_accuracy: 0.8566
Using 1253 samples for training and 537 for validation
Using 768 new samples for predicting
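A minimal sketch of scoring those 768 held-out samples, assuming the trained model and the test split from the sketches above (names are illustrative):

```python
# Minimal sketch: evaluate the trained model on the 768 held-out samples and
# recover the predicted class per sample. `model`, X_test, y_test come from
# the earlier sketches.
import numpy as np

test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
predictions = np.argmax(model.predict(X_test), axis=1)  # predicted class index per sample
print(f"test accuracy: {test_acc:.4f}")
```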
10 best results (optimization)
Optimization 1
Searched combinations:
- n_hidden_layers = 2 or 3
- neurons_in_layer = 32 or 64
- dropout = 0 or 0.2
- activation_function = relu or sigmoid or selu
- model_optimizer = Adam or SGD or RMSprop

Fixed values:
- learning_rate = 0.001
- max_epochs = 300
- batch_size = 96
- model_loss = sparse_categorical_crossentropy
- kernel_regularizer_per_layer = L1
- early_stopping = val_loss; patience = 20

Model configurations: 5616
Searched time: 9h16m36s
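Since this grid also varies the optimizer, here is a minimal sketch of how the optimizer choice could be parameterised (assuming TensorFlow/Keras; the dict-based lookup and function name are illustrative):

```python
# Minimal sketch, assuming TensorFlow/Keras; the helper name is illustrative.
import tensorflow as tf

def make_optimizer(name, learning_rate=0.001):
    """Return one of the searched optimizers with the fixed learning rate."""
    optimizers = {
        "Adam": tf.keras.optimizers.Adam,
        "SGD": tf.keras.optimizers.SGD,
        "RMSprop": tf.keras.optimizers.RMSprop,
    }
    return optimizers[name](learning_rate=learning_rate)

# model.compile(optimizer=make_optimizer("RMSprop"),
#               loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```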
BEST Result
Epoch 300/300: loss: 0.4321 - accuracy: 0.9745 val_loss: 0.4490 - val_accuracy: 0.9583
Using 784 samples for training and 336 for validation
10 best results (optimization)