everthemore opened 3 years ago
So I tried both of these tests. In summary: the performance of the network is decent for numbers of hidden nodes down to 20, and a second hidden layer improved the network quite a lot without extra training time, because the early stopping criterion is reached sooner.
For testing the number of hidden nodes, I used EarlyStopping from Keras, monitoring val_loss with patience=50; running 1000 epochs typically did not make it stop early. These are plots of the loss as a function of the number of hidden nodes, and a typical run.
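For reference, a minimal sketch of that setup (the network shape, data, and variable names here are assumptions for illustration; only the EarlyStopping configuration is taken from the experiment):

```python
import numpy as np
from tensorflow import keras

# Hypothetical data: 10 input features, one regression target.
x_train = np.random.rand(1000, 10)
y_train = np.random.rand(1000, 1)

n_hidden = 20  # swept over a range of values, down to 20, in the experiment

model = keras.Sequential([
    keras.layers.Input(shape=(10,)),
    keras.layers.Dense(n_hidden, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Stop once val_loss has not improved for 50 epochs; with a single
# hidden layer, the full 1000 epochs typically ran to completion.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=50)
model.fit(
    x_train, y_train,
    validation_split=0.2,
    epochs=1000,
    callbacks=[early_stop],
)
```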
For testing a second hidden layer, I made the second hidden layer identical to the first and trained with the same early stopping and number of epochs. It stopped early in most cases, actually reducing the time spent training compared to a single hidden layer, while simultaneously improving performance on the test data. This is a typical run with 2 hidden layers.
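Continuing the sketch above, the two-hidden-layer variant might look like this (interpreting "identical to the first" as equal width and activation, which is my assumption):

```python
# Second hidden layer mirrors the first in width and activation.
model2 = keras.Sequential([
    keras.layers.Input(shape=(10,)),
    keras.layers.Dense(n_hidden, activation="relu"),
    keras.layers.Dense(n_hidden, activation="relu"),  # duplicated hidden layer
    keras.layers.Dense(1),
])
model2.compile(optimizer="adam", loss="mse")

# Same early-stopping setup; with two layers this tended to stop
# well before 1000 epochs, reducing wall-clock training time.
model2.fit(
    x_train, y_train,
    validation_split=0.2,
    epochs=1000,
    callbacks=[keras.callbacks.EarlyStopping(monitor="val_loss", patience=50)],
)
```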
Great, so two layers with the same number of neurons achieve a lower loss, generalize better, and even train faster? :)
I think the improvement in training time is not that large, but going from 200s to 120s is still an improvement.
A simple feedforward network with a 10 -> 100 -> 1 structure already does a really good job, so we don't even have to try more complicated network types. But we should: