deephealthproject / eddl

European Distributed Deep Learning (EDDL) library. A general-purpose library initially developed to cover deep learning needs in healthcare use cases within the DeepHealth project.
https://deephealthproject.github.io/eddl/
MIT License

The starting value of the loss function depends on batch size (EDDL 0.8) #237

Closed giobus75 closed 3 years ago

giobus75 commented 3 years ago

Hi, I'm training a VGG16 on a huge dataset, using the soft_cross_entropy loss function and the rmsprop optimizer. With EDDL version 0.8.3a I see a different behavior than with version 0.7.1: the starting value of the loss function changes with the batch size, apparently scaled by it. Moreover, the loss does not decrease across epochs, or decreases very slowly.

In the following graph I plotted the first 10 training epochs for Keras, EDDL 0.7, and EDDL 0.8. The configuration was: learning rate 1e-6, no augmentation, convolutional part initialized with the ImageNet weights. The black line refers to Keras, and its starting loss value is comparable with that of EDDL 0.7 (red line). I did not plot different batch sizes for Keras and EDDL 0.7 because the differences in the starting value are negligible. The odd behavior can be seen in the EDDL 0.8 plots (the lines with solid circle markers): as the batch size increases, the starting value of the loss function decreases.

Another issue, maybe related, is that training does not work. In the Keras version, without using any technique to avoid overfitting, training reaches high accuracy (about 99% after about 50 epochs). Could you check, please, what could be wrong?

Thank you
Giovanni

[Image: training loss over the first 10 epochs for Keras, EDDL 0.7, and EDDL 0.8 at several batch sizes]
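
For reference, a minimal Keras sketch of the baseline configuration described above (the input shape, number of classes, and classifier head below are placeholders, not details from the actual experiment):

```python
# Hedged sketch of the Keras baseline: VGG16 conv base with ImageNet weights,
# RMSprop with lr=1e-6, categorical cross-entropy, no augmentation.
import tensorflow as tf

num_classes = 2                 # hypothetical; not stated in the report
input_shape = (224, 224, 3)     # hypothetical input size

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=input_shape)

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

# Keras reports the loss as the mean over the batch, so its starting value is
# essentially independent of batch_size (about ln(num_classes) for an
# untrained softmax head), which is the behavior seen in the black line.
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-6),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(x_train, y_train, batch_size=32, epochs=50)
```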

RParedesPalacios commented 3 years ago

Hi, thanks for pointing this out. We will check it. I think that @salvacarrion changed how the soft_cross_entropy is computed. Perhaps he didn't consider the batch size. I will contact him.
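
To illustrate the hypothesis, here is a plain NumPy sketch (not EDDL's actual code) of how the reduction over the batch affects the reported starting loss: a batch mean gives a value independent of the batch size, while an extra division by the batch size makes it shrink as the batch grows, which matches the reported plots.

```python
# Illustration only: how the loss reduction interacts with the batch size.
# For an untrained softmax over C classes, the per-sample cross-entropy is
# about ln(C); how it is reduced over the batch determines what is reported.
import numpy as np

C = 10                                   # hypothetical number of classes
rng = np.random.default_rng(0)

for B in (16, 64, 256):
    probs = np.full((B, C), 1.0 / C)     # uniform predictions ~ untrained net
    onehot = np.eye(C)[rng.integers(0, C, size=B)]
    per_sample = -np.sum(onehot * np.log(probs), axis=1)

    mean_loss = per_sample.mean()            # batch mean: ~ln(C), independent of B
    over_normalized = mean_loss / B          # extra division by B: shrinks as B grows
    print(B, round(mean_loss, 4), round(over_normalized, 6))
```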

salvacarrion commented 3 years ago

Thank you for pointing this out! I'll fix it ASAP.

P.S.: I need to take a closer look at the problem, but the CE loss functions* seem correct (apparently), since they are normalized by the batch size.

*categorical_cross_entropy, binary_cross_entropy, and softmax_cross_entropy

RParedesPalacios commented 3 years ago

Hi, it is now solved in the develop branch.

giobus75 commented 3 years ago

Thank you @RParedesPalacios, I'll check it as soon as possible.

Giovanni