deephealthproject / eddl

European Distributed Deep Learning (EDDL) library. A general-purpose library initially developed to cover deep learning needs in healthcare use cases within the DeepHealth project.
https://deephealthproject.github.io/eddl/
MIT License

Strange loss behavior #225

Closed: giobus75 closed this issue 3 years ago

giobus75 commented 3 years ago

Hi, I'm using the pyeddl bindings (eddl v0.8a) with GPU and I'm seeing strange behavior of the loss function. When I run my classification application (https://github.com/deephealthproject/promort_pipeline/blob/master/python/promort_cassandra.py), the loss value is suddenly much lower than the one I used to get with previous eddl versions. I don't know if it helps, but after investigating a bit I found that the behavior depends on the batch size (bs): increasing the bs scales the loss down by roughly the same factor. As an example, the figures below show the output after about 200 iterations (epoch 0) for bs = 1, 2, 4, 8, 16.

(screenshots of the training output for bs = 1, 2, 4, 8, and 16)
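
The same batch-size sweep can be reproduced without the Cassandra pipeline. Below is a minimal sketch along the lines of the standard pyeddl MNIST example (the network, optimizer, and hyperparameters are placeholders, not the ones from my pipeline); the only point is to train the same model at several batch sizes and compare the loss reported during training:

```python
import pyeddl.eddl as eddl
from pyeddl.tensor import Tensor

eddl.download_mnist()
x_train = Tensor.load("mnist_trX.bin")
y_train = Tensor.load("mnist_trY.bin")
x_train.div_(255.0)  # scale pixels to [0, 1]

for bs in (1, 2, 4, 8, 16):
    # small MLP, just as a placeholder model
    in_ = eddl.Input([784])
    layer = eddl.ReLu(eddl.Dense(in_, 128))
    out = eddl.Softmax(eddl.Dense(layer, 10))
    net = eddl.Model([in_], [out])
    eddl.build(
        net,
        eddl.sgd(0.01),
        ["soft_cross_entropy"],
        ["categorical_accuracy"],
        eddl.CS_GPU([1]),  # single GPU, as in my runs
    )
    print(f"--- batch size {bs} ---")
    eddl.fit(net, [x_train], [y_train], bs, 1)  # loss is printed by fit
```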

With v0.7.1 I got a loss value of about 1.5 after the same number of iterations, and it was not nearly as sensitive to the batch size.
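
Just to illustrate the arithmetic behind my guess (plain NumPy, not eddl code): if the per-batch cross-entropy is averaged over the batch and then divided by the batch size a second time somewhere, the reported value scales like 1/bs, which is the pattern above.

```python
import numpy as np

rng = np.random.default_rng(0)
for bs in (1, 2, 4, 8, 16):
    # random 10-class softmax outputs and labels, just to get typical loss values
    logits = rng.normal(size=(bs, 10))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    labels = rng.integers(0, 10, size=bs)
    nll = -np.log(probs[np.arange(bs), labels])  # per-sample cross-entropy
    mean_loss = nll.mean()        # usual reduction: roughly constant in bs
    double_div = nll.mean() / bs  # hypothetical extra division by bs
    print(f"bs={bs:2d}  mean={mean_loss:.3f}  mean/bs={double_div:.3f}")
```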

I also tried a C++ eddl example (mnist_conv) with both eddl v0.7.1 and v0.8a, and the output of the program is very different:

Here is the output of v0.7.1: (screenshot mnist_conv_v0.7.1)

and this is the output with v0.8a: (screenshot mnist_conv_v0.8a)

As you can see, with v0.8a the loss is already very small at the end of the first epoch, and it turns into NaN by the end of the run while the accuracy collapses.

In this case I did not experiment with different batch sizes, and I always ran on the GPU only. Hope this helps. Giovanni

salvacarrion commented 3 years ago

Fixed #228