deephealthproject / eddl

European Distributed Deep Learning (EDDL) library. A general-purpose library initially developed to cover deep learning needs in healthcare use cases within the DeepHealth project.
https://deephealthproject.github.io/eddl/
MIT License

Memory not released after deleting model #238

Closed simleo closed 3 years ago

simleo commented 3 years ago

CC: @mdrio

The following program results in continuously increasing memory occupation:

#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

#include "eddl/apis/eddl.h"

using namespace eddl;

layer create_VGG16(layer x, const int& num_classes) {
    x = ReLu(HeNormal(Conv(x, 64, {3, 3}), 1234));
    x = MaxPool(ReLu(HeNormal(Conv(x, 64, {3, 3}), 1234)), {2, 2}, {2, 2});
    x = ReLu(HeNormal(Conv(x, 128, {3, 3}), 1234));
    x = MaxPool(ReLu(HeNormal(Conv(x, 128, {3, 3}), 1234)), {2, 2}, {2, 2});
    x = ReLu(HeNormal(Conv(x, 256, {3, 3}), 1234));
    x = ReLu(HeNormal(Conv(x, 256, {3, 3}), 1234));
    x = MaxPool(ReLu(HeNormal(Conv(x, 256, {3, 3}), 1234)), {2, 2}, {2, 2});
    x = ReLu(HeNormal(Conv(x, 512, {3, 3}), 1234));
    x = ReLu(HeNormal(Conv(x, 512, {3, 3}), 1234));
    x = MaxPool(ReLu(HeNormal(Conv(x, 512, {3, 3}), 1234)), {2, 2}, {2, 2});
    x = ReLu(HeNormal(Conv(x, 512, {3, 3}), 1234));
    x = ReLu(HeNormal(Conv(x, 512, {3, 3}), 1234));
    x = MaxPool(ReLu(HeNormal(Conv(x, 512, {3, 3}), 1234)), {2, 2}, {2, 2});
    x = Reshape(x, {-1});
    x = ReLu(HeNormal(Dense(x, 256), 1234));
    x = Softmax(Dense(x, num_classes));
    return x;
}

int main() {
    int c = 0;
    std::vector<int> in_size{256, 256};
    int num_classes = 2;
    while (true) {
        std::cerr << "#iter: " << c << "\n";
        layer in_ = Input({3, in_size[0], in_size[1]});
        layer out = create_VGG16(in_, num_classes);
        model net = Model({in_}, {out});
        build(net, rmsprop(0.00001), {"soft_cross_entropy"},
              {"categorical_accuracy"}, CS_CPU());
        std::this_thread::sleep_for(std::chrono::seconds(1));
        c++;
        delete net;
    }
}

As the program runs, memory occupation keeps increasing (about 6, 12, 18, ... GB at iterations 10, 20, 30, ...). The same growth was observed on the GPU, where the program additionally crashed at iteration 16 with an out-of-memory error:

#iter: 16
CS with low memory setup
Building model
terminate called after throwing an instance of 'std::runtime_error'
  what():  [CUDA ERROR]: out of memory (2) raised in create_tensor | (check_cuda)
Aborted (core dumped)

RParedesPalacios commented 3 years ago

The fix for the previous issue also solves this one. It is also in the develop branch.

simleo commented 3 years ago

@RParedesPalacios I've checked with EDDL at https://github.com/deephealthproject/eddl/commit/494630f04aac2cb75e0e947b4530b3630ee68172. The previous issue was solved, but this one was not. I still see the same behavior described above.

RParedesPalacios commented 3 years ago

Wow... OK, I will check that, thanks! I have been working on this today. I have also deleted the optimizer, which also holds memory. Perhaps it is now solved (in develop); in any case, I will check tomorrow.

simleo commented 3 years ago

@RParedesPalacios I checked again with the current develop. Memory still grows, albeit a bit less. Now it's about 5, 10, 15, ... GB at iteration 10, 20, 30, ...

RParedesPalacios commented 3 years ago

Not sure what is happening. On my side the memory is completely stable: after 30 iterations it is the same as in the first iteration, 904 MiB:

|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:03:00.0  On |                  N/A |
|  0%   48C    P8    15W / 200W |    904MiB /  8118MiB |     19%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

RParedesPalacios commented 3 years ago

OK, now I see it... it seems that it is CPU memory. Very strange; I will check.
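
For anyone reproducing this: nvidia-smi only reports GPU memory, so growth in host (CPU) memory has to be measured on the process itself. The following is a minimal sketch, assuming Linux and its /proc/self/status file; the helper names (parse_vmrss_kib, current_rss_kib, rss_growth_kib) are hypothetical and are not part of EDDL.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Parse the "VmRSS:" line from /proc/<pid>/status-style text.
// Returns the resident set size in KiB, or -1 if the field is missing.
long parse_vmrss_kib(std::istream& status) {
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {  // line starts with "VmRSS:"
            std::istringstream fields(line.substr(6));
            long kib = -1;
            fields >> kib;  // the value is followed by the unit "kB"
            return kib;
        }
    }
    return -1;
}

// Resident set size of the current process (assumption: Linux only).
// Returns -1 if /proc/self/status cannot be read.
long current_rss_kib() {
    std::ifstream status("/proc/self/status");
    return parse_vmrss_kib(status);
}

// Growth between two valid samples, in KiB.
long rss_growth_kib(long before, long after) {
    assert(before >= 0 && after >= 0);
    return after - before;
}
```

In the reproduction program, one could call current_rss_kib() right after `delete net;` and print it next to the iteration counter. A value that keeps rising across iterations points at host memory that is not being released, independently of what nvidia-smi shows.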

RParedesPalacios commented 3 years ago

solved

simleo commented 3 years ago

@RParedesPalacios I checked on 280795c942e5dd5a62d79a2968bbac84a5d2f2c4 and it's fixed. Unfortunately, now there are segmentation fault problems, see https://github.com/deephealthproject/eddl/issues/241.