deephealthproject / eddl

European Distributed Deep Learning (EDDL) library. A general-purpose library initially developed to cover deep learning needs in healthcare use cases within the DeepHealth project.
https://deephealthproject.github.io/eddl/
MIT License

Memory not released after calls to predict #236

Closed · simleo closed this issue 3 years ago

simleo commented 3 years ago

CC: @mdrio @giobus75

Short summary: calling predict with tensors of increasing size leads to memory errors, even though predicting with the largest size alone works, which suggests that memory is not being released between calls.

The following program results in increasing memory occupation that leads to an out of memory error:

#include "eddl/apis/eddl.h"

using namespace eddl;

layer create_VGG16(layer x, const int& num_classes) {
    x = ReLu(HeNormal(Conv(x, 64, {3, 3}), 1234));
    x = MaxPool(ReLu(HeNormal(Conv(x, 64, {3, 3}), 1234)), {2, 2}, {2, 2});
    x = ReLu(HeNormal(Conv(x, 128, {3, 3}), 1234));
    x = MaxPool(ReLu(HeNormal(Conv(x, 128, {3, 3}), 1234)), {2, 2}, {2, 2});
    x = ReLu(HeNormal(Conv(x, 256, {3, 3}), 1234));
    x = ReLu(HeNormal(Conv(x, 256, {3, 3}), 1234));
    x = MaxPool(ReLu(HeNormal(Conv(x, 256, {3, 3}), 1234)), {2, 2}, {2, 2});
    x = ReLu(HeNormal(Conv(x, 512, {3, 3}), 1234));
    x = ReLu(HeNormal(Conv(x, 512, {3, 3}), 1234));
    x = MaxPool(ReLu(HeNormal(Conv(x, 512, {3, 3}), 1234)), {2, 2}, {2, 2});
    x = ReLu(HeNormal(Conv(x, 512, {3, 3}), 1234));
    x = ReLu(HeNormal(Conv(x, 512, {3, 3}), 1234));
    x = MaxPool(ReLu(HeNormal(Conv(x, 512, {3, 3}), 1234)), {2, 2}, {2, 2});
    x = Reshape(x, {-1});
    x = ReLu(HeNormal(Dense(x, 256), 1234));
    x = Softmax(Dense(x, num_classes));
    return x;
}

int main() {
    std::vector<int> in_size{256, 256};
    std::string weight_filename("classify_tumor_eddl_0.1.bin");
    int num_classes = 2;

    layer in_ = Input({3, in_size[0], in_size[1]});
    layer out = create_VGG16(in_, num_classes);
    model net = Model({in_}, {out});
    build(net, rmsprop(0.00001), {"soft_cross_entropy"},
          {"categorical_accuracy"}, CS_GPU({1}, "low_mem"));
    load(net, weight_filename, "bin");

    for (int i = 1; i < 15; ++i) {
        int bs = i;      // batch size grows at every iteration
        // int bs = 14;  // constant batch size: no memory error (see below)
        // Channel-first input; values are left uninitialized since only the
        // allocation pattern matters for this reproducer.
        Tensor* tin = new Tensor({bs, 3, in_size[0], in_size[1]});
        vtensor tout = predict(net, {tin});
    }
}

Output:

Predict 1 samples
Predict 2 samples
Predict 3 samples
Predict 4 samples
Predict 5 samples
Predict 6 samples
Predict 7 samples
Predict 8 samples
Predict 9 samples
Predict 10 samples
Predict 11 samples
Predict 12 samples
terminate called after throwing an instance of 'std::runtime_error'
  what():  [CUDA ERROR]: out of memory (2) raised in create_tensor | (check_cuda)
Aborted (core dumped)

As shown above, the out of memory error happens at bs = 12. However, the problem is not that a prediction with batch size 12 does not fit: if the code is changed to keep bs constant at 14, the memory error does not occur.

Change:

        // int bs = i;
        int bs = 14;

Output:

Predict 14 samples
Predict 14 samples
Predict 14 samples
Predict 14 samples
Predict 14 samples
Predict 14 samples
Predict 14 samples
Predict 14 samples
Predict 14 samples
Predict 14 samples
Predict 14 samples
Predict 14 samples
Predict 14 samples
Predict 14 samples

This suggests that, with a varying bs, the memory from the smaller allocations is not released, so when the size-12 allocation is attempted part of the memory is already occupied and it fails, whereas this does not happen with a constant size.
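
If unreleased per-size allocations are indeed the cause, a possible user-side workaround (a sketch only, not from the original report; MAX_BS is a name introduced here for illustration) is to always allocate the input at the largest batch size that will be used, so the allocation shape never changes between predict calls:

    // Hypothetical workaround sketch: keep the batch dimension fixed at the
    // largest expected size so the per-call allocation shape never grows.
    const int MAX_BS = 14;
    Tensor* tin = new Tensor({MAX_BS, 3, in_size[0], in_size[1]});
    for (int i = 1; i < 15; ++i) {
        // Copy the i real samples into the first i slots of tin (not shown),
        // run prediction on the full fixed-size batch, and read back only
        // the first i entries of the output (not shown).
        vtensor tout = predict(net, {tin});
    }
    delete tin;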

I've also tried replicating this on the CPU, where the behavior is consistent with the above. For instance, with 6 iterations, if the batch size is kept constant at 6, the final memory footprint of the process is 4 GB (and it does not change over subsequent iterations); if, on the other hand, the batch size increases from 1 to 6, the final memory footprint is 6 GB (growing from the first iteration to the last).
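
For reference, the CPU footprint between iterations can be tracked with something like the following Linux-only helper (a sketch added here for illustration, not part of the original report; print_rss is a hypothetical name), which reads the VmRSS line from /proc/self/status:

#include <fstream>
#include <iostream>
#include <string>

// Print the process's current resident set size (the "VmRSS:" line of
// /proc/self/status). Meant to be called after each predict() iteration to
// see whether the footprint keeps growing. Linux-only.
void print_rss() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {
            std::cout << line << std::endl;
            break;
        }
    }
}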

The Python version of the program behaves the same way; the only difference is that, for some reason, the "Aborted (core dumped)" message is absent from the output.

Note: the weights file used while debugging can be downloaded from https://cloud.crs4.it/s/QbHtxSGACCetsew

RParedesPalacios commented 3 years ago

Ok, I will check

RParedesPalacios commented 3 years ago

Solved in develop branch

simleo commented 3 years ago

> Solved in develop branch

Which commit?

simleo commented 3 years ago

@RParedesPalacios Was it fixed in https://github.com/deephealthproject/eddl/commit/19a50a612401313a43ba7255b1051e2aaacb251c or does one need to update to https://github.com/deephealthproject/eddl/commit/494630f04aac2cb75e0e947b4530b3630ee68172?

RParedesPalacios commented 3 years ago

The last