giobus75 closed this issue 3 years ago.
Hello @giobus75,
these two lines leak memory:
output = getOutput(out);
cout << output->select({ to_string(0) }) << endl;
The Tensor::select and getOutput functions each return a new Tensor*, which the caller must destroy.
Try to change them to:
output = getOutput(out);
Tensor* select_tensor = output->select({ to_string(0) });
cout << select_tensor << endl;
delete output;
delete select_tensor;
Hi @MicheleCancilla,
I tried your hint (here the code) but the problem remains. Memory still increases even if I leave only the forward function call.
I also tried replacing the forward API call with the low-level version (net->forward({ x });) but memory keeps increasing.
I don't know whether this is useful, but I'm also observing this behavior: I'm using a dataset of 9984 images (256x256x3) and a batch size of 32. At the beginning of each epoch, memory starts to increase until about the 200th batch, then stops increasing, and it restarts with the next epoch, and so on.
Hi,
if you run:
train_batch(net, { x }, { y }, indices);
instead of:
forward(net, { x });
is it then OK?
I have tested this example that uses the forward function:
https://github.com/deephealthproject/eddl/blob/master/examples/nn/1_mnist/9_mnist_mlp_func.cpp
and indeed memory grows! I will check and fix it ASAP.
> Hi,
> if you run:
> train_batch(net, { x }, { y }, indices);
> instead of:
> forward(net, { x });
> is it then OK?
Yes, it is
> I have tested this example that uses the forward function:
> https://github.com/deephealthproject/eddl/blob/master/examples/nn/1_mnist/9_mnist_mlp_func.cpp
> and indeed memory grows! I will check and fix it ASAP.
Thanks a lot, @RParedesPalacios !
Hi, I found it; it is fixed in the develop branch, please check.
Hi @RParedesPalacios, no more memory growth. Thank you again.
Just to mention that I have experienced this memory leak in Multiple Sclerosis segmentation training with EDDL 0.7.1, pyEDDL 0.9.0, ECVL 0.2.3, pyECVL 0.5.1 (50 epochs).
Good to check that it has been solved in EDDL 0.8a.
Hi, I'm trying to train a customized version of a VGG16 on a huge dataset with the Python version of my code, and the process was killed due to an OOM. To understand whether the problem was related to the Python bindings, I wrote a piece of C++ code, keeping the implementation as simple as possible, to replicate the issue. The code is here. In the loop across batches, if I comment out rows 137-139 and leave row 136 (train_batch) uncommented, everything works fine. Conversely, if I comment out the train_batch row and keep lines 137-139 uncommented, as I do during the evaluation of a validation set, memory occupation keeps increasing.