simleo closed this issue 3 years ago
CC: @mdrio @giobus75
Short summary: calling `predict` with tensors of increasing size leads to memory errors, even though predicting with the largest size alone works, which suggests that memory is not being released between calls.

The following program results in increasing memory occupation that leads to an out of memory error:
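A minimal sketch of such a program (the ONNX file name, the 3x256x256 input shape and the loss/metric arguments are placeholders, not taken from the original run):

```cpp
#include <iostream>
#include <vector>

#include <eddl/apis/eddl.h>
#include <eddl/serialization/onnx/eddl_onnx.h>

using namespace eddl;

int main() {
    // Load a pretrained model (file name is a placeholder)
    Net* net = import_net_from_onnx_file("model.onnx");
    // init_weights = false: keep the weights loaded from the ONNX file
    build(net,
          rmsprop(0.01f),
          {"soft_cross_entropy"},
          {"categorical_accuracy"},
          CS_GPU({1}),
          false);
    // Predict on batches of increasing size: memory occupation grows at
    // each iteration until an out of memory error is raised
    for (int bs = 1; bs <= 14; ++bs) {
        std::cout << "bs = " << bs << std::endl;
        Tensor* x = Tensor::zeros({bs, 3, 256, 256});  // dummy input batch
        std::vector<Tensor*> preds = predict(net, {x});
        // Free the input and the outputs (assuming the returned tensors
        // are owned by the caller)
        for (Tensor* t : preds) delete t;
        delete x;
    }
    delete net;
    return 0;
}
```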
Output: the first iterations succeed, then the program fails with an out of memory error at bs = 12 and aborts ("Aborted (core dumped)").
As shown above, the out of memory error occurs at bs = 12. However, the problem is not that a batch of size 12 does not fit in memory: if the code is changed to keep bs constant at 14, no memory error occurs.
Change:
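A sketch of the changed loop, with the same placeholder shapes as above:

```cpp
// Same number of iterations, but with a constant batch size
for (int i = 0; i < 14; ++i) {
    const int bs = 14;
    Tensor* x = Tensor::zeros({bs, 3, 256, 256});
    std::vector<Tensor*> preds = predict(net, {x});
    for (Tensor* t : preds) delete t;
    delete x;
}
```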
Output: all iterations complete without memory errors.
This suggests that, with a varying bs, the memory from the smaller allocations is not released, so that by the time the size 12 allocation is attempted part of the memory is already occupied and the allocation fails, while this does not happen with a constant size.
I've also tried replicating this on the CPU, where the behavior is consistent with the above. For instance, with 6 iterations, if the batch size is kept constant at 6, the final memory footprint of the process is 4 GB (and it does not change with subsequent iterations); if, on the other hand, the batch size increases from 1 to 6, the final footprint is 6 GB (growing from the first iteration to the last).
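In the sketch above, the CPU runs would only require swapping the computing service passed to build:

```cpp
// CPU computing service instead of CS_GPU({1})
build(net,
      rmsprop(0.01f),
      {"soft_cross_entropy"},
      {"categorical_accuracy"},
      CS_CPU(),
      false);
```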
The Python version of the program behaves in the same way; the only difference is the absence, for some reason, of the "Aborted (core dumped)" message in the output.
Note: the weights file used while debugging can be downloaded from https://cloud.crs4.it/s/QbHtxSGACCetsew
OK, I will check.

Solved in develop branch.

Which commit?

@RParedesPalacios Was it fixed in https://github.com/deephealthproject/eddl/commit/19a50a612401313a43ba7255b1051e2aaacb251c or does one need to update to https://github.com/deephealthproject/eddl/commit/494630f04aac2cb75e0e947b4530b3630ee68172?

The last one.