Closed enpasos closed 1 year ago
Is it the same issue as https://github.com/deepjavalibrary/djl/issues/2210? Is it solved after applying the patch https://github.com/deepjavalibrary/djl/pull/2232?
Is it solved after applying the patch #2232?
To check, I took your latest https://github.com/deepjavalibrary/djl/pull/2232:
git clone https://github.com/KexinFeng/djl.git
cd djl
gradlew build -x test
gradlew publishToMavenLocal
and ran the code to reproduce the bug against it:
git clone https://github.com/enpasos/reproducebug4.git
cd reproducebug4
git checkout localizing_memory_leak
gradlew build
java -jar app/build/libs/app-0.0.1-SNAPSHOT.jar
but I still see the same bug behaviour
[main] INFO com.enpasos.bugs.Main - ###################################################
[main] INFO com.enpasos.bugs.Main - memory leak of about 503 Bytes/epoch/batch
[main] INFO com.enpasos.bugs.Main - ###################################################
Is it the same issue as #2210?
It is in the same problem area. The impact of the bug behaviour from https://github.com/deepjavalibrary/djl/issues/2210, reproduced by
git clone https://github.com/enpasos/reproducebug2.git
cd reproducebug2
gradlew build
java -jar app/build/libs/app-0.0.1-SNAPSHOT.jar
is eliminated by the proposed solution https://github.com/deepjavalibrary/djl/pull/2273. (I am not using the word "solved" here because the suggested solution cleans up the garbage, whereas an ideal solution would not produce garbage in the first place.)
However, the behaviour reported here still shows up even after applying the patch https://github.com/deepjavalibrary/djl/pull/2273.
@enpasos I think I found a possible root cause. FashionMnist extends ArrayDataset, and iterating over this dataset uses the new advanced indexing feature, which was introduced in https://github.com/deepjavalibrary/djl/pull/1869 as an efficiency optimization.
Advanced indexing had a memory leak, which is now fixed in https://github.com/deepjavalibrary/djl/pull/2300. If you apply this patch, the memory leak reported here should be fixed.
Congrats on eliminating the root cause of this memory leak! Very nice :-) I ran the test case and there is no more memory leak here.
Description
When running the FashionMnist example from the DJL docs, I see a GPU memory leak of about 503 bytes on each dataset iteration. The attached plot (image omitted) illustrates how GPU memory grows per epoch.
I see this increase even when the batch loop is reduced to the bare iteration, with no other work done per batch. The leak occurs both with and without the suggested fix https://github.com/deepjavalibrary/djl/pull/2273, which cleans up orphaned NDArrays.
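For readers who want to quantify a leak like this, here is a minimal sketch of how a figure such as "about 503 Bytes/epoch/batch" could be estimated from memory readings taken at the end of each epoch. It is not taken from the reproduction app. The class name LeakEstimator, the method bytesPerEpochPerBatch, and the reading of the unit as "growth per epoch divided by the number of batches per epoch" are all assumptions made for illustration. How the readings are obtained is left open.

```java
/**
 * Hypothetical helper (not part of DJL or the reproduction app):
 * estimates a per-batch leak rate from memory readings taken once per epoch.
 */
final class LeakEstimator {

    private LeakEstimator() {}

    /**
     * Fits a least-squares line through (epoch index, used bytes) and
     * divides its slope by the number of batches iterated per epoch.
     *
     * @param usedBytesPerEpoch used memory in bytes, one reading per epoch
     * @param batchesPerEpoch   number of batches iterated in each epoch
     * @return estimated growth in bytes per batch iteration
     */
    static double bytesPerEpochPerBatch(long[] usedBytesPerEpoch, int batchesPerEpoch) {
        int n = usedBytesPerEpoch.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two epoch readings");
        }
        if (batchesPerEpoch <= 0) {
            throw new IllegalArgumentException("batchesPerEpoch must be positive");
        }

        // Mean of the epoch indices 0..n-1 and of the readings.
        double meanX = (n - 1) / 2.0;
        double meanY = 0.0;
        for (long y : usedBytesPerEpoch) {
            meanY += (double) y / n;
        }

        // Least-squares slope: bytes gained per epoch.
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = i - meanX;
            sxy += dx * (usedBytesPerEpoch[i] - meanY);
            sxx += dx * dx;
        }
        double bytesPerEpoch = sxy / sxx;

        return bytesPerEpoch / batchesPerEpoch;
    }

    public static void main(String[] args) {
        // Synthetic readings for illustration only, not measurements from this issue.
        long[] readings = {1_000_000L, 1_050_300L, 1_100_600L, 1_150_900L};
        System.out.println(bytesPerEpochPerBatch(readings, 100) + " bytes per batch");
    }
}
```

Fitting a line over several epochs, rather than differencing two readings, makes the estimate less sensitive to one-off allocations such as warm-up or caching in the first epoch. A rate that stays clearly positive over many epochs points to a leak rather than to startup cost.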
Expected Behavior
No memory leak.
How to Reproduce?
I set up a toy app based on djl fashion mnist to reproduce the problem I experience:
To further localize the cause:
What have you tried to solve it?
Looking for the cause. Did not find it yet.
Environment Info