AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Memory saturation on jetson nano #3633

Open anasBahou opened 5 years ago

anasBahou commented 5 years ago

@AlexeyAB I tested darknet with yolov3-tiny on my Jetson Nano, using a Raspberry Pi camera: the memory usage kept increasing until it crashed within 15 minutes. My Makefile:

GPU=1
CUDNN=1
CUDNN_HALF=0
OPENCV=1
AVX=0
OPENMP=0
LIBSO=0
ZED_CAMERA=0

DEBUG=0

ARCH= -gencode arch=compute_30,code=sm_30 \
      -gencode arch=compute_35,code=sm_35 \
      -gencode arch=compute_50,code=[sm_50,compute_50] \
      -gencode arch=compute_52,code=[sm_52,compute_52] \
      -gencode arch=compute_61,code=[sm_61,compute_61]

OS := $(shell uname)

ARCH= -gencode arch=compute_53,code=[sm_53,compute_53]


I used this command to test: ./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights "nvarguscamerasrc ! video/x-raw(memory:NVMM),width=1280, height=720, framerate=30/1, format=NV12 ! nvvidconv ! video/x-raw, format=BGRx, width=640, height=360 ! videoconvert ! video/x-raw, format=BGR ! appsink" -ext_output -dont_show -json_port 8070 -mjpeg_port 8090

It also crashes when tested on .mp4 file (but over a longer period of time).

Any idea why?

AlexeyAB commented 5 years ago
  1. Does it crash without -json_port 8070 -mjpeg_port 8090, for example when using this command?

./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights "nvarguscamerasrc ! video/x-raw(memory:NVMM),width=1280, height=720, framerate=30/1, format=NV12 ! nvvidconv ! video/x-raw, format=BGRx, width=640, height=360 ! videoconvert ! video/x-raw, format=BGR ! appsink" -ext_output -dont_show

  2. Does it crash with a web camera or an mp4 file? ./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights test.mp4 -ext_output -dont_show
anasBahou commented 5 years ago

@AlexeyAB Sorry for the late reply. It crashed with the first command (without -json_port 8070 -mjpeg_port 8090) because of memory saturation. However, when I tested with the mp4 file, the memory usage was increasing slowly and hadn't reached saturation when it suddenly crashed (I noticed that the board heated up; I'll probably need a fan to cool it down). Do you have any clue why the memory usage increases over time?

AlexeyAB commented 5 years ago

I don't know. I haven't encountered this issue.

anasBahou commented 5 years ago

I don't know. I haven't encountered this issue.

To your knowledge, are there any tools to debug this program, or at least to get insight into how memory is used while running the test?

AlexeyAB commented 5 years ago

For example: valgrind --tool=memcheck ./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights "nvarguscamerasrc ! video/x-raw(memory:NVMM),width=1280, height=720, framerate=30/1, format=NV12 ! nvvidconv ! video/x-raw, format=BGRx, width=640, height=360 ! videoconvert ! video/x-raw, format=BGR ! appsink" -ext_output -dont_show

There are also other tools: ccmalloc, NJAMD, mpatrol, YAMD, LeakTracer, Purify ... https://www.networkworld.com/article/3006625/review-5-memory-debuggers-for-linux-coding.html?page=2

Or memleax or libleak https://unix.stackexchange.com/a/282944/66840
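
For long-running processes, memleax can attach to an already running binary without restarting it. A minimal usage sketch (hedged: the -e option should set how long an allocation may live before being reported as a suspected leak; confirm the flags with memleax --help):

    # attach to the running darknet process and report allocations
    # still alive after 300 seconds as suspected leaks
    memleax -e 300 $(pidof darknet)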

nkucza commented 5 years ago

Hey, I have the same issue on my Jetson Xavier. The memory usage keeps slowly increasing.

I checked with valgrind; the results are attached below.

The command was:

valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all ./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights test.mp4 -ext_output -dont_show

==26898== Warning: noted but unhandled ioctl 0x4e04 with no size/direction hints.
==26898==    This could cause spurious value errors to appear.
==26898==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==26898==
==26898== HEAP SUMMARY:
==26898==     in use at exit: 722,072,635 bytes in 425,881 blocks
==26898==   total heap usage: 972,101 allocs, 546,220 frees, 4,255,748,076 bytes allocated
==26898==
==26898== 64 bytes in 4 blocks are possibly lost in loss record 1 of 15
==26898==    at 0x4844B3C: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== 88 bytes in 1 blocks are indirectly lost in loss record 2 of 15
==26898==    at 0x484522C: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== 184 bytes in 1 blocks are possibly lost in loss record 3 of 15
==26898==    at 0x4846D10: realloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== 632 bytes in 158 blocks are possibly lost in loss record 4 of 15
==26898==    at 0x4844BFC: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== 6,080 bytes in 19 blocks are indirectly lost in loss record 5 of 15
==26898==    at 0x4846B0C: calloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== 6,992 (912 direct, 6,080 indirect) bytes in 1 blocks are definitely lost in loss record 6 of 15
==26898==    at 0x4846B0C: calloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== 19,500 (19,412 direct, 88 indirect) bytes in 228 blocks are definitely lost in loss record 7 of 15
==26898==    at 0x484522C: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== 58,909 bytes in 9 blocks are still reachable in loss record 8 of 15
==26898==    at 0x4846D10: realloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== 158,758 bytes in 1,067 blocks are still reachable in loss record 9 of 15
==26898==    at 0x4844B3C: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== 178,632 bytes in 2,579 blocks are still reachable in loss record 10 of 15
==26898==    at 0x484522C: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== 633,232 bytes in 4,622 blocks are possibly lost in loss record 11 of 15
==26898==    at 0x4846B0C: calloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== 2,764,800 bytes in 1 blocks are possibly lost in loss record 12 of 15
==26898==    at 0x4846E5C: memalign (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== 39,083,421 bytes in 1,392 blocks are still reachable in loss record 13 of 15
==26898==    at 0x4846E5C: memalign (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== 98,541,588 bytes in 266,156 blocks are still reachable in loss record 14 of 15
==26898==    at 0x4846B0C: calloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== 580,625,923 bytes in 149,643 blocks are still reachable in loss record 15 of 15
==26898==    at 0x4844BFC: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
==26898==
==26898== LEAK SUMMARY:
==26898==    definitely lost: 20,324 bytes in 229 blocks
==26898==    indirectly lost: 6,168 bytes in 20 blocks
==26898==      possibly lost: 3,398,912 bytes in 4,786 blocks
==26898==    still reachable: 718,647,231 bytes in 420,846 blocks
==26898==                       of which reachable via heuristic:
==26898==                         newarray           : 1,536 bytes in 16 blocks
==26898==         suppressed: 0 bytes in 0 blocks
==26898==
==26898== For counts of detected and suppressed errors, rerun with: -v
==26898== Use --track-origins=yes to see where uninitialised values come from
==26898== ERROR SUMMARY: 15 errors from 8 contexts (suppressed: 0 from 0)

anasBahou commented 5 years ago

Hi @nkucza.

The results of valgrind are just false positives. I have the exact same results. I've tried several other tools for memory checking and got nothing. I'm out of ammo on this one. Let me know if you figure out how to fix it or find any helpful insights.

Thanks.

nkucza commented 5 years ago

I went through the code and ended up at the cudaMemcpy in network_predict(net, x). Something seems wrong there, but so far I have no clue what :/

It's also strange that it's totally fine on an RTX 2070 or GTX 1080.
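
One way to narrow this down (a suggestion, not something verified in this thread) is to log free device memory around the network_predict() call with cudaMemGetInfo(): if free GPU memory shrinks frame after frame, the growth is device-side; if it stays flat while the process RSS grows, the leak is in host memory. A minimal helper sketch (log_gpu_mem is a hypothetical name):

    #include <stdio.h>
    #include <cuda_runtime.h>

    /* Call before and after network_predict() and compare across frames. */
    static void log_gpu_mem(const char *tag)
    {
        size_t free_b = 0, total_b = 0;
        cudaMemGetInfo(&free_b, &total_b);
        printf("%s: %zu MiB free of %zu MiB\n", tag, free_b >> 20, total_b >> 20);
    }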

nkucza commented 5 years ago

So, I'm a little smarter now.

The problem only occurs when using CUDA within a pthread. If you remove the threads in the darknet demo program and execute everything in the main thread, the error is gone, as far as I can judge.

I have opened a topic at the nvidia dev forum: https://devtalk.nvidia.com/default/topic/1061071/cuda-programming-and-performance/memory-leak-using-pthread-and-cuda-on-jetson-xavier/
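
To check whether pthread + CUDA alone reproduces the growth, a standalone probe along these lines could help (a sketch, not darknet code: it mimics the demo loop's pattern of doing GPU work in a freshly created thread per frame; watch the process RSS while it runs):

    /* probe.cu -- build with: nvcc -o probe probe.cu
     * (add -lpthread if your toolchain requires it) */
    #include <pthread.h>
    #include <stdio.h>
    #include <cuda_runtime.h>

    static void *gpu_work(void *arg)
    {
        void *dev_buf = NULL;
        cudaMalloc(&dev_buf, 1 << 20);  /* small per-"frame" device buffer */
        cudaFree(dev_buf);
        return NULL;
    }

    int main(void)
    {
        for (int i = 0; i < 100000; ++i) {
            pthread_t t;
            pthread_create(&t, NULL, gpu_work, NULL);
            pthread_join(t, NULL);      /* join each thread, as the demo does */
            if (i % 1000 == 0) printf("iteration %d\n", i);
        }
        cudaDeviceReset();
        return 0;
    }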

nkucza commented 5 years ago

I just noticed that the error still exists when only the fetch thread is built in. So even a single parallel thread seems to be problematic, but threads or CUDA by themselves seem to be okay.

I will give up for now; maybe somebody else knows what to do about it.

AlexeyAB commented 5 years ago

@anasBahou @nkucza So something is wrong with pthread + CUDA? Try describing this problem here: https://devtalk.nvidia.com/default/board/371/jetson-nano/

nkucza commented 5 years ago

@AlexeyAB I'd guess it's just pthread. The problem persists if only the fetch thread is built in. Maybe the image forwarding could be changed to use a pointer, so that less is copied on the heap.

AlexeyAB commented 5 years ago

Maybe the image forwarding could be changed to use a pointer, so that less is copied on the heap.

What do you mean?

hanseahn commented 4 years ago

Hello. I have the same issue on my Jetson Nano. I have tried to fix the memory leak but can't figure out the problem. However, I found that the original darknet (https://github.com/pjreddie/darknet) does not leak memory on the Jetson Nano.

But the problem with the original darknet is that weights files from AlexeyAB's darknet do not work well with it. Is there a way to use a weights file from AlexeyAB's darknet with the original darknet?

aimhabo commented 4 years ago

Same issue on a GTX 1070 Ti and an RTX 2080. I checked that all the pointers are released. Now only the pthread and CUDA parts are not very understandable to me.

Can we avoid using pthread_t to read the data?

AlexeyAB commented 4 years ago

@aimhabo

Try adding this line: LIB_API void *load_thread(void *ptr); here: https://github.com/AlexeyAB/darknet/blob/5bbbbd7c53d39f9369c8b53c369389fa40c84c50/include/darknet.h#L876

And use:

  1. load_thread(args); instead of https://github.com/AlexeyAB/darknet/blob/5bbbbd7c53d39f9369c8b53c369389fa40c84c50/src/detector.c#L161
  2. load_thread(args); instead of https://github.com/AlexeyAB/darknet/blob/5bbbbd7c53d39f9369c8b53c369389fa40c84c50/src/detector.c#L188
  3. load_thread(args); instead of https://github.com/AlexeyAB/darknet/blob/5bbbbd7c53d39f9369c8b53c369389fa40c84c50/src/detector.c#L203
  4. load_thread(args); instead of https://github.com/AlexeyAB/darknet/blob/5bbbbd7c53d39f9369c8b53c369389fa40c84c50/src/detector.c#L267

and remove these lines:

  1. https://github.com/AlexeyAB/darknet/blob/5bbbbd7c53d39f9369c8b53c369389fa40c84c50/src/detector.c#L185
  2. https://github.com/AlexeyAB/darknet/blob/5bbbbd7c53d39f9369c8b53c369389fa40c84c50/src/detector.c#L196
  3. https://github.com/AlexeyAB/darknet/blob/5bbbbd7c53d39f9369c8b53c369389fa40c84c50/src/detector.c#L264
  4. https://github.com/AlexeyAB/darknet/blob/5bbbbd7c53d39f9369c8b53c369389fa40c84c50/src/detector.c#L333
aimhabo commented 4 years ago

@AlexeyAB The parameter of load_thread() is a pointer, so I made a heap-allocated copy, load_args* args_p = (load_args*)calloc(1, sizeof(struct load_args)); *args_p = args;, like load_data() does. After that, a Segmentation fault (core dumped) error was generated directly at the line of the first load_thread() call. It also happens with load_thread(&args); directly.
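
That behavior matches how load_thread() is written: as far as I can tell from src/data.c (worth re-checking against your revision), it copies its argument by value and then frees the pointer it was given, so it must receive a heap-allocated load_args, never a stack address:

    /* paraphrased sketch of load_thread() from src/data.c, not verbatim */
    void *load_thread(void *ptr)
    {
        load_args a = *(struct load_args*)ptr;  /* copy the args by value */
        /* ... dispatch on a.type and fill *a.d with the loaded data ... */
        free(ptr);  /* frees the caller's pointer: passing &args crashes  */
        return 0;
    }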

"SFCD" can be fixed by replacing every

                pthread_join(load_thread, 0);
                free_data(train);
                train = buffer;
                load_thread = load_data(args);

by

                free_data(train);
                train = buffer;
                args_p = (load_args*)calloc(1, sizeof(struct load_args));
                *args_p = args;
                load_thread(args_p); // avoid pthread

but there is still a memory leak at every "Syncing..." and "Resizing".

AlexeyAB commented 4 years ago

but there is still a memory leak at every "Syncing..." and "Resizing".

Do you mean that if you use only 1 GPU (without Syncing) and set random=0 (without Resizing) then there is no memory leak?

aimhabo commented 4 years ago

I rebuilt darknet without OpenCV, and... after setting random=0, memory usage still rose rapidly. When training on a single GPU, the memory rises more slowly, but the rise still doesn't stop. With both random=0 and single-GPU training, there is still a memory leak, but the leak rate is much slower, presenting as a very slowly rising waveform.

AlexeyAB commented 4 years ago

Does this fix solve the memory leak issue? https://github.com/AlexeyAB/darknet/issues/3633#issuecomment-553714045

aimhabo commented 4 years ago

No, it just slows down both the loading and the memory leak.

AlexeyAB commented 4 years ago

@aimhabo Can you check (e.g. by using Valgrind) whether there is a memory leak, rather than a controlled increase in memory usage by libraries or the operating system?

valgrind --leak-check=full \
         --show-leak-kinds=all \
         --track-origins=yes \
         --verbose \
         --log-file=valgrind-out.txt \
         ./darknet detector ...
aimhabo commented 4 years ago

@AlexeyAB It has now run 150 iterations in 6 hours, and memory usage has increased by 50 MB. There are 8 kinds of valgrind messages, shown below.

==22252== Source and destination overlap in strcpy(0x9d89a790, 0x9d89a790)
==22252==    at 0x4C310E6: strcpy (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22252==    by 0x40C19A: find_replace_extension (in /home/wit/wjx/darknet/darknet)
==22252==    by 0x40C34D: replace_image_to_label (in /home/wit/wjx/darknet/darknet)
==22252==    by 0x44E98B: fill_truth_detection (in /home/wit/wjx/darknet/darknet)
==22252==    by 0x451371: load_data_detection (in /home/wit/wjx/darknet/darknet)
==22252==    by 0x452C4F: load_thread (in /home/wit/wjx/darknet/darknet)
==22252==    by 0x206E76B9: start_thread (pthread_create.c:333)
==22252==    by 0x20A0441C: clone (clone.S:109)

==23733== 4 bytes in 1 blocks are still reachable in loss record 1 of 22,149
==23733==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x4C2FDEF: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x50067BF: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x5006A7A: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x5006D25: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x5006F21: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x5034866: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x5002866: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x5055B9C: cuInit (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x5CD48A9: ??? (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)
==23733==    by 0x5CD4900: ??? (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)
==23733==    by 0x206EEA98: __pthread_once_slow (pthread_once.c:116)

==23733== 8 bytes in 1 blocks are still reachable in loss record 3 of 22,149
==23733==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0xF01F447: ??? (in /usr/local/cuda-9.0/lib64/libcudnn.so.7.3.1)
==23733==    by 0xF002887: ??? (in /usr/local/cuda-9.0/lib64/libcudnn.so.7.3.1)
==23733==    by 0xF00127F: ??? (in /usr/local/cuda-9.0/lib64/libcudnn.so.7.3.1)
==23733==    by 0xF058255: ??? (in /usr/local/cuda-9.0/lib64/libcudnn.so.7.3.1)
==23733==    by 0xDB90D02: ??? (in /usr/local/cuda-9.0/lib64/libcudnn.so.7.3.1)
==23733==    by 0xFFEFFFBC7: ???
==23733==    by 0x4010689: call_init.part.0 (dl-init.c:58)
==23733==    by 0x40107DA: call_init (dl-init.c:30)
==23733==    by 0x40107DA: _dl_init (dl-init.c:120)
==23733==    by 0x4000C69: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
==23733==    by 0x6: ???
==23733==    by 0xFFEFFFFE6: ???

==23733== 8 bytes in 1 blocks are still reachable in loss record 362 of 22,149
==23733==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x664C777: ??? (in /usr/local/cuda-9.0/lib64/libcublas.so.9.0.480)
==23733==    by 0x662FBB7: ??? (in /usr/local/cuda-9.0/lib64/libcublas.so.9.0.480)
==23733==    by 0x61B443F: ??? (in /usr/local/cuda-9.0/lib64/libcublas.so.9.0.480)
==23733==    by 0x6685585: ??? (in /usr/local/cuda-9.0/lib64/libcublas.so.9.0.480)
==23733==    by 0x5F651B2: ??? (in /usr/local/cuda-9.0/lib64/libcublas.so.9.0.480)
==23733==    by 0xFFEFFFBC7: ???
==23733==    by 0x4010689: call_init.part.0 (dl-init.c:58)
==23733==    by 0x40107DA: call_init (dl-init.c:30)
==23733==    by 0x40107DA: _dl_init (dl-init.c:120)
==23733==    by 0x4000C69: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
==23733==    by 0x6: ???
==23733==    by 0xFFEFFFFE6: ???

==23733== 8 bytes in 1 blocks are still reachable in loss record 502 of 22,149
==23733==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x5CD5E12: ??? (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)
==23733==    by 0x5CC3277: __cudaRegisterFatBinary (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)
==23733==    by 0x405C6C: __sti____cudaRegisterAll() (in /home/wit/wjx/darknet/darknet)
==23733==    by 0x4DB20C: __libc_csu_init (in /home/wit/wjx/darknet/darknet)
==23733==    by 0x2091D7BE: (below main) (libc-start.c:247)

==23733== 8 bytes in 1 blocks are still reachable in loss record 503 of 22,149
==23733==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x5021544: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x5002A8A: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x5055B9C: cuInit (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x5CD48A9: ??? (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)
==23733==    by 0x5CD4900: ??? (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)
==23733==    by 0x206EEA98: __pthread_once_slow (pthread_once.c:116)
==23733==    by 0x5D0C868: ??? (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)
==23733==    by 0x5CD0B69: ??? (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)
==23733==    by 0x5CD5D8A: ??? (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)
==23733==    by 0x5CFD576: cudaSetDevice (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)
==23733==    by 0x4107D1: cuda_set_device (in /home/wit/wjx/darknet/darknet)

==23733== 16 bytes in 1 blocks are still reachable in loss record 504 of 22,149
==23733==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x5147832: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x50EAB09: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x5015A17: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x5002AF0: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x5055B9C: cuInit (in /usr/lib/x86_64-linux-gnu/libcuda.so.384.130)
==23733==    by 0x5CD48A9: ??? (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)
==23733==    by 0x5CD4900: ??? (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)
==23733==    by 0x206EEA98: __pthread_once_slow (pthread_once.c:116)
==23733==    by 0x5D0C868: ??? (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)
==23733==    by 0x5CD0B69: ??? (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)
==23733==    by 0x5CD5D8A: ??? (in /usr/local/cuda-9.0/lib64/libcudart.so.9.0.176)

==23733== 32 bytes in 1 blocks are still reachable in loss record 507 of 22,149
==23733==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23733==    by 0x20CC8626: _dlerror_run (dlerror.c:141)
==23733==    by 0x20CC7FA0: dlopen@@GLIBC_2.2.5 (dlopen.c:87)
==23733==    by 0xF0555F0: ??? (in /usr/local/cuda-9.0/lib64/libcudnn.so.7.3.1)
==23733==    by 0xF058255: ??? (in /usr/local/cuda-9.0/lib64/libcudnn.so.7.3.1)
==23733==    by 0xDB90D02: ??? (in /usr/local/cuda-9.0/lib64/libcudnn.so.7.3.1)
==23733==    by 0xFFEFFFBC7: ???
==23733==    by 0x4010689: call_init.part.0 (dl-init.c:58)
==23733==    by 0x40107DA: call_init (dl-init.c:30)
==23733==    by 0x40107DA: _dl_init (dl-init.c:120)
==23733==    by 0x4000C69: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
==23733==    by 0x6: ???
==23733==    by 0xFFEFFFFE6: ???

and at the end (when I stopped it):

==23733== LEAK SUMMARY:
==23733==    definitely lost: 0 bytes in 0 blocks
==23733==    indirectly lost: 0 bytes in 0 blocks
==23733==      possibly lost: 2,120 bytes in 16 blocks
==23733==    still reachable: 2,287,694 bytes in 22,147 blocks
==23733==                       of which reachable via heuristic:
==23733==                         stdstring          : 25,194 bytes in 294 blocks
==23733==         suppressed: 0 bytes in 0 blocks
==23733==
==23733== ERROR SUMMARY: 16 errors from 16 contexts (suppressed: 0 from 0)

AlexeyAB commented 4 years ago

@aimhabo Thanks!

So there is only possibly lost: 2,120 bytes in 6 hours.

And memory usage has increased by 50 MB.

That doesn't mean there is a memory leak. It means that Darknet, the C/C++ standard library, OpenCV, cuDNN, ... don't allocate all required memory at the start and just allocate it later.


==23733== LEAK SUMMARY:
==23733==    definitely lost: 0 bytes in 0 blocks
==23733==    indirectly lost: 0 bytes in 0 blocks
==23733==      possibly lost: 2,120 bytes in 16 blocks
==23733==    still reachable: 2,287,694 bytes in 22,147 blocks

So there is no memory leak. See the Valgrind FAQ on still reachable (http://valgrind.org/docs/manual/faq.html#faq.reports) and definitely lost (http://valgrind.org/docs/manual/faq.html#faq.deflost).
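
To illustrate the difference with a toy example (mine, not from this thread): memory that a live pointer still references at exit is only "still reachable", while memory whose last pointer was overwritten is "definitely lost":

    #include <stdlib.h>

    static char *keep;          /* global pointer survives until exit */

    int main(void)
    {
        keep = malloc(100);     /* reported as "still reachable"          */
        char *p = malloc(50);
        p = malloc(50);         /* first block now has no pointer to it:  */
        free(p);                /* Valgrind reports it "definitely lost"  */
        return 0;
    }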

I will look at the possibly lost records and at the Source and destination overlap in strcpy(0x9d89a790, 0x9d89a790) in find_replace_extension.


It has now run 150 iterations in 6 hours, and memory usage has increased by 50 MB. There are 8 kinds of valgrind messages, shown below.

150 training iterations in 6 hours? How much does memory usage increase after 1, 2, 3, and 6 hours?

In my case, memory usage increases for the first hour and then the growth stops. It never leads to exhaustion of the program's memory or to a crash.
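
A simple way to tell a warm-up plateau from unbounded growth (assuming a standard Linux ps; rss.log is just an example file name) is to sample the resident set size periodically and watch the trend:

    # log a timestamp and darknet's RSS (KiB) every 10 seconds;
    # warm-up flattens out, a real leak grows without bound
    while sleep 10; do
        date +%s
        ps -o rss= -p "$(pidof darknet)"
    done >> rss.log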

aimhabo commented 4 years ago

@AlexeyAB It's amazing, it's fixed! Both on a single GTX 1070 and a single RTX 2080. But memory still increases badly in the case of 2x RTX 2080…