LdDl / go-darknet

Go bindings for Darknet (YOLO v4 / v7-tiny / v3)
Apache License 2.0
82 stars 19 forks source link

[BUG] memory leak on perform_network_detect C function #15

Closed arthurhenrique closed 3 years ago

arthurhenrique commented 3 years ago

Describe the bug Memory doesn't free at the perform_network_detect function. Precisely at the network_predict and get_network_boxes calls.

To Reproduce

Software Versions:

go version 1.15.6
go-darknet v1.3.0

Model:

yolov4-tiny.cfg
yolov4_tiny.weights
https://github.com/AlexeyAB/darknet/archive/darknet_yolo_v4_pre.zip 

Snnipet:

imgDarknet, err := darknet.Image2Float32(src)
if err != nil {
    return nil, err
}

dr, err := Network.Detect(imgDarknet)
if err != nil {
     return nil, err
}
LdDl commented 3 years ago

We were used to call C network_predict_ptr(): https://github.com/LdDl/go-darknet/commit/420a74769d1d71b70eeda5cdedb3799e2d1825c7 After some updates by AlexeyAB we moved to network_predict(): https://github.com/LdDl/go-darknet/commit/68ad5efb358693f0e2928856d45972f921bf335b#diff-175ed579a72929bebd6fa59d958bab1e3b59994f9cd7a9f2cd99d3d41ca1c7e3L17

I'll take a look at this with valgrind and pprof

LdDl commented 3 years ago

running valgrind:

valgrind --log-file=val.log --track-origins=yes --leak-check=full ./main --configFile=../yolov4.cfg --weightsFile=../yolov4.weights --imageFile=../sample.jpg

output log file: val.log

==44520== LEAK SUMMARY:
==44520==    definitely lost: 23,608 bytes in 958 blocks
==44520==    indirectly lost: 530,944 bytes in 1,037 blocks
==44520==      possibly lost: 1,593,644 bytes in 12,199 blocks
==44520==    still reachable: 460,732,964 bytes in 467,721 blocks

Finding "definitely lost" in file gives me:

==44520== 41,600 (640 direct, 40,960 indirect) bytes in 1 blocks are definitely lost in loss record 28,132 of 28,237
==44520==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==44520==    by 0x4E72285: xcalloc (in /usr/lib/libdarknet.so)
==44520==    by 0x4E7C980: list_to_array (in /usr/lib/libdarknet.so)
==44520==    by 0x4EC0572: get_labels_custom (in /usr/lib/libdarknet.so)
==44520==    by 0x4EDD702: get_metadata (in /usr/lib/libdarknet.so)
==44520==    by 0x4ECCBC: _cgo_2c12f21c68d2_Cfunc_get_metadata (cgo-gcc-prolog:103)
==44520==    by 0x45E54F: runtime.asmcgocall (/usr/local/go/src/runtime/asm_amd64.s:655)
==44520==    by 0x1FFEFFFBCF: ???
==44520==    by 0x45C0D0: runtime.exitsyscallfast.func1 (/usr/local/go/src/runtime/proc.go:3175)
==44520==    by 0x45CD75: runtime.systemstack (/usr/local/go/src/runtime/asm_amd64.s:370)
==44520==    by 0x436F7F: ??? (<autogenerated>:1)
==44520==    by 0x45CC03: runtime.rt0_go (/usr/local/go/src/runtime/asm_amd64.s:220)

....

==44520== 512,952 (22,968 direct, 489,984 indirect) bytes in 957 blocks are definitely lost in loss record 28,214 of 28,237
==44520==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==44520==    by 0x4E72225: xmalloc (in /usr/lib/libdarknet.so)
==44520==    by 0x4EDD3A6: option_insert (in /usr/lib/libdarknet.so)
==44520==    by 0x4EDD428: read_option (in /usr/lib/libdarknet.so)
==44520==    by 0x4EDD4CB: read_data_cfg (in /usr/lib/libdarknet.so)
==44520==    by 0x4EDD6DE: get_metadata (in /usr/lib/libdarknet.so)
==44520==    by 0x4ECCBC: _cgo_2c12f21c68d2_Cfunc_get_metadata (cgo-gcc-prolog:103)
==44520==    by 0x45E54F: runtime.asmcgocall (/usr/local/go/src/runtime/asm_amd64.s:655)
==44520==    by 0x1FFEFFFBCF: ???
==44520==    by 0x45C0D0: runtime.exitsyscallfast.func1 (/usr/local/go/src/runtime/proc.go:3175)
==44520==    by 0x45CD75: runtime.systemstack (/usr/local/go/src/runtime/asm_amd64.s:370)
==44520==    by 0x436F7F: ??? (<autogenerated>:1)

https://github.com/LdDl/go-darknet/tree/issue-15 - branch for this issue. Currently I've updated AlexeyAB's Darknet version + switched network_predict() to network_predict_ptr()

Is this really bindings mistake? What do you think? @arthurhenrique

LdDl commented 3 years ago

https://github.com/LdDl/go-darknet/commit/165e59aaf39adb42a12b19bb2b7612224856bd6c#diff-6f0cc157b676e67fdf8cc73ae7890fc04361c23e9552d345f6df5b8c23e496f4L64

btw, calling "defer" was not a great idea in base example

arthurhenrique commented 3 years ago

Is this really bindings mistake? What do you think? @arthurhenrique

I think it is an internal problem of the darknet library. I made a binding to free_network_ptr, but the memory problem still remains...

arthurhenrique commented 3 years ago

btw, calling "defer" was not a gread idea in base example

I remove the defer too, thank you! Just for curiosity. Why is this a not great idea?

LdDl commented 3 years ago

btw, calling "defer" was not a gread idea in base example

I remove the defer too, thank you! Just for curiosity. Why is this a not great idea?

@arthurhenrique Because you may want to write something like this:

for t := 0; t < 1000; t++{
    imgDarknet, err := darknet.Image2Float32(src)
    if err != nil {
        panic(err.Error())
    }
    defer imgDarknet.Close() // <---- here is a bad code
    //
    // Some code
    //
    time.Sleep(100*time.Millisecond)
}

... and this will eat yours RAM pretty fast.

"defer" calls not working in loops. If you want to user "defer", you should wrap it with some function. Here is example where we used defer call and it's correct since it wrapped with func(w http.ResponseWriter, req *http.Request). The "defer" calls lives on stack and then executed in a oposite order (reverse) at the end of wrapping function.

So right way will be:

for t := 0; t < 1000; t++{
    imgDarknet, err := darknet.Image2Float32(src)
    if err != nil {
        panic(err.Error())
    }
    //
    // Some code
    //
        imgDarknet.Close() // <---- Free image when you don't need it anymore
         //
    // Some another code
    //
    time.Sleep(100*time.Millisecond)
}
arthurhenrique commented 3 years ago

Oh, sorry! imgDarknet.Close() It's already wrapped. 👍