PSAL-POSTECH / ONNXim

ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference
MIT License

memory leakage check at the end of code running #12

Closed: linbaiwpi closed this issue 2 weeks ago

linbaiwpi commented 2 weeks ago

Hi, I built the source from scratch and ran the example command, but got more than 1000 lines of memory-leak output. Do you know what causes this?

What I did:

  1. Built the Docker image using docker build . -t onnxim
  2. Followed section "2 Manual Method"
  3. Ran the example: ./build/bin/Simulator --config ./configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --model ./example/models_list.json

The model actually ran to completion (see the first two lines of the output below), but a memory leak check appears to run at the end.

[2024-09-04 16:50:31.530] [info] Simulation time: 39.460704 seconds
[2024-09-04 16:50:31.530] [info] Total tile: 31, simulated tile per seconds(TPS): 1.152546

=================================================================
==1982==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 9375296 byte(s) in 292978 object(s) allocated from:
    #0 0x7f4fc3d45b57 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x558475cbc3ba in DramRamulator2::push(unsigned int, MemoryAccess*) /workspace/ONNXim/src/Dram.cc:171
    #2 0x558475d3ec8a in Simulator::cycle() /workspace/ONNXim/src/Simulator.cc:172
    #3 0x558475d3cf67 in Simulator::run_simulator() /workspace/ONNXim/src/Simulator.cc:83
    #4 0x558475d9a919 in main /workspace/ONNXim/src/main.cc:122
    #5 0x7f4fc2830082 in __libc_start_main ../csu/libc-start.c:308

Indirect leak of 1851392 byte(s) in 3616 object(s) allocated from:
    #0 0x7f4fc3d45b57 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x5584760e9581 in __gnu_cxx::new_allocator<long>::allocate(unsigned long, void const*) /usr/include/c++/10/ext/new_allocator.h:121
    #2 0x5584760e9581 in std::allocator<long>::allocate(unsigned long) /usr/include/c++/10/bits/allocator.h:181
    #3 0x5584760e9581 in std::allocator_traits<std::allocator<long> >::allocate(std::allocator<long>&, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:460
    #4 0x5584760e9581 in std::_Deque_base<long, std::allocator<long> >::_M_allocate_node() /usr/include/c++/10/bits/stl_deque.h:559
    #5 0x5584760e9581 in std::_Deque_base<long, std::allocator<long> >::_M_create_nodes(long**, long**) /usr/include/c++/10/bits/stl_deque.h:660
    #6 0x5584760e9581 in std::_Deque_base<long, std::allocator<long> >::_M_initialize_map(unsigned long) /usr/include/c++/10/bits/stl_deque.h:634

Indirect leak of 1327104 byte(s) in 2592 object(s) allocated from:
    #0 0x7f4fc3d45b57 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x558475fbeb74 in __gnu_cxx::new_allocator<long>::allocate(unsigned long, void const*) /usr/include/c++/10/ext/new_allocator.h:121
    #2 0x558475fbeb74 in std::allocator<long>::allocate(unsigned long) /usr/include/c++/10/bits/allocator.h:181
    #3 0x558475fbeb74 in std::allocator_traits<std::allocator<long> >::allocate(std::allocator<long>&, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:460
    #4 0x558475fbeb74 in std::_Deque_base<long, std::allocator<long> >::_M_allocate_node() /usr/include/c++/10/bits/stl_deque.h:559
    #5 0x558475fbeb74 in std::deque<long, std::allocator<long> >::_M_new_elements_at_front(unsigned long) /usr/include/c++/10/bits/deque.tcc:891

HamHyungkyu commented 2 weeks ago

Thank you for your comment. I have fixed the direct memory leak of the mem_fetch object in DramRamulator2. When ONNXim is built in Debug mode, it is compiled with AddressSanitizer, which checks for memory errors and reports leaks when the program exits. This slows down the simulation, so if you need faster simulation, please build ONNXim in Release mode.
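The code at Dram.cc:171 is not shown in this thread, so the sketch below is a hypothetical illustration (the names `Request`, `LeakyDram`, and `OwningDram` are invented) of the pattern LeakSanitizer reports as a direct leak: an object allocated with `new` for every DRAM request whose pointer is later dropped without `delete`. It is not ONNXim's actual fix; it shows one common remedy, letting the request queue own each object through `std::unique_ptr` so it is freed when the request completes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>

// Hypothetical stand-in for a per-request DRAM object (not ONNXim's MemoryAccess).
struct Request {
    static std::size_t live;  // number of Request objects currently allocated
    unsigned long long addr;
    explicit Request(unsigned long long a) : addr(a) { ++live; }
    ~Request() {
        assert(live > 0);  // every destruction must match an earlier construction
        --live;
    }
};
std::size_t Request::live = 0;

// Leaky pattern: one raw `new` per request, and the pointer is dropped on
// completion without `delete`, so LeakSanitizer would report a direct leak.
class LeakyDram {
public:
    void push(unsigned long long addr) { pending_.push(new Request(addr)); }
    bool complete_one() {
        if (pending_.empty()) return false;
        pending_.pop();  // pointer discarded, object never freed
        return true;
    }
private:
    std::queue<Request*> pending_;
};

// Fixed pattern: the queue owns each request through std::unique_ptr,
// so popping a completed request (or destroying the queue) frees it.
class OwningDram {
public:
    void push(unsigned long long addr) {
        pending_.push(std::make_unique<Request>(addr));
    }
    bool complete_one() {
        if (pending_.empty()) return false;
        pending_.pop();  // unique_ptr destructor deletes the Request
        return true;
    }
private:
    std::queue<std::unique_ptr<Request>> pending_;
};
```

The `live` counter stands in for LeakSanitizer here: once an `OwningDram` is destroyed the count returns to zero, while every request pushed through `LeakyDram` stays allocated until the process exits.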

linbaiwpi commented 2 weeks ago

Thanks for your quick reply.

1) If I understand correctly, this memory sanitization won't affect the correctness of the ONNXim example run?

2) Also, I tried compiling it in Release mode and, interestingly, got the following new fault:

root@0c4c6e0e9971:/workspace/ONNXim/build# ./bin/Simulator --config /workspace/ONNXim/configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --model /workspace/ONNXim/example/models_list.json
[2024-09-04 20:21:53.259] [info] CPU 0: Partition 0
[2024-09-04 20:21:53.259] [info] CPU 1: Partition 0
[2024-09-04 20:21:53.259] [info] CPU 2: Partition 0
[2024-09-04 20:21:53.259] [info] CPU 3: Partition 0
[2024-09-04 20:21:53.259] [info] Running in default mode
[2024-09-04 20:21:53.260] [info] Simulator Configuration:
[2024-09-04 20:21:53.260] [info] [Core 0] Systolic Array Throughput: 131072 GFLOPS, Spad size: 32768 KB, Accumulator size: 4096 KB
[2024-09-04 20:21:53.260] [info] [Core 1] Systolic Array Throughput: 131072 GFLOPS, Spad size: 32768 KB, Accumulator size: 4096 KB
[2024-09-04 20:21:53.260] [info] [Core 2] Systolic Array Throughput: 131072 GFLOPS, Spad size: 32768 KB, Accumulator size: 4096 KB
[2024-09-04 20:21:53.260] [info] [Core 3] Systolic Array Throughput: 131072 GFLOPS, Spad size: 32768 KB, Accumulator size: 4096 KB
[2024-09-04 20:21:53.260] [info] DRAM Bandwidth 614 GB/s
[2024-09-04 20:21:53.260] [info] Ramulator2 config: /workspace/ONNXim/configs/../configs/ramulator2_configs/HBM2.yaml
[2024-09-04 20:21:53.267] [info] Initialize SimpleInterconnect
[2024-09-04 20:21:53.267] [info] No mapping file path : /workspace/ONNXim/models/resnet18/resnet18.mapping
[2024-09-04 20:21:53.267] [info] Register model: resnet18
[2024-09-04 20:21:53.267] [info] Model Name
[2024-09-04 20:21:53.298] [info] ======Start Simulation=====
Floating point exception (core dumped)

YWHyuk commented 2 weeks ago

Hi @linbaiwpi, thanks for your issue report.

if I understand correctly, this memory sanitization won't affect the correctness of example run of ONNXim?

No, it will not affect the correctness.

also, I tried to compile it in Release mode and interestingly, I got the following new fault

Can you make sure your repository is currently up to date with the latest master version?

If you're still getting the error, I'd appreciate it if you could attach the systolic_ws_128x128_c4_simple_noc_tpuv4.json and models_list.json you're using so I can reproduce it.

linbaiwpi commented 2 weeks ago

I pulled the newest commit [ac10777] and rebuilt using the following commands:

$ mkdir build && cd build
$ conan install ..
$ cmake ..
$ make -j8

Then I ran the Simulator:

$ ./build/bin/Simulator --config ./configs/systolic_ws_128x128_c4_simple_noc_tpuv4.json --model ./example/models_list.json

I still got the same crash:

[2024-09-04 23:29:59.269] [info] CPU 0: Partition 0
[2024-09-04 23:29:59.269] [info] CPU 1: Partition 0
[2024-09-04 23:29:59.269] [info] CPU 2: Partition 0
[2024-09-04 23:29:59.269] [info] CPU 3: Partition 0
[2024-09-04 23:29:59.269] [info] Running in default mode
[2024-09-04 23:29:59.269] [info] Simulator Configuration:
[2024-09-04 23:29:59.269] [info] [Core 0] Systolic Array Throughput: 131072 GFLOPS, Spad size: 32768 KB, Accumulator size: 4096 KB
[2024-09-04 23:29:59.269] [info] [Core 1] Systolic Array Throughput: 131072 GFLOPS, Spad size: 32768 KB, Accumulator size: 4096 KB
[2024-09-04 23:29:59.269] [info] [Core 2] Systolic Array Throughput: 131072 GFLOPS, Spad size: 32768 KB, Accumulator size: 4096 KB
[2024-09-04 23:29:59.269] [info] [Core 3] Systolic Array Throughput: 131072 GFLOPS, Spad size: 32768 KB, Accumulator size: 4096 KB
[2024-09-04 23:29:59.269] [info] DRAM Bandwidth 614 GB/s
[2024-09-04 23:29:59.269] [info] Ramulator2 config: /workspace/ONNXim/configs/../configs/ramulator2_configs/HBM2.yaml
[2024-09-04 23:29:59.274] [info] Initialize SimpleInterconnect
[2024-09-04 23:29:59.274] [info] No mapping file path : /workspace/ONNXim/models/resnet18/resnet18.mapping
[2024-09-04 23:29:59.274] [info] Register model: resnet18
[2024-09-04 23:29:59.274] [info] Model Name
[2024-09-04 23:29:59.295] [info] ======Start Simulation=====
Floating point exception (core dumped)

YWHyuk commented 2 weeks ago

Could you attach your systolic_ws_128x128_c4_simple_noc_tpuv4.json and models_list.json so I can reproduce it?

linbaiwpi commented 2 weeks ago

Actually, both systolic_ws_128x128_c4_simple_noc_tpuv4.json and models_list.json are from the latest commit on the main branch; I haven't changed anything on my side. I've attached them to this post. Please check.

systolic_ws_128x128_c4_simple_noc_tpuv4.json models_list.json

YWHyuk commented 2 weeks ago

I tested the latest version in a Docker environment and could not reproduce your issue.

Can you tell us where in the source code the exception occurs? If a core dump file was generated, you can inspect it.

linbaiwpi commented 2 weeks ago

Hi, here is the backtrace of the floating point exception:


(gdb) backtrace
#0  0x00005555555de1e1 in MappingTable::_calc_conv_mapping(bool, int, int, int, bool, bool, bool, int, int, int, int, int, int, int, int, int) [clone .constprop.1] ()
#1  0x00005555555e392d in MappingTable::calc_conv_mapping(Mapping::LoopCounts&) ()
#2  0x00005555555e45dc in MappingTable::conv_mapping(Mapping::LoopCounts&) ()
#3  0x00005555555e555b in MappingTable::fallback_mapping(Mapping::LoopCounts&) ()
#4  0x000055555565f1e8 in ConvWS::initialize_tiles(MappingTable&) ()
#5  0x00005555555ecd38 in Model::initialize_model(std::vector<std::unique_ptr<Tensor, std::default_delete<Tensor> >, std::allocator<std::unique_ptr<Tensor, std::default_delete<Tensor> > > >&) ()
#6  0x00005555555f4d8e in Simulator::handle_model() ()
#7  0x00005555555f5f97 in Simulator::cycle() ()
#8  0x00005555555b5db3 in main ()

I found that the member variable _dim in class MappingTable is never initialized. What confused me is that this private member _dim is used in MappingTable::_calc_conv_mapping without ever being initialized. Meanwhile, a variable with the same name, _dim, is defined locally in MappingTable::calc_conv_mapping and MappingTable::gemm_mapping. In these two functions, _dim means core_height from the hardware config, i.e., the dimension of the hardware PE array.

Please correct me if my statement is wrong.

Best

YWHyuk commented 2 weeks ago

Thank you for taking the time to help us debug. As you said, _calc_conv_mapping was using an uninitialized member variable. I've pushed a commit that fixes that issue.

I really appreciate you finding the bug and letting me know!
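The commit itself is not shown in this thread. As a hypothetical illustration of the bug pattern identified above (the names `CoreConfig`, `MappingTableSketch`, and `tiles_for` are invented for this sketch), the code below sets the PE-array dimension once in the constructor instead of relying on an unset member that local variables shadow. The backtrace does not say which instruction trapped; dividing by an uninitialized value that happens to be zero is one plausible cause, since on x86 Linux integer division by zero raises SIGFPE, which the shell reports as "Floating point exception".

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the hardware config; only core_height, the field
// named in the thread, is modeled.
struct CoreConfig {
    int core_height;  // dimension of the PE array
};

// Hypothetical sketch, not ONNXim's MappingTable. The bug pattern: a member
// `_dim` that was never assigned but was read by a helper, while other
// functions declared a local `_dim` that shadowed it. One fix is to set the
// member once in the constructor so every helper sees the same value.
class MappingTableSketch {
public:
    explicit MappingTableSketch(const CoreConfig& cfg) : _dim(cfg.core_height) {
        if (_dim <= 0) throw std::invalid_argument("core_height must be positive");
    }

    // Number of PE-array-sized tiles needed to cover `extent` (ceiling division).
    // If _dim were left uninitialized and happened to be 0, this integer division
    // would typically raise SIGFPE on x86 Linux.
    int tiles_for(int extent) const {
        assert(extent >= 0);
        return (extent + _dim - 1) / _dim;
    }

    int dim() const { return _dim; }

private:
    int _dim;  // PE array dimension, always set by the constructor
};
```

Building with `-Wshadow` (GCC and Clang) can help surface the local-shadows-member pattern before it turns into a runtime crash.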

linbaiwpi commented 2 weeks ago

Thank you for your prompt reply; this solved my issue. I will close this issue.