Samsung / ONE

On-device Neural Engine
Other
427 stars 151 forks source link

[circle-quantizer] Replace `TestIOGraph.h` with the `testhelper TestIOGraph.h` #6455

Closed dayo09 closed 3 years ago

dayo09 commented 3 years ago

Let's use compiler/luci/testhelper/include/luci/test/TestIOGraph.h instead of compiler/luci/pass/src/test/TestIOGraph.h.

As the module is moved and maintained as a testhelper, it's better to remove the staled one.

For #6367 Related to #6335

dayo09 commented 3 years ago

Segmentation Fault

For now, mysterious segmentation has occurred.

https://github.com/Samsung/ONE/pull/6335#issuecomment-814003591

@seanshpark Is this available? I have tried to use this instead of compiler/luci/pass/src/test/TestIOGraph.h, segfault occurred 😢

luci::test::TestIOGraph::output()->from(node) has caused the segfault.

(ADDED) I see that compiler/luci/partition/src/PartitionPModules.test.cpp and other tests run well... I will examine more. 🌵

``` /nfs/git/ONE  ➦ ee117711 ±  coredumpctl dump PID: 442 (luci_pass_test) UID: 1000 (dayo) GID: 1000 (dayo) Signal: 11 (SEGV) Timestamp: Tue 2021-04-06 19:07:48 KST (3min 15s ago) Command Line: ./build/compiler/luci/pass/luci_pass_test Executable: /nfs/git/ONE/build/compiler/luci/pass/luci_pass_test Control Group: /user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service Unit: user@1000.service User Unit: gnome-terminal-server.service Slice: user-1000.slice Owner UID: 1000 (dayo) Boot ID: 9e9bb3d3f5ac4dc693f383a9df9f97d8 Machine ID: af68a111f0924ae79fe2f70cfad85e43 Hostname: dayo-400T8A-400S8A Storage: /var/lib/systemd/coredump/core.luci_pass_test.1000.9e9bb3d3f5ac4dc693f383a9df9f97d8.442.1617703668000000.lz4 Message: Process 442 (luci_pass_test) of user 1000 dumped core. Stack trace of thread 442: #0 0x00007f97795b2f42 _ZNKSt4lessIPN4loco3UseEEclERKS2_S5_ (libloco.so) #1 0x00007f97795b2e0a _ZNSt8_Rb_treeIPN4loco3UseES2_St9_IdentityIS2_ESt4lessIS2_ESaIS2_EE14_M_lower_boundEPSt13_Rb_tree_nodeIS2_EPSt18_Rb_tree_node_baseRKS2_ (libloco.so) #2 0x00007f97795b2adc _ZNSt8_Rb_treeIPN4loco3UseES2_St9_IdentityIS2_ESt4lessIS2_ESaIS2_EE4findERKS2_ (libloco.so) #3 0x00007f97795b29a0 _ZNSt3setIPN4loco3UseESt4lessIS2_ESaIS2_EE4findERKS2_ (libloco.so) #4 0x00007f97795b2864 _ZN4loco3Use4nodeEPNS_4NodeE (libloco.so) #5 0x000055abba6d79a9 _ZN4luci12CircleOutput4fromEPN4loco4NodeE (luci_pass_test) #6 0x000055abba73aa9a init (luci_pass_test) #7 0x000055abba73b0ae _ZN40QuantizedModelVerifierTest_Logistic_Test8TestBodyEv (luci_pass_test) #8 0x000055abbb0c9b08 n/a (n/a) Refusing to dump core to tty (use shell redirection or specify --output). (py369) ✘ dayo  /nfs/git/ONE  ➦ ee117711 ±  ```

https://github.com/Samsung/ONE/pull/6335#issuecomment-814463920

I have tried to use this instead of compiler/luci/pass/src/test/TestIOGraph.h, segfault occurred

Old one may caused some confusion. I'll work on removing the old one :)

dayo09 commented 3 years ago

Unintended Linking

I have tried to link the new header by,

(1) compiler/luci/pass/CMakeLists.txt

+ target_link_libraries(luci_pass_test luci_testhelper)

(2) compiler/luci/pass/src/QuantizedModelVerifier.test.cpp

- #include "test/TestIOGraph.h"
+ #include <luci/test/TestIOGraph.h>

However it didn't work and constantly linker finds the previous one.

(gdb) break luci::test::TestIOGraph::init
Breakpoint 4 at 0x555555687ec7: file /nfs/git/ONE/compiler/luci/pass/src/test/TestIOGraph.h, line 154.
dayo09 commented 3 years ago

Specifying relative path to the header

compiler/luci/pass/src/QuantizedModelVerifier.test.cpp

+#include "../../testhelper/include/luci/test/TestIOGraph.h"

However it's still using the old header 😭

[ RUN      ] QuantizedModelVerifierTest.Logistic
debug 1
src/test/TestIOGraph.h ////// I put the file name printing for debug
debug 2
[1]    29619 segmentation fault (core dumped)  ./build/compiler/luci/pass/luci_pass_test 

Removing the old header

I have removed "test/TestIOGraph.h" and the test files using it, and the linking and the test went successfully.

@seanshpark May I quickly replace the header with the new one and delete the file?

mhs4670go commented 3 years ago

@dayo09 Could you try it one more time after removing build/CMakeCache.txt?

mhs4670go commented 3 years ago

BTW, when I apply above patch, the test failed.

diff --git a/compiler/luci/pass/CMakeLists.txt b/compiler/luci/pass/CMakeLists.txt
index 42ba814f..2977fbed 100644
--- a/compiler/luci/pass/CMakeLists.txt
+++ b/compiler/luci/pass/CMakeLists.txt
@@ -27,4 +27,5 @@ GTest_AddTest(luci_pass_test ${TESTS})
 target_include_directories(luci_pass_test PRIVATE src)
 target_link_libraries(luci_pass_test luci_pass)
 target_link_libraries(luci_pass_test luci_lang)
+target_link_libraries(luci_pass_test luci_testhelper)
 #target_link_libraries(luci_pass_test oops)
diff --git a/compiler/luci/pass/src/QuantizedModelVerifier.test.cpp b/compiler/luci/pass/src/QuantizedModelVerifier.test.cpp
index 1c854a70..64392d31 100644
--- a/compiler/luci/pass/src/QuantizedModelVerifier.test.cpp
+++ b/compiler/luci/pass/src/QuantizedModelVerifier.test.cpp
@@ -18,7 +18,7 @@

 #include "luci/Pass/QuantizeWithMinMaxPass.h"

-#include "test/TestIOGraph.h"
+#include <luci/test/TestIOGraph.h>
[----------] 1 test from QuantizeWithMinMaxPassTest (0 ms total)

[----------] 24 tests from QuantizedModelVerifierTest
[ RUN      ] QuantizedModelVerifierTest.Logistic
luci_pass_test: /home/seongwoo/ONE/compiler/loco/src/IR/Use.cpp:29: void loco::Use::node(loco::Node*): Assertion `_node->_uses.find(this) != _node->_uses.end()' failed.

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.11 sec

The following tests FAILED:
     23 - luci_pass_test (Child aborted)
Errors while running CTest
dayo09 commented 3 years ago

@mhs4670go That is the first issue I encountered. I just guess that some of their symbols are mixed. If one header is linked without the other one, the problem disappears.

(ADDED) So to explain, it seems that your bracket include didn't work and again, linked to the old header. If you do with relative path (#include "../../testhelper/include/luci/test/TestIOGraph.h") the test passes.

I still don't have clue about why, but it seems that the bracket is doing wrong with something(maybe mixing up symbols of those two headers), resulting in segfault...

dayo09 commented 3 years ago

@mhs4670go Your comment worked :-D

@dayo09 Could you try it one more time after removing build/CMakeCache.txt?

#include "../../testhelper/include/luci/test/TestIOGraph.h"
testhelper/include/luci/test/TestIOGraph.h
DBG 11
DBG 12
DBG 13 / /Some debug printing of the header `testhelper/include/luci/test/TestIOGraph.h`

[----------] Global test environment tear-down
[==========] 134 tests from 50 test cases ran. (21 ms total)
[  PASSED  ] 134 tests.
mhs4670go commented 3 years ago

@dayo09 Ummm. That sounds good but I still have same problem. I think this is because two header(old one and new one in testhelper) have same namespace. When I change the namespace of old one to sth different, this error has gone. Maybe you can solve this issue and gradually remove old one from the repo with changing the name of namespace.

dayo09 commented 3 years ago

@mhs4670go That makes sense :-D

@seanshpark It has created the segfault too. It seems that cmake's symbol linking depends on the previous build so it succeeded when we tried together :-D

[----------] 21 tests from QuantizedModelVerifierTest
[ RUN      ] QuantizedModelVerifierTest.Logistic
[1]    16218 segmentation fault (core dumped)  ./build/compiler/luci/pass/luci_pass_test
seanshpark commented 3 years ago

Maybe you can solve this issue and gradually remove old one from the repo with changing the name of namespace.

This is what I thought to advise you when you get stuck...