ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

[Tests]All tests in the same test suite must use the same test fixture class #3202

Closed xinlipn closed 3 months ago

xinlipn commented 3 months ago

miopen_gtest failed on Navi3x, MI200, etc. with the following message:

[ ERROR ] /opt/rocm/cget/build/tmp-c4f1a42bcee536579a1248d25169f834201170c187fc6d80e8c510a7d28f6f48/googletest-1.14.0/googletest/src/gtest.cc:2788:: /home/bharriso/Source/MIOpen_WIP/MIOpen/test/gtest/conv_3d.cpp:117: Attempted redefinition of test suite GPU_conv3d_FP32.
All tests in the same test suite must use the same test fixture class. However, in test suite GPU_conv3d_FP32, you tried to define a test using a fixture class different from the one used earlier. This can happen if the two fixture classes are from different namespaces and have the same name. You should probably rename one of the classes to put the tests into different test suites.
Aborted (core dumped)

Ticket: https://ontrack-internal.amd.com/projects/LWPMIOPEN/issues/LWPMIOPEN-1023

Log with fix: namespacefix.log
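To illustrate the error message above, here is a minimal, self-contained sketch of the rule it describes: a test suite name may be associated with only one fixture class. This is a simplified model written for this thread, not googletest's actual implementation, and the `SuiteRegistry` class, the `conv3d_fwd` / `conv3d_bwd` namespaces, and the `GPU_conv3d_bwd_FP32` name are all hypothetical. Only the suite name `GPU_conv3d_FP32` comes from the log.

```cpp
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// Simplified model (an assumption, NOT googletest's real code):
// each suite name is bound to the first fixture type registered under it.
class SuiteRegistry {
public:
    // Returns true if a test with this fixture may join the suite,
    // false if the suite already exists with a different fixture type.
    template <typename Fixture>
    bool Register(const std::string& suite_name) {
        const std::type_index fixture_id(typeid(Fixture));
        auto result = suites_.emplace(suite_name, fixture_id);
        // result.second is true when the suite name was newly inserted.
        return result.second || result.first->second == fixture_id;
    }

private:
    std::map<std::string, std::type_index> suites_;
};

// Hypothetical fixtures: same class name in two namespaces means
// two distinct types that would share one suite name.
namespace conv3d_fwd { struct GPU_conv3d_FP32 {}; }
namespace conv3d_bwd {
struct GPU_conv3d_FP32 {};      // clashes with conv3d_fwd::GPU_conv3d_FP32
struct GPU_conv3d_bwd_FP32 {};  // renamed fixture, gets its own suite
}
```

In this model, registering `conv3d_fwd::GPU_conv3d_FP32` and then `conv3d_bwd::GPU_conv3d_FP32` under the suite name `"GPU_conv3d_FP32"` is rejected, while a renamed fixture registered under its own suite name is accepted. That mirrors the remedy the message suggests: rename one of the classes so the tests land in different suites.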

BrianHarrisonAMD commented 3 months ago

@xinlipn Was the name conflict the same issue we were seeing on Navi? I'm a bit surprised, since the logs I saw from Navi did not seem to print an error, and I'm curious whether you know why.

Otherwise, looks good to me!

xinlipn commented 3 months ago

@BrianHarrisonAMD, thanks for the comments. The "unsupported ASIC" skip is a different issue:

Running main() from /opt/rocm/cget/build/tmp-c4f1a42bcee536579a1248d25169f834201170c187fc6d80e8c510a7d28f6f48/googletest-1.14.0/googletest/src/gtest_main.cc
/MIOpen/test/gtest/api_convbiasactiv.cpp:189: Skipped
Skipping fusion test on unsupported ASIC

PRNG seed: 12345678
[==========] Running 8939 tests from 394 test suites.
[----------] Global test environment set-up.
Skipping fusion test on unsupported ASIC

[----------] Global test environment tear-down
[==========] 8939 tests from 394 test suites ran. (0 ms total)
[ PASSED ] 8939 tests.

YOU HAVE 133 DISABLED TESTS

BrianHarrisonAMD commented 3 months ago

Are we going to tackle the Navi issues in a different PR? If so, then I think this is good to go!

xinlipn commented 3 months ago

@BrianHarrisonAMD Yes, there will be another PR.

CAHEK7 commented 3 months ago

I would add a simple verification step that runs miopen_gtest --gtest_list_tests and checks that it does not crash and returns a proper exit code.

xinlipn commented 3 months ago

@CAHEK7, here is a log from running miopen_gtest --gtest_list_tests on an MI200 node; miopen_gtest did not crash: gtest_list_tests_mi200.log

CAHEK7 commented 3 months ago

I mean it would be nice to have this check in our CI as a regression test, to avoid this problem in the future (at least until we finish the work needed for regular single-binary runs in our CI).
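As a rough sketch of what such a regression check could look like, the helper below runs a command and reports success only when it exits normally with status 0. The function name `CommandSucceeds` is hypothetical, the code assumes a POSIX system (it relies on `<sys/wait.h>`), and how it would be wired into the MIOpen CI is not specified in this thread.

```cpp
#include <cstdlib>
#include <string>
#include <sys/wait.h>  // WIFEXITED / WEXITSTATUS, POSIX only

// Hypothetical helper: returns true only if `command` ran and exited
// normally with status 0. std::system runs the command through the shell,
// so a crashed child shows up either as a signal termination or as a
// non-zero exit status; both are treated as failure here.
bool CommandSucceeds(const std::string& command) {
    const int status = std::system(command.c_str());
    if (status == -1) {
        return false;  // the shell could not be started
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A CI step could then call something like `CommandSucceeds("./bin/miopen_gtest --gtest_list_tests > /dev/null")` and fail the job when it returns false. The binary path is an assumption; `--gtest_list_tests` is the flag mentioned above.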