UoB-HPC / SimEng

The University of Bristol HPC Simulation Engine
https://uob-hpc.github.io/SimEng
Apache License 2.0
93 stars 20 forks source link

Regression Test issue when building SimEng : Unit_test SegFault #405

Open elfmath opened 8 months ago

elfmath commented 8 months ago

Goals

As a end-user of SimEng, I wish to build SimEng and pass regression tests before installing it. So that I can trust my installation

Issue

"unit_tests" have a "SEGMENTATION FAULT" when executed.

From shell when running :

cmake --build $SIMENG_BUILD_DIR --target test 
    Start 1: unit_tests                                                                                                                                                                   │
1/4 Test #1: unit_tests .......................***Exception: SegFault  0.68 sec     

When looking in the LastTest.log we have this :

[ RUN      ] PipelineFetchUnitTests/PipelineFetchUnitTest.invalidMinBytesreadsDontComplete/0

GMOCK WARNING:
Uninteresting mock function call - returning default value.
    Function call: getMaxInstructionSize()
          Returns: '\0'
NOTE: You can safely ignore the above warning unless this call should not happen.  Do not suppress it by blindly adding an EXPECT_CALL() if you don't mean to enforce the call.  See https://github.com/google/googletest/blob/master/docs/gmock_cook_book.md#knowing-when-to-expect for details.

GMOCK WARNING:
Uninteresting mock function call - returning default value.
    Function call: getMaxInstructionSize()
          Returns: '\0'
NOTE: You can safely ignore the above warning unless this call should not happen.  Do not suppress it by blindly adding an EXPECT_CALL() if you don't mean to enforce the call.  See https://github.com/google/googletest/blob/master/docs/gmock_cook_book.md#knowing-when-to-expect for details.

GMOCK WARNING:
Uninteresting mock function call - returning directly.
    Function call: clearCompletedReads()
NOTE: You can safely ignore the above warning unless this call should not happen.  Do not suppress it by blindly adding an EXPECT_CALL() if you don't mean to enforce the call.  See https://github.com/google/googletest/blob/master/docs/gmock_cook_book.md#knowing-when-to-expect for details.
<end of output>
Test time =   0.68 sec

Investigation Leads

Seg fault occurs in the test at "invalidMinBytesreadsDontComplete"

lib/SimEng/test/unit/pipeline/FetchUnitTest.cc:673

No information on the Seg fault in the trace but if we remove that test, all unit-tests pass.

Reproduction

Try to build SimEng with :

elfmath commented 8 months ago

I dived into a little bit and found the cause of the segfault. From backtrace of gdb the segfault is coming from FetchUnit::tick() at line 209.

#0  0x00007ffff6ed9ca0 in __memmove_ssse3_back () from /usr/lib64/libc.so.6
#1  0x00007ffff7b6bc29 in simeng::pipeline::FetchUnit::tick (this=0x917928)
    at $HOME/SimEng/src/lib/pipeline/FetchUnit.cc:209
#2  0x0000000000649af3 in simeng::pipeline::PipelineFetchUnitTest_invalidMinBytesreadsDontComplete_Test::TestBody (this=0x917270)
    at $HOME/SimEng/test/unit/pipeline/FetchUnitTest.cc:738
[...]

In fact at line 209 we execute this instruction : std::memmove(fetchBuffer_, buffer + bufferOffset, bufferedBytes_); So we want to move memory from address pointed by buffer to memory pointed by fetchBuffer_. But what is the address inside of buffer ?

At l59 : const uint8_t* buffer;. So buffer is a pointer that point to a const uint8_t variable. By leaving buffer uninitialized we don't control the address inside.

When we call FetchUnit::tick() at lib/SimEng/test/unit/pipeline/FetchUnitTest.cc::738 (SimEng tag0.9.6), we don't go to instruction that change the address inside buffer before calling std::memmove. Thus resulting to a segfault because buffer was init with an address out of our memory map.

Thas is my understanding of the segfault I have on CentOS 7. When I run this on a ubuntu 22.04 machine all tests pass. My guess is that on ubuntu uninitialized const pointer are initizalized with an address inside the working memory, so we can access it but we shouldn't.

I didn't fully understand what FetchUnit::tick() does, could you check if that is normal that buffer is never initiliazed when all conditions are aligned ? I succeded to pass this test by changing buffer declaration from const uint8_t* buffer; to static const uint8_t* buffer;. With that I put buffer into .bss elf section and I have a memory address that I can access. However I artificially solve the issue because it is writting to an address that we don't use.