Closed: markand closed this issue 2 years ago
Hi @markand,
I unfortunately don't have access to macOS to test this, but it looks like you should be able to reproduce this issue even without Rexo at all. It seems to be caused by the usage of -fsanitize=address and can be alleviated using MallocNanoZone=0, as shown here: https://stackoverflow.com/questions/64126942/malloc-nano-zone-abandoned-due-to-inability-to-preallocate-reserved-vm-space
Does that help?
The warning that you have seen about vm space is completely irrelevant and is always there, even with an empty main program. The global-buffer-overflow happens in rexo code.
I tried to debug the problem, and it seems that this loop reads beyond the data because test_case_count is incremented twice (which is also what the sanitizer reports).
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x000000010000862e a.out`rx_enumerate_test_cases(test_case_count=0x00007ff7bfeff080, test_cases=0x0000000000000000) at rexo.h:3123:44
3120 for (c_it = RXP_TEST_CASE_SECTION_BEGIN;
3121 c_it != RXP_TEST_CASE_SECTION_END;
3122 ++c_it) {
-> 3123 *test_case_count += (rx_size)(*c_it != NULL);
3124 }
Thanks for looking into this @markand!
At first, I tried stepping into your test file with a debugger but couldn't find any issue with my config (Ubuntu + Clang 16): the test_case_count variable is only incremented once, as expected.
I've added the -fsanitize=undefined,address flag to the CMakeLists.txt file and pushed that into a branch, which triggered GitHub Actions and... they're failing on Ubuntu + Clang 11, so that's a start!
I've then managed to reproduce the issue on my machine using Clang 10, with the following repro:
cd /path/to/rexo
mkdir build
cd build
cmake -D CMAKE_C_COMPILER=clang-10 .. && cmake --build . --config Debug --target test-empty && ctest -C Debug --output-on-failure -R empty
However, if I change the compiler to clang-12, clang-14, or clang-16, then the error is gone... could this issue be related to a bug in how -fsanitize is implemented in older versions of Clang?
Which version of Clang are you using? Are you able to reproduce the issue with newer versions?
It's also possible that they've changed how custom data sections are meant to be used, in which case we might need to provide a different implementation for the code below depending on Clang's version.
Update: here's a stripped down repro.
#include <stdio.h>

struct foo
{
    int value;
};

/* Implementation Details */
/* -------------------------------------------------------------------------- */

#if defined(_MSC_VER)
    __pragma(section("bar$a", read))
    __pragma(section("bar$b", read))
    __pragma(section("bar$c", read))

    __declspec(allocate("bar$a"))
    const struct foo * const bar_begin = NULL;

    __declspec(allocate("bar$c"))
    const struct foo * const bar_end = NULL;

    #define DEFINE_SECTION \
        __declspec(allocate("bar$b"))
#elif defined(__APPLE__)
    extern const struct foo * const
    __start_bar __asm("section$start$__DATA$bar");
    extern const struct foo * const
    __stop_bar __asm("section$end$__DATA$bar");

    #define DEFINE_SECTION \
        __attribute__((used,section("__DATA,bar")))

    DEFINE_SECTION
    static const struct foo * const dummy = NULL;
#elif defined(__unix__)
    extern const struct foo * const __start_bar;
    extern const struct foo * const __stop_bar;

    #define DEFINE_SECTION \
        __attribute__((used,section("bar")))

    DEFINE_SECTION
    static const struct foo * const dummy = NULL;
#endif

/* Public API */
/* -------------------------------------------------------------------------- */

#if defined(_MSC_VER)
    #define SECTION_BEGIN \
        (&bar_begin + 1)
    #define SECTION_END \
        (&bar_end)
#elif defined(__unix__) || defined(__APPLE__)
    #define SECTION_BEGIN \
        (&__start_bar)
    #define SECTION_END \
        (&__stop_bar)
#endif

#define REGISTER_FOO(id, value) \
    static const struct foo id = { value }; \
    DEFINE_SECTION \
    const struct foo * const id##_ptr = &id

/* Usage */
/* -------------------------------------------------------------------------- */

REGISTER_FOO(a, 123);

int
main(
    void
)
{
    const struct foo * const *it;

    for (it = SECTION_BEGIN; it < SECTION_END; ++it)
    {
        if (*it != NULL)
        {
            printf("%d\n", (*it)->value);
        }
    }

    return 0;
}
It looks like we're not the first ones to have run into this issue, see for example https://github.com/google/sanitizers/issues/1028.
The workaround is to... disable the address sanitizer for user data sections, which I've just done in the commit https://github.com/christophercrouzet/rexo/commit/31fa8b114c963e311f078e0f663095b4a254d498 :heavy_check_mark:
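For readers who land here with the same problem, here is a minimal sketch of what such a workaround can look like, based on the stripped-down repro above. It is not necessarily the exact change made in the commit. It assumes Clang's no_sanitize("address") attribute, which Clang documents as applicable to global variables, and the NO_ASAN macro name is made up for this example. The non-Apple branch also assumes an ELF target with a GNU-compatible linker that provides the __start_bar/__stop_bar symbols.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    int value;
};

/* NO_ASAN is a hypothetical helper name, not something defined by rexo.
   It expands to Clang's no_sanitize("address") attribute only when the
   translation unit is being compiled with AddressSanitizer enabled. */
#if defined(__clang__) && defined(__has_feature)
    #if __has_feature(address_sanitizer)
        #define NO_ASAN __attribute__((no_sanitize("address")))
    #endif
#endif
#ifndef NO_ASAN
    #define NO_ASAN /* expands to nothing when ASan is off or unsupported */
#endif

#if defined(__APPLE__)
    extern const struct foo * const __start_bar[]
        __asm("section$start$__DATA$bar");
    extern const struct foo * const __stop_bar[]
        __asm("section$end$__DATA$bar");
    #define DEFINE_SECTION \
        NO_ASAN __attribute__((used, section("__DATA,bar")))
#else
    /* Assumption: ELF target whose linker defines these bounds symbols. */
    extern const struct foo * const __start_bar[];
    extern const struct foo * const __stop_bar[];
    #define DEFINE_SECTION \
        NO_ASAN __attribute__((used, section("bar")))
#endif

/* Every pointer placed in the section carries NO_ASAN, so the sanitizer is
   asked not to put redzones between neighbouring entries. */
#define REGISTER_FOO(id, val) \
    static const struct foo id = { val }; \
    DEFINE_SECTION const struct foo * const id##_ptr = &id

REGISTER_FOO(a, 123);
REGISTER_FOO(b, 456);

/* Counts the non-NULL entries stored in the custom section. */
int count_registered(void)
{
    const struct foo * const *it;
    int count = 0;

    assert(__start_bar <= __stop_bar); /* linker-provided bounds are ordered */
    for (it = __start_bar; it < __stop_bar; ++it) {
        count += (*it != NULL);
    }
    return count;
}

/* Sums the values of the registered entries, skipping NULL slots. */
int sum_registered(void)
{
    const struct foo * const *it;
    int sum = 0;

    for (it = __start_bar; it < __stop_bar; ++it) {
        if (*it != NULL) {
            sum += (*it)->value;
        }
    }
    return sum;
}
```

Calling count_registered() and sum_registered() from main walks the section the same way the repro does. The bounds are declared as arrays here rather than as single pointers so that stepping past the first element is not an out-of-bounds access on a scalar object; that is a choice made for this sketch, not something taken from rexo.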
Thanks again for your help, @markand!
Hi, when running without any test content, the library seems to cause a buffer overflow (no code outside of rexo is involved).
The content of the test file is:
Compiled on macOS using -fsanitize=address,undefined.