`CONFIG_SKIP_BOOT_MSG` makes newlib stdio races more likely

mchesser commented 1 year ago

After spending some time analyzing this issue, I believe that the actual cause is already known -- i.e., #4488 / #8619-comment-569952641 (and is actually documented in the release notes).

Feel free to close as a duplicate, but I thought I would leave the analysis below regarding the impact of CONFIG_SKIP_BOOT_MSG.

Description

The reentrant stdio functions in newlib initialize some shared data (via the function __sinit) the first time an IO function, e.g., puts, is used. Normally, before main, RIOT prints a message using puts (i.e., "This is RIOT" + version), which ends up calling __sinit before user code is executed, avoiding most of the issues with initialization.

However if CONFIG_SKIP_BOOT_MSG is set, the __sinit will not be called until the first print statement. If two threads attempt to print at a roughly similar time, then it is possible¹ for the second thread to execute with a partially initialized reent object which causes various crashes depending on how much of the structure has been initialized.

_{1. Platforms that do not perform any locking as part of _lock_acquire (see: #8619-comment-569952641).}

As an example, if we have code configured like this:

USEMODULE += ztimer_usec
USEMODULE += sched_round_robin
CFLAGS += -DCONFIG_SKIP_BOOT_MSG
BOARD=nucleo-f446re

That spawns multiple threads that print -- e.g.:

main.c

```cpp #include #include #include "thread.h" #include "ztimer.h" #include "timex.h" uint32_t WORKER1_SPIN_TIME_US = (10 * US_PER_MS) - 30; // Approximate RR scheduling time - 30 usecs. volatile uint32_t WORKER1_LOOP_SPIN = 5; // Loop iterations for sub-microsecond adjustment. void* worker1(void* arg) { (void)arg; ztimer_spin(ZTIMER_USEC, WORKER1_SPIN_TIME_US); while (WORKER1_LOOP_SPIN != 0) { WORKER1_LOOP_SPIN -= 1; } puts("WORKER 1\n"); while(1) {}; } void* worker2(void* arg) { (void)arg; puts("WORKER 2\n"); while(1) {}; } #define WORKER_STACKSIZE (THREAD_STACKSIZE_TINY+THREAD_EXTRA_STACKSIZE_PRINTF) int main(void) { static char w1_stack[WORKER_STACKSIZE]; thread_create(w1_stack, sizeof(w1_stack), 7, THREAD_CREATE_STACKTEST, worker1, NULL, "T1"); static char w2_stack[WORKER_STACKSIZE]; thread_create(w2_stack, sizeof(w2_stack), 7, THREAD_CREATE_STACKTEST, worker2, NULL, "T2"); return 0; } ```

Then if worker1 is preempted in the middle of stdio initialization, e.g. at:

worker1 at main.c:27
  _puts_r
    __sinit
      __sfp
        __sfmoreglue
          ...
    <TIMER 1 IRQ>
    <PREEMPTION>

Then worker2 may crash when puts is called:

worker2 at main.c:39
  _puts_r
    __swsetup_r+52

maribu commented 1 year ago

Thanks for reporting this. In https://github.com/RIOT-OS/RIOT/pull/8619 it is mentioned that the nano variant of newlib behaves differently, would you mind to check which version is used? Can you reproduce the issue in both cases? (BUILD_IN_DOCKER=1 will use a toolchain that has both nano and non-nano variant of the newlib.)

(Note that RIOT will by default use the nano version, but only if that version is available.)

In a different case were racy code did not result in garbled stdio (there seems to be a consensus that this is acceptable in most use cases), but in crashes, we opted to make this thread-safe no matter what. (This was for memory allocation.) So the consistent thing here would be to make sure that __sinit is called prior before calling main().

Also, it would be nice to also provide at least the option to have newlib thread-safe even for functions that do not crash when racing. Some people do like e.g. clean stdio output enough to spend resources on that :)

mchesser commented 1 year ago

Took me a while to get back to this, but I've managed to reproduce with a docker based build of the target. Based on the output of test/sys/libc_newlib and the build script output (see below), this build appears to be using newlib-nano.

Compile command

``` arm-none-eabi-gcc \ -DRIOT_FILE_RELATIVE=\"examples/print_race/main.c\" \ -DRIOT_FILE_NOPATH=\"main.c\" \ -DCONFIG_SKIP_BOOT_MSG -Werror -DCPU_FAM_STM32F4 -DSTM32F446xx -DCPU_LINE_STM32F446xx -DSTM32_FLASHSIZE=524288U -D__SYSTEM_STM32F4XX_H -DSYSTEM_STM32F4XX_H -mno-thumb-interwork -mcpu=cortex-m4 -mlittle-endian -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -ffunction-sections -fdata-sections -fshort-enums -ggdb -g3 -Os -DCPU_MODEL_STM32F446RE -DCPU_CORE_CORTEX_M4F -DCPU_HAS_BACKUP_RAM=1 -DRIOT_APPLICATION=\"print_race\" -DBOARD_NUCLEO_F446RE=\"nucleo-f446re\" -DRIOT_BOARD=BOARD_NUCLEO_F446RE -DCPU_STM32=\"stm32\" -DRIOT_CPU=CPU_STM32 -DMCU_STM32=\"stm32\" -DRIOT_MCU=MCU_STM32 -std=c11 -fwrapv -Wstrict-overflow -fno-common -ffunction-sections -fdata-sections -Wall -Wextra -Wmissing-include-dirs -DNDEBUG -fno-delete-null-pointer-checks -fdiagnostics-color -Wstrict-prototypes -Wold-style-definition -gz -Wformat=2 -Wformat-overflow -Wformat-truncation -fmacro-prefix-map=/RIOT/= -Wcast-align -DCPU_RAM_BASE=0x20000000 -DCPU_RAM_SIZE=0x20000 -include '/RIOT/examples/print_race/bin/nucleo-f446re/riotbuild/riotbuild.h' -isystem /opt/gcc-arm-none-eabi-10.3-2021.10/arm-none-eabi/include/newlib-nano -I/RIOT/core/lib/include -I/RIOT/core/include -I/RIOT/drivers/include -I/RIOT/sys/include -I/RIOT/boards/nucleo-f446re/include -I/RIOT/boards/common/nucleo/include -I/RIOT/boards/common/stm32/include -I/RIOT/boards/common/nucleo64/include -I/RIOT/cpu/stm32/include -I/RIOT/build/stm32/cmsis/f4/Include -I/RIOT/cpu/stm32/include/clk -I/RIOT/cpu/cortexm_common/include -I/RIOT/sys/libc/include -I/RIOT/build/pkg/cmsis/CMSIS/Core/Include -I/RIOT/sys/auto_init/include -I/RIOT/examples/print_race/bin/nucleo-f446re/preprocessor -MQ '/RIOT/examples/print_race/bin/nucleo-f446re/application_print_race/main.o' -MD -MP -c -o /RIOT/examples/print_race/bin/nucleo-f446re/application_print_race/main.o /RIOT/examples/print_race/main.c ```

It ended up being quite difficult to get a consistent crash when attached to a debugger. I think this is partly because some of the data allocated as part of stdio initialization is within 'uninitialized' RAM, which happens to be initialized with data from any previous execution.

The method I ended with, was to try and force the firmware to call a null pointer in worker2 by carefully adjusting the delay to ensure that preemption occurs after zeroing the stdout structure and some initial setup, but before the function pointer for write is initialized. Checks on the stdout FILE structure that occur as part of puts (e.g., SWR must be set in flags), mean that preemption needs to within this region of code: findfp.c:60-67.

The null call then occurs as part of worker2->puts->puts_r->__swbuf_r->_fflush_r->__sflush_r at flush.c:195

Getting this exact crash, required sub-microsecond adjustment which I did via a busy loop, however I think other uses of stdio may result in a larger window for the crash.

Although sinit has slightly changed in newer versions newlibc, because of some fairly recent patches (staring from: https://sourceware.org/pipermail/newlib/2022/019283.html). I'm fairly sure the same problem still occurs (e.g., by causing the switch in a similar location)

RIOT-OS / RIOT

`CONFIG_SKIP_BOOT_MSG` makes newlib stdio races more likely #20067

Description