Open maribu opened 5 months ago
It seems that https://github.com/RIOT-OS/RIOT/issues/20067 was only the tip of the iceberg. Even after stdio initialization done, there can still be races in newlib that can corrupt memory and, therefore, cause "interesting surprises" at runtime (e.g. the hardfault I observed in https://github.com/RIOT-OS/RIOT/pull/19926#issuecomment-2102347336).
printf()
and puts()
The most straight forward implementation would be to just implement the locking interface newlib requires for reentrancy. This has some downsides, though:
DEBUG()
, puts()
or printf()
from IRQ context (as locking will not work)stdio
+ Thread-Safe BackendsUART
still is not thread-safe
periph_uart
implementationDEBUG()
/ printf()
/ puts()
etc. we would need non-blocking thread-safe versions
stdio
+ a stdio
Threadstdio
implementation that just writes to a ringbufferDEBUG()
from IRQ would mostly work (when there is space in the ringbuffer without having to block for the even thandler to drain something)hello-world
and a ring bufferno more DEBUG(), puts() or printf() from IRQ context (as locking will not work)
As I understood, that was always the guidance, accompanied by "if you know very precisely what your building blocks are, it may still work", which is a caveat people chose to ignore.
I am aware. I would be in favour to forbid use of stdio from IRQ outright.
Maybe we now have a critical mass of people to push for that again, though.
So far there was always someone who said that "yeah but it works in my cases"; IIRC from the semantics and fixes PR, last time that was @benpicco.
As I understand this issue, here it's not even about RIOT's stdio (stdio_write etc) having weird properties, but on top of that, the C library functions have non-reentrancy that makes things worse even if the individual stdio impl is good. So maybe we should clarify whether we talk about libc provided functions (printf etc) or RIOT provided functions (stdio_write) in this issue here. Then we're in a better situation to decide whether those cases are relevant for this issue.
So far there was always someone who said that "yeah but it works in my cases"; IIRC from the semantics and fixes PR, last time that was @benpicco.
Also @kaspar030 was "over my dead body" regarding no stdio from IRQ context. But I think he may no longer care.
So maybe we should clarify whether we talk about libc provided functions (printf etc) or RIOT provided functions (stdio_write)
We should, but we also need to keep both in mind. E.g. newlib can provide reentrancy (and will work fine with that - at the cost of higher latency and higher memory consumption). But without reentrancy, newlib's stdio is racy and will occasionally crash.
On the other hand stdio_uart
is not thread safe and would need something like newlib's reentrancy feature to prevent concurrent calls to stdio_write()
.
I think it would absolute be possible to write C lib stdio (printf()
, puts()
, ...) that is thread safe but could allow another thread to call puts()
without blocking even if it just preempted a thread in the middle of puts()
. But that would require stdio_write()
to also be thread-safe and non-exlusive to work as we would like (e.g. it would mix the output from the threads, but not loose chars). (I think most non-DMA UART drivers will be almost up for the task anyway and may only need a sync before writing the first byte added in case the hardware is still busy shifting out a byte from the thread that got just preempted.) Then we would have the current behavior, except for the occasional crashes fixed.
I still think we don't need to worry about stdio too much. It's mainly a debug tool - in 99% of all RIOT deployments, nobody is going to see stdio ever again. And for debugging, stdio from ISR can be useful, e.g. just to print that some code path has bean reached.
I agree that stdio is mostly a debug tool. Still, random crashes due to stdio is a pita. (Admittedly, the chance of a crash due to a race is low, but it is still something we need to fix. And there are firmwares that get the timing just right for a reliable crash on startup.)
It may be a debug tool, but it's also one of the earliest things people are introduced to with RIOT; judging from the various telnet-ish and Bluetooth-ish transports that are around, apparently the shell is pretty widely used. (And race conditions are not something we should have in our code base)
It is also not only newlib's stdio that is not thread-safe, but also avrlibc's. E.g.
int
printf_P(const char *fmt, ...)
{
va_list ap;
int i;
va_start(ap, fmt);
stdout->flags |= __SPGM;
i = vfprintf(stdout, fmt, ap);
stdout->flags &= ~__SPGM;
va_end(ap);
return i;
}
So when thread A is doing printf_P()
to print from PROGMEM
and thread B preempts thread A, the address of the format string in a call to printf()
from A will be treated as referring to PROGMEM
rather than as referring to RAM.
I now imagine in thousand of years archeologists debating the reasoning behind the design choice to set a flag in a FILE *
to encode something completely unrelated such as the address space of a format string.
From the matrix: "stdio in RIOT is a hot mess". So let's fix this. This lists the tasks needed for sane stdio
stdio_uart
stdio_semihosting
stdio_rtt
stdio_cdc_acm
stdio_nimble
stdio_udp
stdio_telnet
stdio_usb_serial_jtag
stdio_native
stdio_ethos
slipdev_stdio