RIOT-OS / RIOT

RIOT - The friendly OS for IoT
https://riot-os.org
GNU Lesser General Public License v2.1
4.84k stars 1.97k forks source link

[tracking] fixing stdio #20361

Open maribu opened 5 months ago

maribu commented 5 months ago

From the matrix: "stdio in RIOT is a hot mess". So let's fix this. This lists the tasks needed for sane stdio

maribu commented 3 months ago
maribu commented 2 months ago

Context

It seems that https://github.com/RIOT-OS/RIOT/issues/20067 was only the tip of the iceberg. Even after stdio initialization done, there can still be races in newlib that can corrupt memory and, therefore, cause "interesting surprises" at runtime (e.g. the hardfault I observed in https://github.com/RIOT-OS/RIOT/pull/19926#issuecomment-2102347336).

The Issue

Possible Solutions

Implement Reentrancy in Newlib

The most straight forward implementation would be to just implement the locking interface newlib requires for reentrancy. This has some downsides, though:

Use a Non-Blocking and Thread-Safe stdio + Thread-Safe Backends

Use a Non-Blocking and Thread-Safe stdio + a stdio Thread

chrysn commented 2 months ago

no more DEBUG(), puts() or printf() from IRQ context (as locking will not work)

As I understood, that was always the guidance, accompanied by "if you know very precisely what your building blocks are, it may still work", which is a caveat people chose to ignore.

maribu commented 2 months ago

I am aware. I would be in favour to forbid use of stdio from IRQ outright.

Maybe we now have a critical mass of people to push for that again, though.

chrysn commented 2 months ago

So far there was always someone who said that "yeah but it works in my cases"; IIRC from the semantics and fixes PR, last time that was @benpicco.

As I understand this issue, here it's not even about RIOT's stdio (stdio_write etc) having weird properties, but on top of that, the C library functions have non-reentrancy that makes things worse even if the individual stdio impl is good. So maybe we should clarify whether we talk about libc provided functions (printf etc) or RIOT provided functions (stdio_write) in this issue here. Then we're in a better situation to decide whether those cases are relevant for this issue.

maribu commented 2 months ago

So far there was always someone who said that "yeah but it works in my cases"; IIRC from the semantics and fixes PR, last time that was @benpicco.

Also @kaspar030 was "over my dead body" regarding no stdio from IRQ context. But I think he may no longer care.

So maybe we should clarify whether we talk about libc provided functions (printf etc) or RIOT provided functions (stdio_write)

We should, but we also need to keep both in mind. E.g. newlib can provide reentrancy (and will work fine with that - at the cost of higher latency and higher memory consumption). But without reentrancy, newlib's stdio is racy and will occasionally crash.

On the other hand stdio_uart is not thread safe and would need something like newlib's reentrancy feature to prevent concurrent calls to stdio_write().

I think it would absolute be possible to write C lib stdio (printf(), puts(), ...) that is thread safe but could allow another thread to call puts() without blocking even if it just preempted a thread in the middle of puts(). But that would require stdio_write() to also be thread-safe and non-exlusive to work as we would like (e.g. it would mix the output from the threads, but not loose chars). (I think most non-DMA UART drivers will be almost up for the task anyway and may only need a sync before writing the first byte added in case the hardware is still busy shifting out a byte from the thread that got just preempted.) Then we would have the current behavior, except for the occasional crashes fixed.

benpicco commented 2 months ago

I still think we don't need to worry about stdio too much. It's mainly a debug tool - in 99% of all RIOT deployments, nobody is going to see stdio ever again. And for debugging, stdio from ISR can be useful, e.g. just to print that some code path has bean reached.

maribu commented 2 months ago

I agree that stdio is mostly a debug tool. Still, random crashes due to stdio is a pita. (Admittedly, the chance of a crash due to a race is low, but it is still something we need to fix. And there are firmwares that get the timing just right for a reliable crash on startup.)

chrysn commented 2 months ago

It may be a debug tool, but it's also one of the earliest things people are introduced to with RIOT; judging from the various telnet-ish and Bluetooth-ish transports that are around, apparently the shell is pretty widely used. (And race conditions are not something we should have in our code base)

maribu commented 2 months ago

It is also not only newlib's stdio that is not thread-safe, but also avrlibc's. E.g.

int
printf_P(const char *fmt, ...)
{
    va_list ap;
    int i;

    va_start(ap, fmt);
    stdout->flags |= __SPGM;
    i = vfprintf(stdout, fmt, ap);
    stdout->flags &= ~__SPGM;
    va_end(ap);

    return i;
}

So when thread A is doing printf_P() to print from PROGMEM and thread B preempts thread A, the address of the format string in a call to printf() from A will be treated as referring to PROGMEM rather than as referring to RAM.

I now imagine in thousand of years archeologists debating the reasoning behind the design choice to set a flag in a FILE * to encode something completely unrelated such as the address space of a format string.