RIOT-OS / RIOT

RIOT - The friendly OS for IoT
https://riot-os.org
GNU Lesser General Public License v2.1
4.92k stars 1.99k forks source link

periph_rtt: rtt_set_alarm() blocks IRQ for 80 plus usec on STM32 #19520

Open Enoch247 opened 1 year ago

Enoch247 commented 1 year ago

Description

When the ztimer_msec module is enabled and backed by periph_rtt on a STM32 CPU I noticed jitter in my ISRs of 100-200 usec. Digging into the issue I see the implementation of rtt in all STM32 (STM32F1 being the exception I believe, but it may have similar problems) busy waits inside of a critical section of code where IRQs are disabled. This happens inside the rtt_set_alarm() function. I thought to move the busy wait out of the critical section, but I believe ztimeris calling this function from an ISR anyway.

Steps to reproduce the issue

Here is the simplest program I can think of to demonstrate the issue:

Makefile:

APPLICATION = periph_rtt_bug
BOARD ?= nucleo-f767zi
RIOTBASE ?= ../../lib/RIOT

FEATURES_REQUIRED += periph_rtt
USEMODULE += ztimer_usec
USEMODULE += ztimer_stopwatch

# uncomment this line to demonstrate bug indirectly via ztimer rather than directly via rtt
#USEMODULE += ztimer_msec

include $(RIOTBASE)/Makefile.include

main.c:

#include <periph/rtt.h>
#include <ztimer.h>
#include <ztimer/stopwatch.h>

static void _cb(void* arg)
{
    (void)arg;
}

int main(void)
{
    unsigned max = 0;
    ztimer_stopwatch_t stopwatch;
    ztimer_t timer = { .callback = &_cb };

#ifndef MODULE_ZTIMER_MSEC

    // Normally ZTIMER_MSEC is backed by rtt, but if ZTIMER_MSEC isn't used,
    // we'll need to init rtt ourselves.
    rtt_init();

#endif

    ztimer_stopwatch_init(ZTIMER_USEC, &stopwatch);

    while (1)
    {
        ztimer_stopwatch_start(&stopwatch);

#ifdef MODULE_ZTIMER_MSEC

        // Demonstrate that setting a timer on ZTIMER_MSEC (when backed by rtt)
        // can take more time than we might expect. This is espesially bad when
        // timers are set from an ISR.
        ztimer_set(ZTIMER_MSEC, &timer, 100);

#else

        // silence unsused variable compiler warning
        (void)timer;

        // Demonstate that we spend an unexpectedly long amount of time  in this
        // call (with IRQs disabled).
        rtt_set_alarm(100, &_cb, NULL);

#endif

        ztimer_stopwatch_stop(&stopwatch);
        const unsigned time = ztimer_stopwatch_measure(&stopwatch);

        if (time > max)
        {
            max = time;
            printf("max: %u\n", max);
        }
    }

    return 0;
}

Expected results

Output-ed max values from sample code above should be very small.

Actual results

Output-ed max values from sample code above are 80-90 usec. When the line USEMODULE += ztimer_msec is uncomment in the Makefile, that increases to as much as 183 usec.

Versions

RIOT Version 2023.04

Enoch247 commented 1 year ago

possible related/impacted issues? #18883 and #17060

Enoch247 commented 1 year ago

One thought I had was to simply not have ZTIMER_MSEC be backed by periph_rtt (by default) when using STM32 family of devices. If a user overrides the default by specifying to use the module ztimer_periph_rtt, then we can still print a warning, that can by explicitly silenced by users who really know what they are doing.

kaspar030 commented 1 year ago

... so another RTT peripheral that's difficult to use properly. :( The problem here is that usually only RTTs allow deep-sleep (low-power consumption when idle). I'll put this on the agenda for the next VMA.

maribu commented 1 year ago

I think the issue at hand here can be fixed by just moving the busy wait out of the critical section. However, the current behavior seems to provide limited thread safety.

I think the unwritten rule was that thread safety is the concern of the layers above periph, unless the periph API states otherwise. Maybe I could interest @chrysn in this topic; I think such issues might be one of his pet peeves. IMO RIOT would greatly profit from explicit documentation which API is thread-safe an where the users need to manage synchronization themselves.

Assuming that RTT is indeed not thread-safe and users (such as ztimer) have to synchronize RTT API access, just moving the busy wait below the irq_restore() should be just fine.

kaspar030 commented 1 year ago

Assuming that RTT is indeed not thread-safe and users (such as ztimer) have to synchronize RTT API access, just moving the busy wait below the irq_restore() should be just fine.

yes. but those functions get also called from within (timer-)ISRs.

Enoch247 commented 1 year ago

@kaspar030 is correct. Moving the busy wait out of the critical section doesn't help, when called from an ISR (which ztimer does). I also tried moving the busy wait to before the critical section. The intent was to busy wait only if the previous call hadn't synced up yet. However, as I recall, ztimer often makes two calls very quickly (when extending clocks out to 32 bits when hardware doesn't support it), so it didn't completely fix things.

chrysn commented 1 year ago

Explicitness on thread safety is indeed something I like, but for this issue I'll have to pass.