apache / nuttx

Apache NuttX is a mature, real-time embedded operating system (RTOS)
https://nuttx.apache.org/
Apache License 2.0
2.91k stars 1.18k forks source link

arm_cache.c: up_disable_dcache() has undefined behavior in write-back mode #12740

Open danielappiagyei-bc opened 4 months ago

danielappiagyei-bc commented 4 months ago

In armv7-m/arm_cache.c (potentially other arch's arm_cache.c files as well, haven't looked), up_disable_dcache() reads some loop variables from memory, disables d cache, then cleans and invalidates the cache in a loop. When the cache had been configured to operate in WRITE_BACK mode, I am seeing that these variables:

  ccsidr = getreg32(NVIC_CCSIDR);
  sets   = CCSIDR_SETS(ccsidr);          /* (Number of sets) - 1 */
  sshift = CCSIDR_LSSHIFT(ccsidr) + 4;   /* log2(cache-line-size-in-bytes) */
  ways   = CCSIDR_WAYS(ccsidr);          /* (Number of ways) - 1 */

, which are set only once before the do-while loops, have their values changed mid-loop from ways = 3, sets = 255 on the imxrt-1064, to gigantic large numbers like 1 billion, causing the loop to execute incorrectly and causing a crash. This behavior is consistent and reproducible.

I had to look at the NXP-provided code for disabling dcache to help understand the error:

/**
  \brief   Disable D-Cache
  \details Turns off D-Cache
  */
__STATIC_FORCEINLINE void SCB_DisableDCache(void)
{
#if defined(__DCACHE_PRESENT) && (__DCACHE_PRESENT == 1U)
    register uint32_t ccsidr;
    register uint32_t sets;
    register uint32_t ways;

    SCB->CSSELR = 0U; /* select Level 1 data cache */
    __DSB();

    SCB->CCR &= ~(uint32_t)SCB_CCR_DC_Msk; /* disable D-Cache */
    __DSB();

    ccsidr = SCB->CCSIDR;

    /* clean & invalidate D-Cache */
    sets = (uint32_t)(CCSIDR_SETS(ccsidr));
    do
    {
        ways = (uint32_t)(CCSIDR_WAYS(ccsidr));
        do
        {
            SCB->DCCISW = (((sets << SCB_DCCISW_SET_Pos) & SCB_DCCISW_SET_Msk) |
                           ((ways << SCB_DCCISW_WAY_Pos) & SCB_DCCISW_WAY_Msk));
#if defined(__CC_ARM)
            __schedule_barrier();
#endif
        } while (ways-- != 0U);
    } while (sets-- != 0U);

    __DSB();
    __ISB();
#endif

They use the register keyword when declaring the function's local variables to somehow prevent the invalidation and cleaning of dcache from mangling their values. (My understanding is that the compiler isn't guaranteed to put the value in a register, it's just a hint which modern compilers often ignore since they're better at optimization. The keyword is also deprecated in c++.).

Anyway, I modified nuttx's up_dcache_disable() to declare every local variable as volatile, which has the effects of disabling compiler optimizations such as out-of-order execution for that variable. For what it's worth, I was compiling on gcc with -Os optimization level. From cppreference on volatile:

Every access (both read and write) made through an lvalue expression of volatile-qualified type is considered an observable side effect for the purpose of optimization and is evaluated strictly according to the rules of the abstract machine (that is, all writes are completed at some time before the next sequence point). This means that within a single thread of execution, a volatile access cannot be optimized out or reordered relative to another visible side effect that is separated by a sequence point from the volatile access.

This change fixed the bug and stopped my app from crashing. Maybe someone more familiar with cache and c/cpp language can better explain why register and volatile worked? If it's just about preventing reordering of executions by the compiler then maybe instead of volatile, some ARM_ISB() or ARM_DMB()'s would work as well? Haven't tested but just wanted to let you guys know of this issue and that it may be present in other arm arch files.

acassis commented 4 months ago

Hi @danielappiagyei-bc you said it is easy to reproduce, do you have a simple example? Maybe it could be included in the testing ostest to catch this kind of issue.

Maybe @davids5 or @patacongo can have some ideas why register/volatile prevents the issue from happening.

danielappiagyei-bc commented 4 months ago

Will get back to this in a few weeks, appreciate the patience, am busy with work and personal