yuhiping opened this issue 9 months ago
Hi, I just noticed that there is no "-O3" parameter in CMAKE_ASM_FLAGS_RELEASE in the arm-none-eabi.cmake file (https://github.com/azure-rtos/threadx/blob/master/cmake/arm-none-eabi.cmake). So, is "-O3" forbidden for ASM compilation?
Historically, GCC -O3 optimization did not produce reliable code. I would not recommend using -O3 until/unless the entire ThreadX source is built with -O3 and passes all verification tests. That said, it would be useful to issue a pull request for your proposed change so it can be evaluated further.
This is a very cool bug.
Calling a dummy function from another .c file after TX_RESTORE would probably resolve it, since the compiler then cannot know whether the global has changed.
Another approach would be to define pool_ptr as a pointer to volatile.
What do you think, @yuhiping?
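For illustration, a rough sketch of the dummy-function idea (hypothetical names, not the actual ThreadX code; it assumes LTO is disabled so the call stays opaque). Because the definition lives in a separate translation unit, the compiler cannot prove the call leaves the pool untouched, so values such as tx_byte_pool_fragments have to be reloaded after it:

/* tx_opaque_barrier.c -- separate translation unit, intentionally empty */
void tx_opaque_barrier(void)
{
}

/* in _tx_byte_pool_search(), right after TX_RESTORE (usage sketch): */
extern void tx_opaque_barrier(void);

    TX_RESTORE
    tx_opaque_barrier();   /* opaque call: the compiler must assume globals may have changed */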
This is indeed an interesting issue. I like using volatile more than calling a dummy function, just from a performance perspective. In your fix, did you just add volatile to the pool_ptr API parameter?
I haven't tried the fix, but adding volatile to the parameter is indeed what I had in mind.
@amgross @williamelamie Sorry for the late reply. Yes, as you suggested, adding volatile to the parameter was our solution. Also, is it still necessary to issue a pull request?
Hey @williamelamie, I actually don't think this is the right solution: it exposes the underlying implementation via a public interface and is not very localized. Personally, I think the function should make local copies while appropriately managing the variable properties (e.g. volatile). It also doesn't address potential issues that may arise in SMP; for example, a barrier might be needed in SMP designs, and using volatile in the function prototype won't solve that.
I think a cleaner alternative that keeps the changes localized to this function is to create a local variable that is volatile and keep the function signature the same.
I reproduced this with the same compiler version, compiling for the M33 (GCC 6.2), as well as with GCC 8. The two-line change below fixes the issue for me, and we don't need to change the public API. Note that the bug is also present with -O2. With optimization for size (-Os) or debug (-Og), which are the flags most embedded developers use, the bug does not appear.
This is actually a pretty nasty little problem... Here's my fix:
UCHAR *_tx_byte_pool_search(TX_BYTE_POOL *pool_ptr_in, ULONG memory_size)
{
<snip>
// add a local volatile pointer so pool fields are re-read from memory after TX_RESTORE
TX_BYTE_POOL volatile * pool_ptr = pool_ptr_in;
Original assembly:
<snip>
total_theoretical_available = pool_ptr -> tx_byte_pool_available + ((pool_ptr -> tx_byte_pool_fragments - 2) * ((sizeof(UCHAR *)) + (sizeof(ALIGN_TYPE))));
/**
----> R7 loaded with the tx_byte_pool_fragments count at function entry, and re-used throughout the whole function call
*/
80092f0: 68c7 ldr r7, [r0, #12]
<snip>
TX_DISABLE
/* Determine if anything has changed in terms of pool ownership. */
if (pool_ptr -> tx_byte_pool_owner != thread_ptr)
8009330: 6a04 ldr r4, [r0, #32]
8009332: 42a5 cmp r5, r4
8009334: d03f beq.n 80093b6 <_tx_byte_pool_search+0xd6>
tx_byte_pool_search.c:270
{
/* Pool changed ownership in the brief period interrupts were
enabled. Reset the search. */
current_ptr = pool_ptr -> tx_byte_pool_search;
examine_blocks = pool_ptr -> tx_byte_pool_fragments + ((UINT) 1);
/**
-----> Bug here... R7 is reused from earlier and not reloaded
-----> (in the original bug on the Cortex-R5 it used LR, but the bug is the same).
-----> R8 holds the variable examine_blocks
**/
8009336: f107 0801 add.w r8, r7, #1
<snip>
}
} while(examine_blocks != ((UINT) 0));
800933e: f1b8 0f00 cmp.w r8, #0
And using the local volatile, it's fixed:
<snip>
TX_DISABLE
{
/* Pool changed ownership in the brief period interrupts were
enabled. Reset the search. */
current_ptr = pool_ptr -> tx_byte_pool_search;
8009330: 6942 ldr r2, [r0, #20]
tx_byte_pool_search.c:270
/**
----> Fixed: R3 is now the register holding tx_byte_pool_fragments. This correctly reloads R3 with the fragments count from the pool pointer instead of reusing a stale register.
*/
examine_blocks = pool_ptr -> tx_byte_pool_fragments + ((UINT) 1);
8009332: 68c3 ldr r3, [r0, #12]
tx_byte_pool_search.c:273
/* Setup our ownership again. */
pool_ptr -> tx_byte_pool_owner = thread_ptr;
8009334: 6206 str r6, [r0, #32]
tx_byte_pool_search.c:270
/**
----> add 1 to the fragments count in R3 to get examine_blocks; this is correct now
*/
examine_blocks = pool_ptr -> tx_byte_pool_fragments + ((UINT) 1);
8009336: 3301 adds r3, #1
tx_byte_pool_search.c:275
}
} while(examine_blocks != ((UINT) 0));
8009338: 2b00 cmp r3, #0
800933a: d1eb bne.n 8009314 <_tx_byte_pool_search+0x34>
I agree, Pat. It is nicer to have the use of volatile hidden from the outside.
Another option would be to make the tx_byte_pool_owner structure member in TX_BYTE_POOL volatile, like the following:
struct TX_THREAD_STRUCT volatile    *tx_byte_pool_owner;
This is a slightly smaller change, and something that might also be worth applying to the head of each suspension list, since those are used by the tx_*_prioritize functions in a similar manner to the tx_byte_pool_owner member.
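For the suspension list heads, the analogous change would look roughly like this (a sketch only; I'm assuming the head member in TX_BYTE_POOL is tx_byte_pool_suspension_list as declared in tx_api.h):

struct TX_THREAD_STRUCT volatile    *tx_byte_pool_suspension_list;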
Is it possible that this is an issue with the cpu-specific port?
If we look at the implementation of TX_RESTORE in threadx/ports/cortex_r5/gnu/inc/tx_port.h, it is implemented like this:
#define TX_RESTORE asm volatile (" MSR CPSR_c,%0 "::"r" (interrupt_save) );
Other implementations, like the Cortex-M4, have a memory-clobber in the inline-assembly that restores the interrupt-posture:
__asm__ volatile ("MSR PRIMASK,%0": : "r" (int_posture): "memory");
I can't test it right now, but I think it would be worth checking whether adding a memory clobber in the Cortex-R5 port fixes this. My understanding is that it should prevent the compiler from reusing the cached register value.
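For reference, the Cortex-R5 TX_RESTORE with the clobber added would look roughly like this (untested, just a sketch modeled on the Cortex-M4 variant above):

#define TX_RESTORE    asm volatile (" MSR CPSR_c,%0 " : : "r" (interrupt_save) : "memory");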
Is it possible that this is an issue with the cpu-specific port?
Nice observation. Yes, you found the issue; the clobbers are wrong and the compiler fence is missing. I'm using BASEPRI for Cortex-M33/M4 instead of CPSID, and there's actually a bug in the ThreadX BASEPRI support that triggers the same issue as on the Cortex-R5:
#ifdef TX_PORT_USE_BASEPRI
__attribute__( ( always_inline ) ) static inline void __set_basepri_value(UINT basepri_value)
{
__asm__ volatile ("MSR BASEPRI,%0 ": : "r" (basepri_value));
}
#else
Note the missing clobber, so no compiler fence is inserted.
For the non-BASEPRI approach (CPSID):
__attribute__( ( always_inline ) ) static inline UINT __disable_interrupts(void)
{
UINT int_posture;
int_posture = __get_interrupt_posture();
#ifdef TX_PORT_USE_BASEPRI
__set_basepri_value(TX_PORT_BASEPRI);
#else
__asm__ volatile ("CPSID i" : : : "memory");
#endif
return(int_posture);
}
Here the barrier is present. So the barrier is there for the CPSID approach, but if you use BASEPRI, the memory fence/barrier is missing.
The fix for BASEPRI (verified) is:
#ifdef TX_PORT_USE_BASEPRI
__attribute__( ( always_inline ) ) static inline void __set_basepri_value(UINT basepri_value)
{
__asm__ volatile ("MSR BASEPRI,%0 ": : "r" (basepri_value) : "memory");
}
#else
This fixes the issue for Cortex-M33/M4 when using BASEPRI. For anyone not using BASEPRI, this won't be an issue on Cortex-M33/M4, as the non-BASEPRI path correctly inserts a compiler barrier.
For the Cortex-R5, TX_DISABLE is defined as:
#define TX_DISABLE asm volatile (" MRS %0,CPSR; CPSID if ": "=r" (interrupt_save) );
This is incorrect; the fence is missing. It should be (note the empty input operand list, so that "memory" lands in the clobber list):
#define TX_DISABLE asm volatile (" MRS %0,CPSR; CPSID if " : "=r" (interrupt_save) : : "memory");
Summary: there are 2 bugs in the ThreadX ports:
1) the barrier is missing in the BASEPRI path for Cortex-M
2) the barrier is missing in TX_DISABLE for Cortex-R
With the barriers in place, the problem is gone.
This is actually a big problem; these missing fences can trigger some really hard-to-find issues.
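To see the effect in isolation, here is a minimal standalone sketch (not from the ThreadX sources; names are illustrative). Compiling it for a Cortex-M target with something like arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -O2 -S and comparing the two unlock variants shows the difference: without the "memory" clobber the second read of g_shared may reuse the register loaded before the asm, while with the clobber it must be reloaded from RAM.

/* g_shared stands in for a field like tx_byte_pool_fragments; it is defined
   elsewhere so the compiler cannot constant-fold it away */
extern unsigned int g_shared;

static inline void unlock_no_fence(unsigned int key)
{
    __asm__ volatile ("MSR BASEPRI,%0" : : "r" (key));             /* no compiler fence */
}

static inline void unlock_with_fence(unsigned int key)
{
    __asm__ volatile ("MSR BASEPRI,%0" : : "r" (key) : "memory");  /* compiler fence */
}

unsigned int demo(unsigned int key)
{
    unsigned int before = g_shared + 1u;   /* g_shared loaded into a register */
    unlock_no_fence(key);                  /* swap in unlock_with_fence() to compare */
    unsigned int after  = g_shared + 1u;   /* without the fence this may reuse the cached value */
    return before + after;
}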
What about ARC EM, does it also need to be changed? https://github.com/eclipse-threadx/threadx/blob/485a02faec6edccef14812ddce6844af1d7d2eef/ports/arc_em/metaware/inc/tx_port.h#L306-L307
What about ARC EM, does it also need to be changed?
Looks like it's using a compiler built-in; the compiler built-ins take care of the memory fences for you.
Asm("seti %0" : : "ir" (key) : "memory");
The way we use byte pool
We use the byte pool as a heap in our project. The code was copied from the following link, which implements the CMSIS API on top of ThreadX: https://github.com/STMicroelectronics/stm32_mw_cmsis_rtos_tx/blob/main/cmsis_os2.c
In cmsis_os2.c, the byte pool is created in MemInit(), and after that we can just use MemAlloc() and MemFree() to request and release memory.
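For context, such a wrapper maps onto the ThreadX byte pool API roughly like this (a simplified sketch with hypothetical names and sizes; error handling and the real cmsis_os2.c details are omitted):

#include "tx_api.h"

static TX_BYTE_POOL heap_pool;        /* pool control block (a global)            */
static ULONG        heap_area[4096];  /* 16 KB of ULONG-aligned backing memory    */

UINT MemInit(void)
{
    /* create the byte pool once over the static buffer */
    return tx_byte_pool_create(&heap_pool, "heap", heap_area, sizeof(heap_area));
}

void *MemAlloc(ULONG size)
{
    VOID *ptr = TX_NULL;

    /* tx_byte_allocate() returns TX_NO_MEMORY when the search cannot satisfy the request */
    if (tx_byte_allocate(&heap_pool, &ptr, size, TX_NO_WAIT) != TX_SUCCESS)
    {
        return TX_NULL;
    }
    return ptr;
}

void MemFree(void *ptr)
{
    tx_byte_release(ptr);
}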
What happened?
MemAlloc() and MemFree() worked fine most of the time, but occasionally MemAlloc() failed with TX_NO_MEMORY. When we looked into the issue and did a lot of debugging, we found that there was still enough memory, but the allocation algorithm was not working correctly.
Why did this happen?
After days of debugging, we found that the value of tx_byte_pool_fragments went wrong whenever the issue happened. To prove this, we defined a global variable of the same type and performed the same operation whenever tx_byte_pool_fragments was increased or decreased. This is simple because tx_byte_pool_fragments only changes in _tx_byte_pool_search(), which is defined in threadx/common/src/tx_byte_pool_search.c. It turned out that when MemAlloc() failed with TX_NO_MEMORY, the value of tx_byte_pool_fragments in the TX_BYTE_POOL struct differed from the global variable we defined.
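The instrumentation was along these lines (a sketch with hypothetical names; the shadow updates were placed next to every spot where tx_byte_pool_search.c modifies tx_byte_pool_fragments):

/* shadow copy of tx_byte_pool_fragments, updated in lockstep with the pool */
static volatile UINT g_shadow_fragments;

/* placed next to each  pool_ptr -> tx_byte_pool_fragments++ / --  in tx_byte_pool_search.c */
#define SHADOW_FRAGMENTS_INC()   (g_shadow_fragments++)
#define SHADOW_FRAGMENTS_DEC()   (g_shadow_fragments--)

/* consistency check, e.g. run when MemAlloc() fails with TX_NO_MEMORY */
static UINT shadow_fragments_consistent(TX_BYTE_POOL *pool_ptr)
{
    return (g_shadow_fragments == pool_ptr -> tx_byte_pool_fragments);
}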
The root cause
After a deep dive into the ThreadX source code, we found that _tx_byte_pool_search() can be interrupted after the TX_RESTORE at line 258. This is a good design, since it lets higher-priority tasks allocate from the pool first. But as mentioned above, the control block (the so-called handle) is defined as a global variable, which may be changed by another, higher-priority task.
Further explanation
Further proof
The following is the asm code of _tx_byte_pool_search(). When tx_byte_pool_fragments increases or decreases, it uses the value from register LR instead of reloading the value from RAM, which may already have changed. The asm code may differ with different GCC versions or MCUs; in our case the MCU is Cortex-R5 based and the GCC version is arm-none-eabi-gcc 6.2.1.
Looking forward to your reply!