eclipse-threadx / threadx

Eclipse ThreadX is an advanced real-time operating system (RTOS) designed specifically for deeply embedded applications.
https://github.com/eclipse-threadx/rtos-docs/blob/main/rtos-docs/threadx/index.md
MIT License
2.96k stars 801 forks source link

Seams that task sheduler handles timeouts incorrectly. #411

Open asamuta opened 1 month ago

asamuta commented 1 month ago

System components timeout issue Call to system functions like tx_event_flags_get, tx_semaphore_get etc. with _timeout != TX_WAITFOREVER results in thread wake before timeout in some conditions.

Description In my program I am using threadx with the netxduo on with STM32H723 processor. From the high level the program creates server socket and receives commands from the network. There is a command to make a USART transaction and return the response back via the socket. The transaction is just USART send/receive with timeout. There is "DeviceMaster" thread, it makes USART transactions. USART is configured to transmit/receive data using DMA and USART idle interrupt.

DMA TX ISR handle:

...
tx_event_flags_set(&deviceMaster.eventFlags, DM_WRITE_FLAG, TX_OR);
...

USART ISR:

    /* Check for IDLE line interrupt */
    if (LL_USART_IsEnabledIT_IDLE(UART_INSTANCE) && LL_USART_IsActiveFlag_IDLE(UART_INSTANCE)) {
        LL_USART_ClearFlag_IDLE(UART_INSTANCE);                     /* Clear IDLE line flag */
        ...
        if (tx_event_flags_set(&deviceMaster.eventFlags, DM_READ_FLAG, TX_OR) != TX_SUCCESS) {
            Error_Handler();
        }
    }

Transaction processing in the thread pseudo code, simplified:

// Checks if previous send completed
tx_event_flags_get(&deviceMaster.eventFlags, DM_WRITE_FLAG, TX_OR_CLEAR, &flags, 10);
sendDataToUsart();
// wait for remote device response with timeout=10
tx_event_flags_get(&deviceMaster.eventFlags, DM_READ_FLAG, TX_OR_CLEAR, &flags, 10);
getDataFromUsart()

Problem Sometimes call tx_event_flags_get(&deviceMaster.eventFlags, DM_READ_FLAG, TX_OR_CLEAR, &flags, 10); takes 1-2 ticks instead of 10 (ends before timeout). I have investigated the behavior a little bit using debug out and TraceX. What I have fonud is that ethernet ISR happened when the "DeviceMaster" thread suspended wakes not only ethernet-related thread but the "DeviceMaster" as well. Attaching TraceX file.

TraceX good/expected flow Event/Description

1017 - Transaction start marker; start DMA transfer
1018 - check for DM_WRITE_FLAG
1019 - DM_WRITE_FLAG set in USART TX DMA
1020 - Get time to check later (tx_get_time)
1021 - Wait for DM_READ_FLAG flag
1022 -  "DeviceMaster" suspended
...
1024 - DM_READ_FLAG set in the USART RX ISR
1025 - system resumes "DeviceMaster"
1026 - Transaction end marker

TraceX problem flow

1036 - Transaction start marker; start DMA transfer
1037 - check for DM_WRITE_FLAG
1038 - DM_WRITE_FLAG set in USART TX DMA
1039 - Get time to check later (tx_get_time)
1040 - Wait for DM_READ_FLAG flag
1041 - "DeviceMaster" suspended
1042 - "NetX ip instance" has started
...
1047 - set event
1048 - scheduler resumes "NetX ip instance"
1049 - scheduler resumes "DeviceMaster", but it still should wait for the timeout.
1050 - "DeviceMaster" gets console mutex to print timeout error
1051 - "DeviceMaster" puts console mutex back
1052 - "DeviceMaster" gets time to check how many ticks the thread was suspended.
1053 - Custom TraceX event that indicates the error and the number of ticks the thread was suspended = 2 ticks, not 10

Assumptions about root cause I have taken a quick look at the threadx sources and assume that the following logic in tx_timer_expiration_process.c file probably is not correct. this and this

if (current_timer -> tx_timer_internal_remaining_ticks > TX_TIMER_ENTRIES) 
{
...
} else {
 /* Timer did expire.  */
}                    

Not sure, but looks like in case if tx_timer_internal_remaining_ticks < 32 the system considers the timer is expired. But in my case the timer value is just 10...

Expected behavior The system scheduler wakes threads when the timeout happened not before.

Impact Timeouts work incorrectly across the system

TraceX files CustomEvents.zip Ticks to microsecond is 544 in this case.

amgross commented 1 month ago

Hi @asamuta , The code you showed in threadx timer looks OK, current_timer is timer that expired: https://github.com/eclipse-threadx/threadx/blob/485a02faec6edccef14812ddce6844af1d7d2eef/common/src/tx_timer_expiration_process.c#L203 And the check you are showing is to check if it expired due to real timeout or due to 32 ticks finished (in threaedx architecture, if some timer defined to more than TX_TIMER_ENTRIES, it will be in expired state each TX_TIMER_ENTRIES ticks, and then re-trigger till getting to 'real' expiration).

For me (as none threadx expert) it is hard to understand your problematic flow. If you can create minimal reproducible example it may be easier for you and for other users to debug it.

asamuta commented 1 month ago

Hi @amgross, The high level description of the problem:

result = tx_event_flags_get(&deviceMaster.eventFlags, DM_READ_FLAG, TX_OR_CLEAR, &flags, 10);

doesn't really wait 10 tick and wakes the thread after 2 ticks with result == 0x7 witch is

#define TX_NO_EVENTS                    ((UINT) 0x07)

And it hapens when another flag is set while my thread is waiting for its flag. (Ethernet ISR set a flag that is used by netxduo in my case).

amgross commented 1 month ago

Sorry, without full code with minimal reproducible example (probably main which create one thread that one catch the event flag and the other releases it) I can't help, maybe someone else can