eclipse-threadx / threadx

Eclipse ThreadX is an advanced real-time operating system (RTOS) designed specifically for deeply embedded applications.
https://github.com/eclipse-threadx/rtos-docs/blob/main/rtos-docs/threadx/index.md
MIT License

Getting 'empty' message from tx_queue_receive #258

Open SchutzSeb opened 1 year ago

SchutzSeb commented 1 year ago

Hi,

We are currently using ThreadX (6.2.0) along with Tracealyzer (4.6.6) to collect and visualize traces. The software runs on an ATSAMS70 (GCC Arm toolchain), and we have many threads running at different priority levels. Each thread has a dedicated queue for receiving messages from other threads or ISRs.

We have observed that, when the device is heavily loaded and Tracealyzer is running, tx_queue_receive sometimes returns an 'empty' message. Our investigations so far show that this happens because the thread that was suspended waiting for a new message is resumed even though no message has been sent to its queue.
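One diagnostic that can make such 'empty' receives easier to catch (an idea added here, not something proposed in the thread) is to stamp each message with a non-zero magic word and a per-queue sequence number before calling tx_queue_send, and to check both after tx_queue_receive returns. The sketch below is plain, host-runnable C: `app_msg_t`, `MSG_MAGIC`, `msg_stamp` and `msg_check` are hypothetical names, the layout assumes a queue created with a message size of 4 ULONGs, and the actual ThreadX calls are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker: any non-zero constant that a zero-filled or
   uninitialised buffer is unlikely to contain. */
#define MSG_MAGIC 0xC0FFEE01u

/* Hypothetical layout: four 32-bit words, i.e. a ThreadX queue created
   with a message size of 4 ULONGs on a 32-bit target. */
typedef struct {
    uint32_t magic;    /* MSG_MAGIC for a message that was really sent */
    uint32_t seq;      /* per-queue sequence number */
    uint32_t command;  /* application payload */
    uint32_t argument; /* application payload */
} app_msg_t;

typedef enum { MSG_OK, MSG_BAD_MAGIC, MSG_SEQ_GAP } msg_check_t;

/* Sender side: call just before tx_queue_send(). With several senders
   (threads and ISRs) the increment of *next_seq must itself be protected,
   e.g. done with interrupts disabled. */
void msg_stamp(app_msg_t *msg, uint32_t *next_seq)
{
    assert(msg != NULL && next_seq != NULL);
    msg->magic = MSG_MAGIC;
    msg->seq = *next_seq;
    *next_seq += 1u;
}

/* Receiver side: call after tx_queue_receive() has returned. */
msg_check_t msg_check(const app_msg_t *msg, uint32_t *expected_seq)
{
    assert(msg != NULL && expected_seq != NULL);
    if (msg->magic != MSG_MAGIC) {
        return MSG_BAD_MAGIC;          /* buffer never filled by msg_stamp() */
    }
    if (msg->seq != *expected_seq) {
        *expected_seq = msg->seq + 1u; /* resynchronise after a gap */
        return MSG_SEQ_GAP;
    }
    *expected_seq += 1u;
    return MSG_OK;
}
```

If the receive buffer is not cleared before each tx_queue_receive call, a leftover previous message passes the magic check but is still reported as a sequence gap; clearing the buffer makes an 'empty' receive show up as MSG_BAD_MAGIC instead.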

Here is the screenshot from Tracealyzer with some annotations:

[Screenshot: annotated Tracealyzer trace]

On the one hand, we can see that /nonsil/ProcInterControllerComm is resumed even though no new message has been sent to its queue; on the other hand, /system/ProcHalControl is not resumed even though a message has been sent to its queue (it was also suspended waiting for a new message). Moreover, /system/ProcHalControl has a higher priority than /nonsil/ProcInterControllerComm.

We hope you might have some suggestions to help us understand what is happening and how we could fix it properly.

Best regards.

xiuwencai commented 1 year ago

Hi @SchutzSeb, could you please check the tx_thread_state value of the mistakenly resumed thread?

SchutzSeb commented 1 year ago

Hi @xiuwencai,

Thank you very much for your reply.

_tx_thread_current_ptr has tx_thread_state set to 5 when I detect the 'empty' message returned by tx_queue_receive.
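For reference, the numeric thread states are defined in ThreadX's tx_api.h, and 5 is TX_QUEUE_SUSP: the running thread still appears to be suspended on a queue. The lookup below (table copied from the 6.x header; `thread_state_name` and `state_ok_after_receive` are hypothetical helpers) can be used to log the state by name when an 'empty' message is detected. The second helper rests on an assumption: in normal operation, a thread that has returned from tx_queue_receive and is running application code is in TX_READY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* tx_thread_state values as defined in ThreadX's tx_api.h (6.x), copied
   so that this sketch compiles without the ThreadX headers. */
static const char *const thread_state_names[] = {
    [0]  = "TX_READY",
    [1]  = "TX_COMPLETED",
    [2]  = "TX_TERMINATED",
    [3]  = "TX_SUSPENDED",
    [4]  = "TX_SLEEP",
    [5]  = "TX_QUEUE_SUSP",
    [6]  = "TX_SEMAPHORE_SUSP",
    [7]  = "TX_EVENT_FLAG",
    [8]  = "TX_BLOCK_MEMORY",
    [9]  = "TX_BYTE_MEMORY",
    [10] = "TX_IO_DRIVER",
    [11] = "TX_FILE",
    [12] = "TX_TCP_IP",
    [13] = "TX_MUTEX_SUSP",
    [14] = "TX_PRIORITY_CHANGE",
};

#define THREAD_STATE_COUNT (sizeof thread_state_names / sizeof thread_state_names[0])
static_assert(THREAD_STATE_COUNT == 15u, "expected TX_READY..TX_PRIORITY_CHANGE");

/* Symbolic name of a tx_thread_state value, or "UNKNOWN". */
const char *thread_state_name(unsigned int state)
{
    return (state < THREAD_STATE_COUNT) ? thread_state_names[state] : "UNKNOWN";
}

/* Assumption: after tx_queue_receive() has returned, the calling thread
   is expected to be TX_READY (0) while it runs application code. */
bool state_ok_after_receive(unsigned int state)
{
    return state == 0u;
}
```

On target, the value can be taken from the current thread, for example `thread_state_name(tx_thread_identify()->tx_thread_state)`, and logged together with the status code returned by tx_queue_receive.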

xiuwencai commented 1 year ago

Hi @SchutzSeb, could you please share more information about your configuration? Have you defined anything in tx_user.h, or passed any ThreadX configuration defines when compiling? Are you using the data cache? If so, is it configured in write-through or write-back mode? Which version of GCC are you using?

SchutzSeb commented 1 year ago

Hi @xiuwencai,

Yes, we have some definitions in our tx_user.h:

#define TX_TIMER_TICKS_PER_SECOND 1000
#define TX_MAX_PRIORITIES 32
#define TX_ENABLE_STACK_CHECKING
#define TX_THREAD_USER_EXTENSION VOID* tx_thread_local_storage;
#define TX_ENABLE_EVENT_TRACE

The data cache is enabled on the microcontroller (__DCACHE_PRESENT set to 1), but I have found no additional configuration related to this feature, so I'm not sure about the write mode (the microcontroller is an ATSAMS70).
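On a Cortex-M7 such as the SAMS70, the write mode can be checked from the cache control registers. In CMSIS, __DCACHE_PRESENT only states that the core has a data cache; whether it is enabled is the DC bit (bit 16) of SCB->CCR at 0xE000ED14, and the FORCEWT bit (bit 2) of SCB->CACR at 0xE000EF9C forces write-through for all cacheable memory. When FORCEWT is clear, the policy comes from the MPU region attributes or, with the MPU disabled, from the ARMv7-M default memory map (write-back, write-allocate for the SRAM region starting at 0x20000000). The decoder below is a hypothetical host-side helper for raw register values read in a debugger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the Cortex-M7 documentation; CMSIS core_cm7.h names
   them SCB_CCR_DC_Pos, SCB_CCR_IC_Pos and SCB_CACR_FORCEWT_Pos. */
#define CCR_DC_BIT       16u /* SCB->CCR  (0xE000ED14): data cache enable */
#define CCR_IC_BIT       17u /* SCB->CCR: instruction cache enable */
#define CACR_FORCEWT_BIT  2u /* SCB->CACR (0xE000EF9C): force write-through */

typedef struct {
    bool dcache_enabled;
    bool icache_enabled;
    bool force_write_through;
} cache_status_t;

/* Decode raw CCR and CACR values (hypothetical helper). */
cache_status_t decode_cache_regs(uint32_t ccr, uint32_t cacr)
{
    cache_status_t s;
    s.dcache_enabled      = ((ccr  >> CCR_DC_BIT)       & 1u) != 0u;
    s.icache_enabled      = ((ccr  >> CCR_IC_BIT)       & 1u) != 0u;
    s.force_write_through = ((cacr >> CACR_FORCEWT_BIT) & 1u) != 0u;
    return s;
}
```

In GDB, for example, `x/wx 0xE000ED14` and `x/wx 0xE000EF9C` print the two raw values to pass in.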

We currently compile with gcc-arm-12.2.MPACBTI-Rel1-mingw-w64-i686-arm-none-eabi.

williamelamie commented 8 months ago

Was this issue ever resolved?

SchutzSeb commented 8 months ago

We have found a way to prevent the issue by disabling interrupts around calls to tx_queue_send. However, as of today, we still have not found the root cause of the problem.

williamelamie commented 8 months ago

In theory, ThreadX briefly disables interrupts inside tx_queue_receive/tx_queue_send. When you get a chance, it would be interesting to remove the interrupt lockout you added and set a breakpoint in (or step into) the tx_queue_send/receive code to make sure interrupts are disabled where they should be. It feels as if interrupts are being enabled somewhere (maybe in the tracing code) when they shouldn't be.
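The suspected failure mode can be illustrated with a small host-side model (all names hypothetical; a plain variable stands in for the Cortex-M PRIMASK bit). A hook that saves and restores the caller's interrupt posture is safe inside a kernel critical section, whereas one that pairs a disable with an unconditional enable silently ends the caller's critical section early. This only illustrates the hypothesis; it does not describe how Tracealyzer or ThreadX actually implement their critical sections.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Host stand-in for the Cortex-M PRIMASK bit (1 = interrupts masked).
   On target the same logic would use CMSIS __get_PRIMASK(),
   __disable_irq() and __set_PRIMASK(). */
static unsigned int sim_primask = 0u;

/* Save the current posture, then mask interrupts. */
static unsigned int sim_irq_save_disable(void)
{
    unsigned int saved = sim_primask;
    sim_primask = 1u;
    return saved;
}

/* Put back exactly the posture that was saved. */
static void sim_irq_restore(unsigned int saved)
{
    sim_primask = saved;
}

/* Nesting-safe hook: the caller's posture survives the call. */
void hook_save_restore(void)
{
    unsigned int saved = sim_irq_save_disable();
    /* ... record a trace event ... */
    sim_irq_restore(saved);
}

/* Unsafe hook: a disable/enable pair that always leaves interrupts
   enabled, even when called from inside a critical section. */
void hook_unconditional_enable(void)
{
    sim_primask = 1u;
    /* ... record a trace event ... */
    sim_primask = 0u;
}

/* Model of a kernel service: a critical section (TX_DISABLE ... TX_RESTORE
   in ThreadX terms) that calls a hook part-way through. Returns whether
   interrupts were still masked just before the restore, which is what an
   assertion placed there would check. */
bool critical_section_with_hook(void (*hook)(void))
{
    assert(hook != NULL);
    unsigned int saved = sim_irq_save_disable();
    hook();
    bool still_masked = (sim_primask != 0u);
    sim_irq_restore(saved);
    return still_masked;
}
```

Saving and restoring the previous posture, rather than assuming interrupts were enabled on entry, is what makes a hook safe to call from inside another critical section.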

(Sent by email in reply to SchutzSeb's comment of Mon, Feb 5, 2024 at 11:08 PM.)

SchutzSeb commented 7 months ago

Hello, thanks for your reply. I added assertions in tx_queue_send before every call to TX_RESTORE to check whether interrupts had been re-enabled when they should not have been, but none of the assertions fired. Nevertheless, I think you pointed me in the right direction: when I disabled all manipulation of interrupts by Tracealyzer, the issue no longer seemed to occur (I had, of course, removed my workaround first). The recorded traces were chaotic, but the device ran fine. Now at least I have an idea of where the issue may be coming from. Thanks for your help! I'll post an update if I get new information.