espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

assert failed: esp_openthread_task_queue_post esp_openthread_task_queue.c (ret == sizeof(val)) on heavy openthread activity (IDFGH-12180) #13235

Closed: rretanubun closed this issue 7 months ago

rretanubun commented 8 months ago

Answers checklist.

General issue report

Setup

ESP-IDF: release 4.4.6; OpenThread component: https://github.com/espressif/esp-idf/commit/68ce4f1404632fa775a97f555dc3daab98bd67d3 (from back when everything was open-sourced).

Hardware

ESP32 as the host processor and EFR32 as the Radio Co-Processor (RCP), running in Thread Border Router mode

Trigger for issue

Under heavy Thread activity, for example:

  1. A TBR with 25+ SRP clients where 24+ are sleepy devices
  2. A TBR with many on-mesh joiners joining and leaving

Analysis done so far

Question

Thank you for everyone's time

chshu commented 8 months ago

@rretanubun The Thread BR implementation in IDF v4.4 is not ready for production and has been out of maintenance for more than 2 years. The OpenThread version there is also old and includes a critical security issue (link); all vendors need to upgrade OT to a later version for production.

Thread and BR features are officially supported since IDF v5.1, could you switch your project to IDF v5.1.2 version?

You may also check the esp-thread-br SDK, it supports more production-ready features like RCP update and coexistence.

rretanubun commented 8 months ago

@chshu : Thanks for the reply and the heads up on the security issue, we have that applied already 👍🏼

Thanks for pointing out the earliest ESP-IDF version with official support. My main concern about migrating to a later ESP-IDF version is libopenthread_br.a: should I expect it to work with a non-Espressif Radio Co-Processor (e.g. SiLabs EFR32)? If so, is there any guide or documentation I can reference?

For example, the esp-thread-br SDK you mentioned specifically targets an ESP32-H RCP.

chshu commented 8 months ago

@rretanubun As long as the RCP API version matches, the ESP Thread BR SDK (Networking features) could work with any standard OT RCP, no matter the SoC vendors.

Only some platform specific features like RCP update and coexistence don't work with non-espressif RCP, since these depend on HW features.

rretanubun commented 8 months ago

@chshu : Understood. Thanks for the clarification on SoC vendors. In case you have come across it in the past, is there any insight you can share on testing with 25+ Thread devices, either at Espressif or in other GitHub posts, that may help? Thanks for all your time so far!

chshu commented 7 months ago

@rretanubun Regarding 25+ Thread devices, what kind of info are you looking for? I don't see any issue to host 25+ Thread devices in a Thread network under ESP Thread BR. We do have some internal testbeds testing large scale topology.

rretanubun commented 7 months ago

@chshu : Apologies for the delay in responding. I am looking for advice on how to tune the system to:

  1. Handle a device mix of 1x TBR and 25+ non-TBR devices (2x TME and 24+ MTD)
  2. Have the TBR maintain SRP client registrations and services for 25+ devices
  3. Have the TBR maintain mDNS client entries for the same devices to publish via the OT-BR ADPROXY
  4. Have the TBR process and relay CoAP messages from other Thread devices to a CoAP client on the LAN side (say once per second per device)
  5. Have the TBR do all this while sustaining an HTTP GET request every 15-20 ms over Ethernet (or Wi-Fi)

given the limited internal and external memory available on the ESP32.
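A rough way to size the event load implied by items 4 and 5 is sketched below. It is only a back-of-the-envelope estimate under stated assumptions: 25 devices, one CoAP message per device per second, and one HTTP GET every 15-20 ms. The helper names are hypothetical, and the count of "events per second" says nothing about the CPU or memory cost of each event.

```python
# Hypothetical helpers for a back-of-the-envelope load estimate (not part of any SDK).

def http_requests_per_s(interval_ms: float) -> float:
    """Request rate implied by one HTTP GET every `interval_ms` milliseconds."""
    return 1000.0 / interval_ms


def coap_relays_per_s(num_devices: int, msgs_per_s_per_device: float = 1.0) -> float:
    """CoAP messages per second the TBR must relay to the LAN-side client."""
    return num_devices * msgs_per_s_per_device


if __name__ == "__main__":
    low, high = http_requests_per_s(20), http_requests_per_s(15)
    coap = coap_relays_per_s(25)
    print(f"HTTP GET: {low:.0f}-{high:.0f} req/s, CoAP relay: {coap:.0f} msg/s")
    print(f"combined: {low + coap:.0f}-{high + coap:.0f} events/s")
```

With these assumptions the TBR handles roughly 50-67 HTTP requests and 25 CoAP relays per second, i.e. about 75-92 application-level events per second.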

Out of curiosity, how large is your internal testbed for testing large scale topology and what devices are used? I am asking because if some of them are publicly available devkits we may consider adding them to our testbed for device diversification.

chshu commented 7 months ago

@rretanubun

Have the TBR Maintain SRP client registrations & services for 25+ devices
Have the TBR Maintain mDNS client entry for the same devices to publish via OT-BR ADPROXY

This only concerns RAM capacity. For example, each SRP entry takes less than 64 bytes, so 25 devices take about 1.6 KB of RAM; there is no issue as long as there is enough memory. An ESP32 module with PSRAM is recommended for a TBR. FYI, an ESP32-S3 with 2 MB of PSRAM is used on the ESP Thread BR board.
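The memory figure above is a simple upper bound: number of devices times the per-entry size. A minimal sketch, assuming the 64-byte per-entry ceiling quoted in this reply and 1 KB = 1000 B (the function name is hypothetical, and real usage also depends on host names, service names and TXT data):

```python
# Upper bound quoted in this thread: each SRP entry takes less than 64 bytes.
SRP_ENTRY_BYTES_MAX = 64


def srp_ram_upper_bound(num_devices: int, entry_bytes: int = SRP_ENTRY_BYTES_MAX) -> int:
    """Return an upper-bound estimate (in bytes) of RAM used by SRP entries."""
    return num_devices * entry_bytes


if __name__ == "__main__":
    for n in (25, 50, 100):
        total = srp_ram_upper_bound(n)
        print(f"{n} devices -> {total} B (~{total / 1000:.1f} KB)")
```

Even at 100 devices the estimate stays around 6.4 KB, which is why the limiting factor is overall RAM headroom rather than the SRP table itself.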

Have the TBR Process and relay CoAP messages from other thread devices to a CoAP client on the LAN side (say once-per-second-per-device)

Bandwidth will be the limitation here. The 802.15.4 MAC-layer bandwidth is 250 kbps and the application-layer throughput is around 8 KB/s, so the average throughput for each of 25 devices will be 320 B/s (8 KB/s ÷ 25). This means each CoAP message should be smaller than 320 bytes if it is sent once per second per device. That is only a theoretical number; taking RF interference and backoff into account, the actual throughput is even lower.
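The per-device budget above can be reproduced with a short calculation. This is a sketch under the reply's own assumptions: a 250 kbps MAC rate, about 8 KB/s (taken as 8000 B/s) of application-layer throughput shared evenly by all devices, and no allowance for interference, retries or backoff. Function names are illustrative only.

```python
# Figures quoted in this thread (theoretical; RF interference and backoff reduce them).
MAC_RATE_BITS_PER_S = 250_000       # 802.15.4 MAC-layer rate
APP_THROUGHPUT_BYTES_PER_S = 8_000  # ~8 KB/s application-layer throughput (1 KB = 1000 B assumed)


def app_efficiency() -> float:
    """Fraction of the raw MAC rate that reaches the application layer."""
    return APP_THROUGHPUT_BYTES_PER_S * 8 / MAC_RATE_BITS_PER_S


def max_message_bytes(num_devices: int, msgs_per_s_per_device: float = 1.0) -> float:
    """Largest average message size (bytes) per device when the application-layer
    throughput is shared evenly by all devices."""
    per_device_bytes_per_s = APP_THROUGHPUT_BYTES_PER_S / num_devices
    return per_device_bytes_per_s / msgs_per_s_per_device


if __name__ == "__main__":
    print(f"application efficiency: {app_efficiency():.0%}")
    for n in (25, 50):
        print(f"{n} devices @ 1 msg/s -> <= {max_message_bytes(n):.0f} B per message")
```

Only about a quarter of the raw MAC rate reaches the application, and doubling the device count or the message rate halves the per-message budget, so 25 devices at 1 msg/s gives the 320-byte ceiling and 50 devices gives 160 bytes.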

Have the TBR do this while sustaining GET request on http every 15-20 ms over ethernet (or wifi)

It depends on the traffic in Ethernet/Wi-Fi, it should be ok for most cases.

There are several publicly available devkits at this link: https://www.espressif.com/en/products/devkits

If you have any particular questions regarding our Thread solution, feel free to get in touch with us via https://www.espressif.com/en/contact-us/sales-questions.

rretanubun commented 7 months ago

Thanks for the guidance @chshu! I have shared this with the team and will be in touch if we have further questions.

chshu commented 7 months ago

@rretanubun Sure, closing the current issue.