IndaloTech / ramses_esp

Port evofw3 to Espressif ESP32-S3
9 stars 2 forks source link

Radio interface blocks when stressed by ramses_rf #4

Closed ghoti57 closed 8 months ago

ghoti57 commented 8 months ago

ramses_esp 0.34 onwards MQTT

RX and TX stop (under stress?)

Serial comms shows repeated

Backtrace: 0x4201A546:0x3FC9B650 0x4201A6F2:0x3FC9B670 0x40377161:0x3FC9B690 0x400559DD:0x3FCACC20 |<-CORRUPTED

E (68758) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (68758) task_wdt:  - IDLE (CPU 0)
E (68758) task_wdt: Tasks currently running:
E (68758) task_wdt: CPU 0: Radio
E (68758) task_wdt: CPU 1: IDLE
E (68758) task_wdt: Print CPU 0 (current core) backtrace
ghoti57 commented 8 months ago

ramses_esp 0.3.6

When running in idf powershell

idf.py -p COMxx monitor

Backtrace shows

E (159517) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (159517) task_wdt:  - IDLE (CPU 0)
E (159517) task_wdt: Tasks currently running:
E (159517) task_wdt: CPU 0: Radio
E (159517) task_wdt: CPU 1: IDLE
E (159517) task_wdt: Print CPU 0 (current core) backtrace

Backtrace: 0x4201B756:0x3FC9B9F0 0x4201B902:0x3FC9BA10 0x403771B5:0x3FC9BA30 0x400559DD:0x3FCADF90 |<-CORRUPTED
0x4201b756: task_wdt_timeout_handling at C:/esp-idf/esp-idf-v5.1.1/components/esp_system/task_wdt/task_wdt.c:461 (discriminator 3)

0x4201b902: task_wdt_isr at C:/esp-idf/esp-idf-v5.1.1/components/esp_system/task_wdt/task_wdt.c:585

0x403771b5: _xt_lowint1 at C:/esp-idf/esp-idf-v5.1.1/components/freertos/FreeRTOS-Kernel/portable/xtensa/xtensa_vectors.S:1236

0x400559dd: _xtos_set_intlevel in ROM

E (164517) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (164517) task_wdt:  - IDLE (CPU 0)
E (164517) task_wdt: Tasks currently running:
E (164517) task_wdt: CPU 0: Radio
E (164517) task_wdt: CPU 1: IDLE
E (164517) task_wdt: Print CPU 0 (current core) backtrace

Backtrace: 0x4201B756:0x3FC9B9F0 0x4201B902:0x3FC9BA10 0x403771B5:0x3FC9BA30 0x4200B32F:0x3FCAE020 0x4200A876:0x3FCAE040 0x40381051:0x3FCAE060
0x4201b756: task_wdt_timeout_handling at C:/esp-idf/esp-idf-v5.1.1/components/esp_system/task_wdt/task_wdt.c:461 (discriminator 3)

0x4201b902: task_wdt_isr at C:/esp-idf/esp-idf-v5.1.1/components/esp_system/task_wdt/task_wdt.c:585

0x403771b5: _xt_lowint1 at C:/esp-idf/esp-idf-v5.1.1/components/freertos/FreeRTOS-Kernel/portable/xtensa/xtensa_vectors.S:1236

0x4200b32f: frame_work at C:/Users/peter/eclipse-workspace/ramses_esp/components/frame/frame.c:615

0x4200a876: Radio_Task at C:/Users/peter/eclipse-workspace/ramses_esp/main/radio.c:38 (discriminator 1)

0x40381051: vPortTaskWrapper at C:/esp-idf/esp-idf-v5.1.1/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162

E (169517) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (169517) task_wdt:  - IDLE (CPU 0)
E (169517) task_wdt: Tasks currently running:
E (169517) task_wdt: CPU 0: Radio
E (169517) task_wdt: CPU 1: IDLE
E (169517) task_wdt: Print CPU 0 (current core) backtrace

Backtrace: 0x4201B756:0x3FC9B9F0 0x4201B902:0x3FC9BA10 0x403771B5:0x3FC9BA30 0x40380532:0x3FCADFA0 0x4037EE94:0x3FCADFC0 0x4200B292:0x3FCAE000 0x4200B39D:0x3FCAE020 0x4200A876:0x3FCAE040 0x40381051:0x3FCAE060
0x4201b756: task_wdt_timeout_handling at C:/esp-idf/esp-idf-v5.1.1/components/esp_system/task_wdt/task_wdt.c:461 (discriminator 3)

0x4201b902: task_wdt_isr at C:/esp-idf/esp-idf-v5.1.1/components/esp_system/task_wdt/task_wdt.c:585

0x403771b5: _xt_lowint1 at C:/esp-idf/esp-idf-v5.1.1/components/freertos/FreeRTOS-Kernel/portable/xtensa/xtensa_vectors.S:1236

0x40380532: vPortClearInterruptMaskFromISR at C:/esp-idf/esp-idf-v5.1.1/components/freertos/FreeRTOS-Kernel/portable/xtensa/include/freertos/portmacro.h:568
 (inlined by) xTaskGetSchedulerState at C:/esp-idf/esp-idf-v5.1.1/components/freertos/FreeRTOS-Kernel/tasks.c:5063

0x4037ee94: xQueueReceive at C:/esp-idf/esp-idf-v5.1.1/components/freertos/FreeRTOS-Kernel/queue.c:1484

0x4200b292: tx_fifo_work at C:/Users/peter/eclipse-workspace/ramses_esp/components/frame/frame.c:429

0x4200b39d: frame_work at C:/Users/peter/eclipse-workspace/ramses_esp/components/frame/frame.c:644

0x4200a876: Radio_Task at C:/Users/peter/eclipse-workspace/ramses_esp/main/radio.c:38 (discriminator 1)

0x40381051: vPortTaskWrapper at C:/esp-idf/esp-idf-v5.1.1/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162

In the second backtrace above we can see the watchdog is triggerred (on this occasion) while the radio task is in _tx_fifowork so we're trying to TX. On the device itself the TX LED is permanently lit

ghoti57 commented 8 months ago

switching to a prototype device to allow a logic analyser to monitor the GPIO debug signals it is apprent that something is interfering with the CC1101 fifo level interrupt handler, blocking it.

ghoti57 commented 8 months ago

This may be a useful reference: Interrupt allocation

ghoti57 commented 8 months ago

The FreeRTOS Symmetric-multiprocessing documentation contains the following note

Within ESP-IDF, Core 0 and Core 1 are sometimes referred to as PRO_CPU and APP_CPU respectively. The aliases exist in ESP-IDF as they reflect how typical ESP-IDF applications utilize the two cores. Typically, the tasks responsible for handling protocol related processing such as Wi-Fi or Bluetooth are pinned to Core 0 (thus the name PRO_CPU), where as the tasks handling the remainder of the application are pinned to Core 1, (thus the name APP_CPU).

This is interesting because the current ramses_esp configuration runs the radio task on Core 0 and the Host task on Core 1

In the section on Preemption it states However, there are some instances where a higher-priority task that becomes ready can be run on multiple cores. In this case, the scheduler only preempts one core. The scheduler always gives preference to the current core when multiple cores can be preempted. In other words, if the higher priority ready task is unpinned and has a higher priority than the current priority of both cores, the scheduler will always choose to preempt the current core.

So if we're trying to TX in the radio task (which is currently a fairly tight loop) it might be preempted and subsequently blocked for a critical period of time, defeting the ob=ject of trying to run it on a separate core to everything else.

ghoti57 commented 8 months ago

ramses_esp 0.3.6

Swapping the cores so the Radio Task is running on Core 1 and the Host Task on Core 0 doesn't prevent the watchdog being triggered and the TX LED ending permanently on but there is a change in other behaviour.

After a number of cycles of the watchdog error we see multiple RQ messages being dropped in the GATEWAY interface in very close time proximity

W (51457) GATEWAY: DROPPED <RQ --- 18:000730 01:140959 --:------ 2349 001 01>
W (51457) GATEWAY: DROPPED <RQ --- 18:000730 01:140959 --:------ 2349 001 04>
E (53547) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (53547) task_wdt:  - IDLE (CPU 1)
E (53547) task_wdt: Tasks currently running:
E (53547) task_wdt: CPU 0: IDLE
E (53547) task_wdt: CPU 1: Radio
E (53547) task_wdt: Print CPU 1 backtrace

Backtrace: 0x40379A6A:0x3FC9C010 0x403771B5:0x3FC9C030 0x4200B27D:0x3FCB0AD0 0x4200A876:0x3FCB0AF0 0x40381051:0x3FCB0B10
0x40379a6a: esp_crosscore_isr at C:/esp-idf/esp-idf-v5.1.1/components/esp_system/crosscore_int.c:96

0x403771b5: _xt_lowint1 at C:/esp-idf/esp-idf-v5.1.1/components/freertos/FreeRTOS-Kernel/portable/xtensa/xtensa_vectors.S:1236

0x4200b27d: tx_fifo_wait at C:/Users/peter/eclipse-workspace/ramses_esp/components/frame/frame.c:354

0x4200a876: Radio_Task at C:/Users/peter/eclipse-workspace/ramses_esp/main/radio.c:38 (discriminator 1)

0x40381051: vPortTaskWrapper at C:/esp-idf/esp-idf-v5.1.1/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162

W (54317) GATEWAY: DROPPED <RQ --- 18:000730 01:140959 --:------ 2349 001 07>
W (54447) GATEWAY: DROPPED <RQ --- 18:000730 01:140959 --:------ 2349 001 06>
W (54457) GATEWAY: DROPPED <RQ --- 18:000730 01:140959 --:------ 2349 001 05>
W (54457) GATEWAY: DROPPED <RQ --- 18:000730 01:140959 --:------ 2349 001 00>
W (54457) GATEWAY: DROPPED <RQ --- 18:000730 10:039243 --:------ 3220 005 0000000000>
W (57387) GATEWAY: DROPPED <RQ --- 18:000730 10:039243 --:------ 2401 001 00>

ramses_rf is posting multiple messages to the Evohome controller asking about different zones

ghoti57 commented 8 months ago

Fixed in Version 0.3.7