espressif / esp-protocols

Collection of ESP-IDF components related to networking protocols

Freezing of the ESP, triggering the WDT! (ESP-MODEM) (IDFGH-12332) #528

Status: Open. renansoaress opened this issue 6 months ago.

renansoaress commented 6 months ago


General issue report

I'm experiencing a problem with the ESP-MODEM. In some cases, when the cellular signal is very poor, the ESP32S3 freezes, which triggers the watchdog timer (WDT) and restarts the chip. I've created a sample project that reproduces the problem: as soon as the device obtains an IP address over PPP, I cover the antenna with my hand, which weakens the signal and triggers the issue.

We have a project at my company that uses PPP communication with a Quectel EG915 module. This issue has been a major headache: whenever the ESP restarts, we have to set up PPP again, which resets the Quectel module. That in turn means reconfiguring the module's GPS, which takes time and causes some position points to be lost.

I'm using the ESP-MODEM in CMUX mode so that I can keep sending AT commands (for example, to query the GPS) while the PPP connection stays up for sending data over both sockets and MQTT.
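For readers unfamiliar with CMUX (3GPP TS 27.010): it splits the single UART into several virtual channels, each identified by a DLCI, so AT commands and PPP data travel in separate frames over the same wire. The sketch below builds basic-option frames with a short payload and computes their FCS. It is illustrative only and is not esp_modem's implementation; the macro and function names are hypothetical, and which DLCI esp_modem assigns to AT commands versus PPP is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for TS 27.010 basic-option constants. */
#define CMUX_FLAG 0xF9 /* opening/closing flag */
#define CMUX_SABM 0x2F /* set asynchronous balanced mode (opens a channel) */
#define CMUX_UIH  0xEF /* unnumbered info with header check (carries data) */
#define CMUX_PF   0x10 /* poll/final bit */

/* FCS: reflected CRC-8, polynomial x^8 + x^2 + x + 1 (0xE0 reversed),
 * initial value 0xFF, result is the ones' complement of the remainder. */
static uint8_t cmux_fcs(const uint8_t *p, size_t n) {
    uint8_t fcs = 0xFF;
    for (size_t i = 0; i < n; i++) {
        fcs ^= p[i];
        for (int b = 0; b < 8; b++)
            fcs = (fcs & 1) ? (uint8_t)((fcs >> 1) ^ 0xE0) : (uint8_t)(fcs >> 1);
    }
    return (uint8_t)(0xFF - fcs);
}

/* Encode one frame into out (needs len + 6 bytes); returns the frame length.
 * Only SABM/UIH-style frames are handled: the FCS covers address, control
 * and length, which is what TS 27.010 specifies for UIH frames. */
static size_t cmux_encode(uint8_t *out, uint8_t dlci, uint8_t control,
                          const uint8_t *info, size_t len) {
    assert(dlci < 64); /* DLCI is a 6-bit field */
    assert(len <= 127); /* single-byte length field only */
    out[0] = CMUX_FLAG;
    out[1] = (uint8_t)((dlci << 2) | 0x02 | 0x01); /* C/R = 1 (initiator), EA = 1 */
    out[2] = control;
    out[3] = (uint8_t)((len << 1) | 0x01); /* EA = 1 */
    if (len)
        memcpy(&out[4], info, len);
    out[4 + len] = cmux_fcs(&out[1], 3);
    out[5 + len] = CMUX_FLAG;
    return 6 + len;
}
```

For example, `cmux_encode(buf, 0, CMUX_SABM | CMUX_PF, NULL, 0)` produces the control-channel SABM frame `F9 03 3F 01 1C F9`, while a UIH frame on another DLCI would carry AT text or PPP bytes in its payload.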

The problem only occurs when the signal is very weak, and the device then stays stuck for hours, even after the signal improves.

The example uses ESP-MODEM version 1.0.1 (I also tested the latest version, 1.1.0, and the problem persists).

The example uses ESP-IDF version 5.1.2 (I also tested the latest version, 5.2.1, and the problem persists).

I also tried raising the priority of the lwIP, esp_modem, and several other tasks, and increasing the buffer sizes, but none of this solved the problem.

Example project link: https://github.com/renansoaress/test_ppp_esp32s3_quectel

Link to the problem log in this sample project: https://github.com/renansoaress/test_ppp_esp32s3_quectel/blob/master/LOG_ERRO.txt

david-cermak commented 6 months ago

Do you always get the same error? I mean not only the TG0WDT_SYS_RST, but does it always point to this spinlock_release()?

Do you use UART or USB? Could you please check whether you can reproduce the issue with CONFIG_FREERTOS_UNICORE=y?

Also, I need to ask about the power supply: are you sure the ESP32S3 is properly powered at all times? For example, if the modem shares its power rail with the CPU and draws more current when the signal is weak, the CPU could see a voltage drop as well.

renansoaress commented 6 months ago

Sorry for the delay in responding; I was testing everything you mentioned...

Yes, it's always the same WDT error.

I'm using UART, and I tested it with CONFIG_FREERTOS_UNICORE enabled, but the problem still occurs.

I checked the board's power supply, and everything is in order. I also checked for voltage drops with an oscilloscope and didn't observe any.

Any ideas would be welcome.

david-cermak commented 6 months ago

Just to rule out some dangling pointer issues, I'd recommend removing these two lines:

https://github.com/renansoaress/test_ppp_esp32s3_quectel/blob/4a73b03f86d310a0ebf8e267eb0fb312383801cc/main/main.c#L259-L260

(because you may have just destroyed the handle, which is still used by the check-signal thread)
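A minimal sketch of the teardown ordering this implies, written in portable C with POSIX threads rather than ESP-IDF/FreeRTOS. `modem_handle_t`, `signal_monitor_t` and the `monitor_*` functions are hypothetical stand-ins for the modem handle (e.g. an `esp_modem_dce_t *`) and the check-signal thread in the sample project. The owner asks the polling thread to stop and waits for it to exit before freeing the handle, so the thread can never dereference a destroyed handle.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the modem handle. */
typedef struct {
    int rssi;
} modem_handle_t;

/* Hypothetical owner of the handle plus the thread that polls it. */
typedef struct {
    modem_handle_t *handle;
    pthread_mutex_t lock; /* protects stop and polls */
    bool stop;
    int polls;
    pthread_t thread;
} signal_monitor_t;

static void *monitor_main(void *arg) {
    signal_monitor_t *m = arg;
    const struct timespec period = {0, 1000000}; /* 1 ms between polls */
    for (;;) {
        pthread_mutex_lock(&m->lock);
        bool stop = m->stop;
        if (!stop) {
            /* Safe: the handle is freed only after this thread is joined. */
            int rssi = m->handle->rssi;
            (void)rssi;
            m->polls++;
        }
        pthread_mutex_unlock(&m->lock);
        if (stop)
            break;
        nanosleep(&period, NULL);
    }
    return NULL;
}

static int monitor_start(signal_monitor_t *m, modem_handle_t *h) {
    assert(h != NULL);
    m->handle = h;
    m->stop = false;
    m->polls = 0;
    pthread_mutex_init(&m->lock, NULL);
    return pthread_create(&m->thread, NULL, monitor_main, m);
}

static int monitor_poll_count(signal_monitor_t *m) {
    pthread_mutex_lock(&m->lock);
    int n = m->polls;
    pthread_mutex_unlock(&m->lock);
    return n;
}

/* Order matters: request stop, wait for the thread to exit, then free. */
static void monitor_stop_and_destroy(signal_monitor_t *m) {
    pthread_mutex_lock(&m->lock);
    m->stop = true;
    pthread_mutex_unlock(&m->lock);
    pthread_join(m->thread, NULL);
    free(m->handle);
    m->handle = NULL;
    pthread_mutex_destroy(&m->lock);
}
```

FreeRTOS tasks cannot be joined like POSIX threads, so on the ESP32 the join step would need an equivalent handshake (for example, the task signalling that it has left its loop) before the handle is destroyed.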