espressif / esp-protocols

Collection of ESP-IDF components related to networking protocols
180 stars 125 forks source link

UART_BUFFER_FULL event should be ignored in esp_modem_uart.cpp when using hw flow control. (IDFGH-11459) #431

Open fdkz opened 10 months ago

fdkz commented 10 months ago

Answers checklist.

General issue report

I'm using esp-idf v5.1.1, SIM7600G modem.

As written, the UART_BUFFER_FULL event in esp_modem_uart.cpp should be ignored or just logged in case of using uart hw flow control.

components/driver/uart/uart.c has this to say about it:

//If we fail to push data to ring buffer, we will have to stash the data, and send next time.
//Mainly for applications that uses flow control or small ring buffer.

Without this change I couldn't get OTA to reliably work (or at all if using DEBUG log level for everything) over http.

I'm still getting quite a lot of "CMUX: Restarting CMUX state machine (reason: 0,1,3)" warnings when using DEBUG log level for everything, but the library seems to recover fine from these.

david-cermak commented 10 months ago

@fdkz Are you saying that the OTA is more reliable if you ignore the event? for example like this?

--- a/components/esp_modem/src/esp_modem_uart.cpp
+++ b/components/esp_modem/src/esp_modem_uart.cpp
@@ -134,7 +134,6 @@ void UartTerminal::task()
             case UART_BUFFER_FULL:
                 ESP_LOGW(TAG, "Ring Buffer Full");
                 if (on_error) {
-                    on_error(terminal_error::BUFFER_OVERFLOW);
                 }
                 reset_events();
                 break;

Would increasing the buffer size help, too?

Without this change I couldn't get OTA to reliably work (or at all if using DEBUG log level for everything) over http.

Would placing the UART ISR to RAM help? (CONFIG_UART_ISR_IN_IRAM=y)

Are you having trouble with OTA over plain http? or using https? (the former shouldn't be an issue, the latter one might be -- working on some reliable alternative -- WIP PR https://github.com/espressif/esp-protocols/pull/363)

fdkz commented 10 months ago

Yes, OTA is more reliable for me when ignoring the UART_BUFFER_FULL event. I also removed the reset_events() call which calls the uart_flush_input() function that would mess up the data stream if using hw flow control. The received data is still in uart driver buffers and waiting to be read out. NB! I'm not 100% sure what happens without hw flow control and if UART_BUFFER_FULL should be in that case ignored or not.

            case UART_BUFFER_FULL:
                ESP_LOGW(TAG, "Ring Buffer Full");
                break;

Would increasing the buffer size help, too?

Actually, yes. dte_config.uart_config.rx_buffer_size is currently 1024. Using 4096 was more reliable, but I set it to 1024 for debugging and now 1024 so far works fine. UART ISR is in IRAM.

Also you're right - I'm using OTA over https, not plain http.

david-cermak commented 10 months ago

I'm not 100% sure what happens without hw flow control and if UART_BUFFER_FULL should be in that case ignored or not.

I'll run some test to see what happens, but I recall that ignoring this caused more issues, especially with CMUX active.

Maybe we can add a config to ignore some events per user preference, since PPP protocols would recover easily.

I often hear that OTA is flaky over PPPoS even if the UART ISR is in RAM, but I haven't been able to reproduce these issues. (testing with UART ISR in flash with my #363 to implement the recoveries and re-connects). Could you please tell me more about your application? Which ESP32 are you using, CPU frequency. Is anything else (active task) running during your OTA, your baudrate, TLS parameters (key lengths, mutual authentication?)

fdkz commented 10 months ago

I'm using ESP32S3. Modem baudrate is 115200. CPU dynamic clocking is enabled (10..160 MHz), CONFIG_FREERTOS_USE_TICKLESS_IDLE=y. The firmware uses TWAI, I2C, MQTT over TLS and the image is 1.2 MB (size helps because sometimes the fatal errors appeared near the end of the download). Our application has to work normally during OTA download, so there are other things going on (not resource-intensive) during the OTA process. (Sorry, don't know how to answer the TLS parameters question at the moment.)

Had some time and created a simple test-program by removing almost everything from our firmware and using a clean esp_modem checkout. And of course OTA update worked without any errors.. I'll start re-introducing cut parts to the firmware and hopefully find out what exactly causes problems, or at least get a reproducible case. This can take more than a few days.

david-cermak commented 10 months ago

Our application has to work normally during OTA download

Would it be enough if the application works normally only when checking for an update? Does it need to work also during the update process? Would it be acceptable to interrupt these normal operations for say 2 seconds? (just asking if the manual OTA would serve your usecase, i.e. calling ota.perform() from time to time to proceed with updates)

https://github.com/espressif/esp-protocols/blob/e28150397116bf24c4970ad42a4bc055cf9c1a07/components/esp_modem/test/target_ota/main/ota_test.cpp#L157-L161

I'll start re-introducing cut parts to the firmware and hopefully find out what exactly causes problems,

That would be very helpful, thanks!