espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.31k stars, 7.2k forks

BG96 and OTA: firmware download always fails (IDFGH-3528) #5480

Closed montirob closed 4 years ago

montirob commented 4 years ago

Environment

Problem Description

I am trying to get OTA working using PPPoS and Quectel's EC21 module to connect to the internet. The ESP32 and the EC21 module are connected over UART1 with hardware flow control at 115200 bps. The firmware to download is hosted on a hawkBit server.

The connection is OK and the download starts, but it never completes successfully.

Expected Behavior

A successful OTA update from an HTTPS server.

Actual Behavior

The download starts correctly but always stops after some time. It looks like behavior caused by a timeout.

Steps to reproduce

1 - Connect to the internet using the PPPoS example.
2 - Run the ESP32 OTA update.

I slightly adapted the BG96.c example file to make it compatible with the EC21 LTE module (also from Quectel, so almost nothing needed to change).

The ESP32 is connected to the EC21 module over UART1 with hardware flow control, using the UMTS&LTE EVB provided by Quectel.

Code to reproduce this issue

The PPPoS example code for the BG96 combined with the OTA update example.

Debug Logs

D (120249) HTTP_CLIENT: need_read=289, byte_to_read=289, rlen=289, ridx=0
D (120259) HTTP_CLIENT: http_on_body 289
D (120269) esp_https_ota: Written image length 149413
I (120269) perform_ota_update: downloaded 12.75%
D (120269) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=1171664
I (120279) mbedtls: ssl_tls.c:8270 => read
I (120289) mbedtls: ssl_tls.c:8558 <= read
D (120289) HTTP_CLIENT: need_read=289, byte_to_read=289, rlen=289, ridx=0
D (120299) HTTP_CLIENT: http_on_body 289
D (120309) esp_https_ota: Written image length 149702
I (120309) perform_ota_update: downloaded 12.78%
D (120309) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=1171664
I (120319) mbedtls: ssl_tls.c:8270 => read
I (120319) mbedtls: ssl_tls.c:8558 <= read
D (120329) HTTP_CLIENT: need_read=289, byte_to_read=289, rlen=36, ridx=0
D (120339) HTTP_CLIENT: http_on_body 36
D (120339) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=1171664
E (120349) TRANS_SSL: ssl_poll_read select error 104, errno = Connection reset by peer, fd = 54
D (120359) HTTP_CLIENT: need_read=253, byte_to_read=253, rlen=-1, ridx=36
D (120359) esp_https_ota: Written image length 149738
I (120369) perform_ota_update: downloaded 12.78%
D (120369) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=1171664
E (120379) TRANS_SSL: ssl_poll_read select error 0, errno = Success, fd = 54
D (120389) HTTP_CLIENT: need_read=289, byte_to_read=289, rlen=-1, ridx=0
D (120399) HTTP_CLIENT: Data processed 149449 != Data specified in content length 1171664
I (120399) perform_ota_update: downloaded 12.78%
D (120409) HTTP_CLIENT: is_data_remain=1, is_chunked=0, content_length=1171664
E (120419) TRANS_SSL: ssl_poll_read select error 0, errno = Success, fd = 54

Other items if possible

I have already tried increasing the timeout, as suggested here, but without success.

log con tls .txt

Alvin1Zhang commented 4 years ago

Thanks for reporting.

montirob commented 4 years ago

Attached is an mbedTLS verbose-level log of the failure. After the failure, the ESP32 does not exit the error condition.

OTA_fail_tls_verbose.txt

david-cermak commented 4 years ago

Hi @montirob

Could you please check if the OTA update works okay with the same server if you connect using WiFi (or Ethernet if possible)?

The "Connection reset by peer" error clearly says that the server terminated the connection. I was going to suggest increasing the UART buffer sizes, but that would only make sense for other kinds of errors (such as a buffer overflow).

montirob commented 4 years ago

Hi @david-cermak, yes, if I connect using WiFi the update works fine, but WiFi is faster than LTE. Initially I was experiencing HW_FIFO_OVERFLOW errors. Thinking that they could be the cause of the problem, I added hardware flow control, which fixed the overflow. I've also checked the UART event queue, and it is OK.

david-cermak commented 4 years ago

@montirob Is it possible that your server defines a download timeout and terminates the connection if it is exceeded? It should be quite easy to simulate if you can limit the WiFi throughput and run OTA over WiFi again.

There should be no difference between the PPP network interface and other standard interfaces regarding the OTA updates. I've tested OTA update with the esp-modem (BG96) interface locally and it works as expected on my end.

montirob commented 4 years ago

@david-cermak thanks for your answers. After increasing the UART speed from 115200 to 230400 bps, the download finished successfully. So the hypothesis of a server-side timeout is starting to convince me too.

Alvin1Zhang commented 4 years ago

Thanks for reporting, feel free to reopen.