MaJerle / lwesp

Lightweight and versatile AT parser library for ESP8266 and ESP32 devices.
https://majerle.eu/projects/lwesp-lightweight-esp-at-parser-library-for-embedded-systems
MIT License
447 stars 138 forks source link

Missing data when receiving large data blocks (might be a bug in ESP-AT) #123

Closed vitorb-rzero closed 2 years ago

vitorb-rzero commented 2 years ago

However, the next call to lwesp_netconn_receive() will just block instead of grabbing the remaining bytes. This is the communication trace:

TX: AT+CIPRECVLEN?
RX: +CIPRECVLEN:0,-1,-1,-1,-1
RX:
RX: OK

I don't really know if the ESP should return the remaining bytes at this point, or if LWESP must keep track of how many bytes still need to be received, but this is what I see.

Tested on rev 52b90298fba29c556ec69f4ea2506ca5b7f7c1bb using ESP AT 2.4.0.0

MaJerle commented 2 years ago

Do you have enough RAM to process all received data bytes?

Lwesp_netconn_receive will actually not try to read any data from device, it will just try to get a next packet from the message queue. Packets are read automatically by the lwesp core when there is an info with +IPD packet.

What I find strange is that first +CIPRECVLEN returned length of 2710, but next one returned just 0. So library wont read anything. That is to be checked - can you provide full log of commands?

MaJerle commented 2 years ago

FYI, to answer your second question. LwESP stores internally a “expected” available bytes, but prior any read, it does actual check to the device. Device returned 0, so it was expected there is no bytes.

vitorb-rzero commented 2 years ago

Thanks for looking into it. I do believe RAM is not a problem.

Logs:

0000030|D|lwesp   |[MSG VAR] Free memory: 0x2002aec4
0000030|D|lwesp   |[LWESP MEM] Free size: 76, address: 0x2002aec4
0000030|D|lwesp   |[LWESP MEM] Free size: 2048, address: 0x2002af6c
0000030|D|lwesp   |[LWESP MEM] Allocation OK: 76 bytes, addr: 0x2002aec4
0000030|D|lwesp   |[MSG VAR] Allocated 76 bytes at 0x2002aec4
0000030|D|lwesp   |[LWESP MEM] Allocation OK: 2076 bytes, addr: 0x2002af18
0000030|D|lwesp   |[LWESP PBUF] Allocated 2048 bytes on 0x2002af18
0000030|D|lwesp   |[LWESP MEM] Callocation OK: 16 bytes, addr: 0x2002b73c
0000030|D|lwesp   |[LWESP CONN] Poll event: 0x2002ced4
0000030|D|lwesp   |[LWESP MEM] Free size: 16, address: 0x2002aeac
0000030|D|lwesp   |[LWESP IPD] New length to read: 2047 bytes
0000030|D|lwesp   |[LWESP IPD] Bytes read: 2047
0000030|D|lwesp   |[LWESP NETCONN] Received pbuf contains 2048 bytes. Handle written to receive mbox
0000030|D|lwesp   |[LWESP PBUF] Deallocating 0x2002af18 with len/tot_len: 2048/2048
0000030|D|lwesp   |[LWESP MEM] Free size: 2076, address: 0x2002af18
0000030|D|lwesp   |[MSG VAR] Free memory: 0x2002aec4
0000030|D|lwesp   |[LWESP MEM] Free size: 76, address: 0x2002aec4
0000031|D|lwesp   |[LWESP MEM] Callocation OK: 16 bytes, addr: 0x2002aeac
0000031|D|lwesp   |[LWESP CONN] Poll event: 0x2002ced4
0000031|D|lwesp   |[LWESP MEM] Free size: 16, address: 0x2002b73c
0000031|D|lwesp   |[LWESP MEM] Callocation OK: 16 bytes, addr: 0x2002aec4
0000031|D|lwesp   |[LWESP CONN] Poll event: 0x2002ced4
0000031|D|lwesp   |[LWESP MEM] Free size: 16, address: 0x2002aeac
0000032|D|lwesp   |[LWESP MEM] Callocation OK: 16 bytes, addr: 0x2002aeac
0000032|D|lwesp   |[LWESP CONN] Poll event: 0x2002ced4
0000032|D|lwesp   |[LWESP MEM] Free size: 16, address: 0x2002aec4
0000032|D|lwesp   |[LWESP MEM] Callocation OK: 16 bytes, addr: 0x2002aec4
0000032|D|lwesp   |[LWESP CONN] Poll event: 0x2002ced4
0000032|D|lwesp   |[LWESP MEM] Free size: 16, address: 0x2002aeac
0000033|D|lwesp   |[LWESP MEM] Callocation OK: 16 bytes, addr: 0x2002aeac
0000033|D|lwesp   |[LWESP CONN] Poll event: 0x2002ced4
0000033|D|lwesp   |[LWESP MEM] Free size: 16, address: 0x2002aec4
0000033|D|lwesp   |[LWESP MEM] Callocation OK: 16 bytes, addr: 0x2002aec4
0000033|D|lwesp   |[LWESP CONN] Poll event: 0x2002ced4
0000033|D|lwesp   |[LWESP MEM] Free size: 16, address: 0x2002aeac

Let me know if this log configuration is adequate:

#define LWESP_CFG_DBG LWESP_DBG_ON
#define LWESP_CFG_DBG_TYPES_ON LWESP_DBG_TYPE_ALL

#define LWESP_CFG_DBG_NETCONN LWESP_DBG_ON
#define LWESP_CFG_DBG_MEM LWESP_DBG_ON
#define LWESP_CFG_DBG_INPUT LWESP_DBG_ON
#define LWESP_CFG_DBG_THREAD LWESP_DBG_ON
#define LWESP_CFG_DBG_CONN LWESP_DBG_ON
#define LWESP_CFG_DBG_IPD LWESP_DBG_ON
#define LWESP_CFG_DBG_PBUF LWESP_DBG_ON
#define LWESP_CFG_DBG_ASSERT LWESP_DBG_ON
#define LWESP_CFG_DBG_VAR LWESP_DBG_ON
MaJerle commented 2 years ago

Id like to see full AT commands TX/RX log from the point of start of connection, down to problem you see.

You can remove TX/RX actual data (connection TCP data), if there is a confidentiality problem.

vitorb-rzero commented 2 years ago

Alright, here are the logs:

0000026|D|lwesp   |[LWESP MEM] Allocation OK: 76 bytes, addr: 0x2002aec4

0000026|D|lwesp   |[MSG VAR] Allocated 76 bytes at 0x2002aec4

TX: AT+CIPSTATUS
RX: STATUS:4
RX: 
RX: OK
TX: AT+CIPSTARTEX="SSL","xxx.yyyy.amazonaws.com",8443,0
0000026|D|lwesp   |[LWESP MEM] Free size: 16, address: 0x2002aeac

RX: +LINK_CONN:0,0,"SSL",0,"x.x.x.x",8443,58010
RX: 
RX: OK
0000030|D|lwesp   |[LWESP MEM] Callocation OK: 16 bytes, addr: 0x2002aeac

0000030|D|lwesp   |[LWESP MEM] Allocation OK: 76 bytes, addr: 0x2002af18

0000030|D|lwesp   |[MSG VAR] Allocated 76 bytes at 0x2002af18

TX: AT+CIPSTATUS
RX: STATUS:3
RX: +CIPSTATUS:0,"SSL","x.x.x.x",8443,58010,0
RX: 
RX: OK
TX: AT+CIPRECVLEN?
0000030|D|lwesp   |[MSG VAR] Free memory: 0x2002aec4

0000030|D|lwesp   |[LWESP MEM] Free size: 76, address: 0x2002aec4

0000030|D|lwesp   |[LWESP MEM] Allocation OK: 2048 bytes, addr: 0x2002af6c

0000030|D|lwesp   |[LWESP MEM] Allocation OK: 76 bytes, addr: 0x2002aec4

0000030|D|lwesp   |[MSG VAR] Allocated 76 bytes at 0x2002aec4

RX: +CIPRECVLEN:0,-1,-1,-1,-1
RX: 
RX: OK
0000030|D|lwesp   |[MSG VAR] Free memory: 0x2002af18

0000030|D|lwesp   |[LWESP MEM] Free size: 76, address: 0x2002af18

TX: AT+CIPSEND=0,1214
RX: 
RX: OK
RX: 
TX: <HTTP POST REQUEST>
TX: AT+CIPRECVLEN?
TX: 
RX: >
RX: Recv 1214 bytes
RX: 
RX: SEND OK
0000030|D|lwesp   |[MSG VAR] Free memory: 0x2002aec4

0000030|D|lwesp   |[LWESP MEM] Free size: 76, address: 0x2002aec4

0000030|D|lwesp   |[LWESP MEM] Free size: 2048, address: 0x2002af6c

RX: 
RX: +IPD,0,2710
0000030|D|lwesp   |[LWESP MEM] Allocation OK: 76 bytes, addr: 0x2002aec4

0000030|D|lwesp   |[MSG VAR] Allocated 76 bytes at 0x2002aec4

RX: +CIPRECVLEN:2710,-1,-1,-1,-1
RX: 
RX: OK
0000030|D|lwesp   |[LWESP MEM] Allocation OK: 2076 bytes, addr: 0x2002af18

0000030|D|lwesp   |[LWESP PBUF] Allocated 2048 bytes on 0x2002af18

TX: AT+CIPRECVDATA=0,2048
0000030|D|lwesp   |[LWESP MEM] Callocation OK: 16 bytes, addr: 0x2002b73c

0000030|D|lwesp   |[LWESP CONN] Poll event: 0x2002ced4

0000030|D|lwesp   |[LWESP MEM] Free size: 16, address: 0x2002aeac

RX: +CIPRECVDATA:2048,"x.x.x.x",8443,HTTP/1.1 200 OK
RX: <REMAINING DATA>
RX: OK
0000030|D|lwesp   |[LWESP IPD] New length to read: 2047 bytes

0000030|D|lwesp   |[LWESP IPD] Bytes read: 2047

0000030|D|lwesp   |[LWESP NETCONN] Received pbuf contains 2048 bytes. Handle written to receive mbox

TX: AT+CIPRECVLEN?
0000030|W|http    |Got buffer
0000030|W|http    |Buffer size: 2048, 2048
0000030|D|lwesp   |[LWESP PBUF] Deallocating 0x2002af18 with len/tot_len: 2048/2048

0000030|D|lwesp   |[LWESP MEM] Free size: 2076, address: 0x2002af18

RX: +CIPRECVLEN:0,-1,-1,-1,-1
RX: 
RX: OK
0000030|D|lwesp   |[MSG VAR] Free memory: 0x2002aec4

0000030|D|lwesp   |[LWESP MEM] Free size: 76, address: 0x2002aec4

0000031|D|lwesp   |[LWESP MEM] Callocation OK: 16 bytes, addr: 0x2002aeac

0000031|D|lwesp   |[LWESP CONN] Poll event: 0x2002ced4

0000031|D|lwesp   |[LWESP MEM] Free size: 16, address: 0x2002b73c

0000031|D|lwesp   |[LWESP MEM] Callocation OK: 16 bytes, addr: 0x2002aec4

Note that due to threading, the sequencing between the 3 channels (RX, TX, and unprefixed one) might be a bit off

MaJerle commented 2 years ago

I see a problem because esp replies no data on second cipdatalen call, hence no read and netconn receive stall.

vitorb-rzero commented 2 years ago

Yeah, that's the point I'm unsure about the problem being ESP AT, or the LWESP library. So in your opinion this is a ESP AT bug?

As a workaround, is there a way to use the passive communication mode in lwesp, so that the ESP sends data as they come, instead of the host needing to ask for it?

MaJerle commented 2 years ago

I removed option to get data directly. It was available in the past.

If rx data is all you have from device, then the issue is espreeif device, not my lib.

vitorb-rzero commented 2 years ago

I've been running more tests and diving deeper into this issue, and it seems possible that I'm hitting a problem related to https://github.com/espressif/esp-at/issues/280, so due to concurrency, AT+CIPRECVLEN return 0 because data is still being processed by the ESP stack somehow. In that case, polling AT+CIPRECVLEN should solve the problem.

Do you have a sense of a good place to hack this into lwesp for testing?

MaJerle commented 2 years ago

Try to remove this if statement for tcp_available_length and only keep check for being active.

https://github.com/MaJerle/lwesp/blob/d27d9c0cb377a5dc092e1ee79bc9f73179585f9a/lwesp/src/lwesp/lwesp_conn.c#L118

But then it will periodically check every (I think 1 seconds) polling mode. Edit: Period will be CONN_POLL timing, as set in configuration

vitorb-rzero commented 2 years ago

That made the device poll for data, but unfortunately ESP still reports no data in that case. I'm digging deeper

MaJerle commented 2 years ago

So it is not lib issue - I will close the ticket.