d-a-v / esp82xx-nonos-linklayer

esp8266 link layer API for new ip stack - applied to lwIP-v2

data packets getting systematically lost #20

Open · tve opened this issue 5 years ago

tve commented 5 years ago

I've been chasing an issue where data packets sent by an esp8266 get systematically lost. My Linux system streams data to the esp, which runs Arduino core 2.4.2 with lwIP2. The sketch on the esp sends application-level ACK messages back, and those are the packets that get lost.

What I'm observing with tcpdump is that the TCP segments carrying data sent by lwIP get lost, presumably due to WiFi collisions or something similar. The pattern is: Linux sends two packets (4xMSS) to the esp, the esp responds about 200 ms later with two TCP ACKs carrying no data, and Linux immediately sends the next two packets, and so on. Eventually the Linux side stalls for lack of application-level ACKs. At that point the esp retransmits the lost data packets, and the whole scenario repeats for the next batch of packets.
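To put the pattern in numbers: if each round delivers 4 × MSS to the esp and the next round only starts after the ~200 ms gap, the stream is capped at roughly 4 × MSS per 200 ms no matter how fast the link is. A minimal Python sketch of that arithmetic, assuming the round time is dominated by the 200 ms gap and an MSS of either 1460 or 536 bytes (the two values offered by the Arduino lwIP2 build variants; an assumption, not something stated in this report). `throughput_bound` is a hypothetical helper.

```python
def throughput_bound(mss: int, segments_per_round: int = 4, round_ms: int = 200) -> int:
    """Upper bound on goodput in bytes/s when each round of
    `segments_per_round` full segments is followed by a `round_ms` stall.

    Assumes the stall dominates the round time (transmission time ignored).
    Integer arithmetic keeps the result exact.
    """
    return mss * segments_per_round * 1000 // round_ms


for mss in (1460, 536):
    print(f"MSS {mss}: at most {throughput_bound(mss)} bytes/s")
# MSS 1460: at most 29200 bytes/s
# MSS 536: at most 10720 bytes/s
```

Under these assumptions the stream tops out at a few tens of KB/s even before the application-level stall kicks in.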

My conjecture is that lwIP first sends TCP ACKs in response to the incoming stream of packets, and only then tries to send the data it has queued up (it's all on the same TCP connection). But by that time, the remote Linux box is already sending another 4xMSS of data, and the data packets from the esp get dropped somewhere.

Questions:

d-a-v commented 5 years ago

It sounds familiar. Can you try the latest git version of the Arduino core? It uses the lwIP-2.1.0 branch of this repository, which now has the SACK (out) algorithm enabled. Your issue looks like TCP Reno (esp with lwIP-2.0) not playing well with TCP SACK (Linux). A number of TCP transmission issues (with large or high-latency streams) have been solved thanks to this SACK addition in lwIP-2.1.0. I don't think the issue is related to 802.11 or the low-level WiFi blobs. You can also get a tcpdump-style capture from inside the esp using netdump.
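For context on the SACK remark: with SACK enabled on the receiving side, each ACK can carry, besides the cumulative acknowledgment, a few blocks describing out-of-order data the receiver already holds, so the sender can retransmit only the holes instead of inferring them one at a time from duplicate ACKs and timeouts as Reno does. The sketch below computes those values from a set of received byte ranges. It is a simplified Python illustration, not lwIP's implementation: `ack_and_sack` is a hypothetical helper, sequence-number wraparound is ignored, blocks are listed in sequence order (RFC 2018 asks the receiver to report the most recently received block first), and the limit of three blocks is an assumption standing in for the TCP option-space limit.

```python
from typing import List, Tuple

# Half-open byte range [start, end) in sequence-number space.
Range = Tuple[int, int]


def ack_and_sack(rcv_nxt: int, received: List[Range], max_blocks: int = 3) -> Tuple[int, List[Range]]:
    """Return (cumulative ACK, SACK blocks) for the data held by a receiver.

    rcv_nxt  -- next byte expected before looking at `received`
    received -- byte ranges that have arrived (any order, may overlap)
    """
    # Merge overlapping or adjacent ranges into disjoint, sorted blocks.
    merged: List[Range] = []
    for start, end in sorted(received):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    ack = rcv_nxt
    blocks: List[Range] = []
    for start, end in merged:
        if end <= ack:
            continue  # already covered by the cumulative ACK
        if start <= ack:
            ack = end  # contiguous with expected data: advance the ACK
        else:
            blocks.append((start, end))  # out-of-order data above a hole
    # Simplification: keep the lowest blocks, not the most recent ones.
    return ack, blocks[:max_blocks]


# Segment 1000-2000 was lost; later segments arrived anyway.
print(ack_and_sack(1000, [(2000, 3000), (3000, 4000), (5000, 6000)]))
# (1000, [(2000, 4000), (5000, 6000)])
```

In the example, the cumulative ACK alone only says that data from byte 1000 onward is missing; the SACK blocks additionally tell the sender that 2000-4000 and 5000-6000 are already there, so only 1000-2000 and 4000-5000 need resending.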

tve commented 5 years ago

Thanks for the reply. I will try the latest Arduino core, though it will take a few days (I'm battling a different problem right now).