ARMmbed / sockets

mbed sockets library abstraction layer
Other
6 stars 18 forks source link

OnSent dependency on lwip timer? #38

Closed lws-team closed 8 years ago

lws-team commented 8 years ago

On an otherwise idle system, OnSent() does not seem to appear at a reasonable rate, it seem to come at ~250ms intervals.

If the system is doing something else, like receiving, the OnSent()s seem to appear at a rate related to how quickly the packets were actually sent, ie, much faster.

Is OnSent() somehow processed by the tcp timer or some other timer? Minar gets sleepy if nothing happening? It's a known issue or you need some way to reproduce it?`

bremoran commented 8 years ago

I'll need to look into the LwIP stack some more to determine the cause of this. 250ms is the rate that LwIP recommends for the periodic timer.

OnSent is only triggered when an acknowledge is received, so it should be triggered when the ACK packet comes in.

ciarmcom commented 8 years ago

ARM Internal Ref: IOTSFW-1303

lws-team commented 8 years ago

Here's a tcpdump of the k64f in a loop sending a websocket packet containing an incrementing number every time it gets an onSent(). So it should spam these packets at a good rate regulated by the ack, you can see the peer acks it one time and within 30us of receiving it.

Basically it seems to stall for almost exactly 500ms (not my guesstimate of 250ms, sorry), when the ack receipt should have triggered the next send as soon as the ack was seen.

09:21:44.184421 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 9368:9373, ack 3438286858, win 2385, length 5: HTTP
09:21:44.184436 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 5, win 30016, length 0
09:21:44.684412 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 5:10, ack 1, win 2385, length 5: HTTP
09:21:44.684435 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 10, win 30016, length 0
09:21:45.184413 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 10:15, ack 1, win 2385, length 5: HTTP
09:21:45.184442 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 15, win 30016, length 0
09:21:45.684411 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 15:20, ack 1, win 2385, length 5: HTTP
09:21:45.684448 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 20, win 30016, length 0
09:21:46.184376 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 20:25, ack 1, win 2385, length 5: HTTP
09:21:46.184412 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 25, win 30016, length 0
09:21:46.684382 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 25:30, ack 1, win 2385, length 5: HTTP
09:21:46.684403 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 30, win 30016, length 0
09:21:47.184356 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 30:35, ack 1, win 2385, length 5: HTTP
09:21:47.184382 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 35, win 30016, length 0
09:21:47.684326 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 35:40, ack 1, win 2385, length 5: HTTP
09:21:47.684366 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 40, win 30016, length 0
09:21:48.184358 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 40:45, ack 1, win 2385, length 5: HTTP
09:21:48.184395 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 45, win 30016, length 0
09:21:48.684320 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 45:50, ack 1, win 2385, length 5: HTTP
09:21:48.684347 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 50, win 30016, length 0
09:21:49.184308 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 50:55, ack 1, win 2385, length 5: HTTP
09:21:49.184329 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 55, win 30016, length 0
09:21:49.684115 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 55:60, ack 1, win 2385, length 5: HTTP
09:21:49.684140 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 60, win 30016, length 0
09:21:50.184305 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 60:65, ack 1, win 2385, length 5: HTTP
09:21:50.184338 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 65, win 30016, length 0
09:21:50.684267 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 65:70, ack 1, win 2385, length 5: HTTP
09:21:50.684306 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 70, win 30016, length 0
09:21:51.184284 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 70:75, ack 1, win 2385, length 5: HTTP
09:21:51.184324 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 75, win 30016, length 0
09:21:51.684248 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 75:80, ack 1, win 2385, length 5: HTTP
09:21:51.684282 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 80, win 30016, length 0
09:21:52.184258 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 80:85, ack 1, win 2385, length 5: HTTP
09:21:52.184301 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 85, win 30016, length 0
09:21:52.684248 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 85:90, ack 1, win 2385, length 5: HTTP
09:21:52.684284 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 90, win 30016, length 0
09:21:53.184227 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 90:95, ack 1, win 2385, length 5: HTTP
09:21:53.184254 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 95, win 30016, length 0
09:21:53.684197 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 95:100, ack 1, win 2385, length 5: HTTP
09:21:53.684232 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 100, win 30016, length 0
09:21:54.184225 IP lwipk64f.warmcat.com.http > build.warmcat.com.39731: Flags [P.], seq 100:105, ack 1, win 2385, length 5: HTTP
09:21:54.184247 IP build.warmcat.com.39731 > lwipk64f.warmcat.com.http: Flags [.], ack 105, win 30016, length 0

The error is not related to ws because serving a PNG (using the same "send more when onSent() came" scheme) he also doles out the PNG to the browser visibly in 500ms chunks.

However as I say if he is receiving on that socket at that time, he sends at the fast rate.

Later I'll modify the simple sample app from before and confirm it can be reproduced there, in case it's something specific to libwebsockets.

lws-team commented 8 years ago

Yes if you fetch HEAD of this, you should be able to easily reproduce it

https://github.com/lws-team/mbed3-dumb-http-test

This is still very simple code, now it serves a jpg embedded in the app if you GET /

However it does so one packet every 500ms...

bremoran commented 8 years ago

This is still certainly not correct behaviour. I will look into it and update this ticket when I have more information.

lws-team commented 8 years ago

Just a little clue... if I modify the test code to have two or three packets "in flight" - still only firing one new packet per onSent() - the transfer finishes 2 or 3 x faster.

So it's like there's always one event left in the queue that gets missed until it's caught by some 500ms polling action in the stack.

bremoran commented 8 years ago

The root cause of this issue is LwIP running the Nagle algorithm on TCP connections by default. Sending larger frames mitigates this issue.

I'm in the process of adding an API to the TCP Stream that allows disabling of Nagle.

bremoran commented 8 years ago

An additional problem that caused this issue is LwIP's handling of tcp_write(). By default, LwIP doesn't send data immediately. It waits until the next timer event. When Nagle's algorithm is disabled, this is probably the wrong behaviour.

lws-team commented 8 years ago

Whoa.... yes those two patches-on-top, plus use the disable nagle API in the listen accept in my case, solves it. Throughput is revolutionized ^^

I dunno if nagle should be on by default though, if that's how it acts.

Thanks a lot for the hard work debugging it.