cgiWebSocketRecv calls recvCb with frame fragments when frames span multiple TCP segment

n1ywb commented 5 years ago

When it receives a websockets frame that spans multiple TCP segments, the receive callback gets called once with each segment, with no obvious way to reassemble them.

https://github.com/chmorgan/libesphttpd/blob/master/util/cgiwebsocket.c#L174

https://github.com/chmorgan/libesphttpd/blob/master/util/cgiwebsocket.c#L254 ^ this should get called once, with the whole frame

console log

I (1420127) user_main: UartWs: connect
D (1420147) cgiwebsocket: Upgrade: websocket
I (1420147) user_main: McuWs: connect
D (1420167) cgiwebsocket: Upgrade: websocket
I (1420167) user_main: EchoWs: connect
D (1420187) cgiwebsocket: Frame payload. wasHeaderByte 1 fr.len 9 sl 9 cmd 0x82
I (1420187) user_main: McuWs: len=9, more=0, bin=2
I (1420187) user_main: set time: 1543542461
I (1420197) RTC: >> writeValue: 1543542461
I (1420197) RTC:  - Fri Nov 30 01:47:41 2018

D (1437817) cgiwebsocket: Frame payload. wasHeaderByte 1 fr.len 1538 sl 1428 cmd 0x82
I (1437817) user_main: McuWs: len=1428, more=0, bin=2
I (1437817) user_main: invalid schedule len: 1428; expected 1538
D (1437827) cgiwebsocket: Frame payload. wasHeaderByte 0 fr.len 110 sl 110 cmd 0x82
I (1437827) user_main: McuWs: len=110, more=0, bin=2
I (1437837) user_main: invalid set time len: 110

browser log

wireshark decode

tidklaas commented 5 years ago

Isn't that what the userData element in the struct Websock is for?

dmcnaugh commented 5 years ago

Isn't that what the userData element in the struct Websock is for?

No, for a range of reasons:

userData is provided but not used by ay of the libesphttpd code, its for the user to attach web socket specific data to
lwip (like other ip stacks) does not perform TCP segment reassembly
this is not websocket fragmentation using the FIN flag, this is caused by TCP segmentation and only the next layer in the stack i.e. the websocket protocol layer has visibility of all the information that allows the TCP segments to be reassembled
the user implementation of a ws->recvCb is an application level function that does not need to know anything about the behaviour of the TCP stack or the websocket protocol layer and how the incoming data is segmented between packets or how it should be reassembled

The current implementation of cgiWebSocketRecv identifies that the websocket payload has been segmented and flags it in the ws->priv struct https://github.com/chmorgan/libesphttpd/blob/master/util/cgiwebsocket.c#L272 but makes no attempt to reassemble the websocket payload when it receives the next part of the payload in the next TCP segment.

Reassembly can be non-trivial as a total websocket payload can be 2^63 bytes long, but this problem is not only exposed by big websocket payloads. It also manifests with small payloads when the client decides to transmit multiple WebSocket.send's in a single TCP frame (which cannot by controlled by the frontend developer in modern browsers) and the combined size exceeds a single TCP frame (typically <1500 bytes). In this case reassembly should be simpler but still should be implemented in the websocket protocol layer.

The first case of large websocket payloads spanning many TCP frames is made complex because of memory constraints with a micro-controller, and a limit could be set using a configuration parameter for libesphttp to make it manageable. The second case of a websocket payload that has be segmented across two (2) TCP frames simply because of aggregation in TCP frames for transmission needs to be solved for even small payloads to work reliably in the libesphttpd websockets implementation.

I am going to try and fix the second case in the sort term while ignoring the first case as that is not currently the problem I am having. If I come up with something useful I will setup a PR with my working code.

tidklaas commented 5 years ago

Disclaimer: I have not used Websockets for anything serious, just played around around a bit with some examples.

The way I understand the mechanisms here, the websocket will provide a reliable, ordered stream of application level data. Like with Berkeley sockets, there is no guarantee that blobs of data will be received in a single operation, so any application has to be able to either process incoming fragmented data on the fly, or buffer the data across multiple reads until a complete payload unit has been received. Such a buffering mechanism can easily be implemented using the private userData pointer.

Assembly by the libesphttpd would be a very bad idea, since, as you pointed out, payloads can get quite big and the lib has absolutely no idea of what the contents should look like, so it would have to buffer all of it. The application, on the other hand, knows exactly how to handle the data, which parts can be processed on the fly or simply be discarded, and, crucially, detect corrupted or malicious payloads and then act immediately.

I do see a minor problem in the way the MORE flag is handled. If a final frame gets fragmented, the MORE flag should be kept set until the last fragment's data gets handed to the callback function. Line 253 should be something like this

if ((ws->priv->fr.flags & FLAG_FIN) == 0 || (ws->priv->fr.len > sl)){
    flags |= WEBSOCK_FLAG_MORE;
}

instead of this if ((ws->priv->fr.flags&FLAG_FIN)==0) flags|=WEBSOCK_FLAG_MORE;

Again, technically this is not strictly necessary, as the application should (must!) verify the received data and can therefore deduce that fragmentation has happened.

Tido

chmorgan / libesphttpd

cgiWebSocketRecv calls recvCb with frame fragments when frames span multiple TCP segment #51