New Framing Should use a length based approach

LearnBoost / socket.io-spec

Specification for the Socket.IO Protocol (0.9)

http://socket.io

225 stars 22 forks source link

New Framing Should use a length based approach #1

Open aikar opened 13 years ago

aikar commented 13 years ago

Currently Socket.IO spec has listed as FRAMEMARKER,DATA,FRAMEMARKER

Ideally, a message frame should include the length of the data payload so that the parser will know exactly how long a message is and does not have to 'scan' for the ending marker.

so: MARKER,LENGTHOFDATA,DATA,MARKER

so: 0xFF0xFD3,foo0xFF0xFD

Can identify that fhe data set will be a length of 3 before the end marker.

So if you receive 0xFF0xFD3,f as a buffer you know you need 2 more characters before you should even expect to see the end marker.

This lets the parser simply check the length and not look for the end marker until it knows it has all of the data and then it knows exactly where the endmarker is.

This is of course better for performance, and very trivial to implement.

tj commented 13 years ago

rauchg commented 13 years ago

This is how we are right now, I just wanted to keep it more consistent with the websocket protocol.

aikar commented 13 years ago

well I meant anywhere your doing custom framing (like HTTP connections instead of WebSocket) implemented.

On the websocket side, its of course forced and yeah we dont need double framing (someone should recommend updating the spec to be length based since weve already broke capabilities to do raw connections anyways :()

but as I read the frame identifiers in the spec, anywhere this framing is implemented outside of websocket should use length.

madari commented 13 years ago

@Aikar: I didn't quite understand what you meant by your last comment. Just out of curiosity, you suggested MARKER LENGTH DATA MARKER. Why would you want to keep the MARKERs?

aikar commented 13 years ago

markers still help provide error testing and recovery.

Without them, you have no way to detect if the frame matches up,

With them, your not 'scanning' for the end marker but you know exactly where it SHOULD be, and doing a simple buffer[index] check to ensure it matches the marker symbol ensures the data is valid.

If you check for where a marker should be, and find it isnt, you can add error recovery to then fallback to a scan of the next start marker and continue the parse, and throw a warning/alert that data was malformed, and continue its business.

Where as w/o them, theres no way to detect a corrupted/malformed message and itll destroy the entire parser.

madari commented 13 years ago

@Aikar: That's a good point from the streams point of view. Although it doesn't guarantee that the message isn't malformed/corrupted (that's what TCP does) but it does guarantee that the stream of messages is still flowing as it should be. And the overhead is negligible too. So I agree with your original suggestions: FRAMEMARKER LENGTH DATA FRAMEMARKER.

madari commented 13 years ago

Also the LENGTH should be in bytes not in utf-8 codepoints (as it has been previously). If you decide not to add LENGTH to the messages then the spec should state how to escape FRAMEMARKERs.

madari commented 13 years ago

Another variation would be FRAMEMARKER LENGTH DATA where the FRAMEMARKER would be a protocol identifier (e.g. 0xFFFD as proposed) and explictly indicate the beginning of a frame. That would enable recovery (by always looking for the next FRAMEMARKER) and the trailing FRAMEMARKER could be dropped.

tj commented 13 years ago

@madari +1 to that suggestion, makes sense, no reason you could not recover on the frame instead of a trailing marker, probably even easier to implement this way actually

3rd-Eden commented 13 years ago

For the sake of simplified parsing across different platforms, +1 for @madari 's suggestion.

highlight @guille