crossbario / autobahn-testsuite

Autobahn WebSocket protocol testsuite
https://crossbar.io/autobahn/
Apache License 2.0

UTF-8 Fail Fast #1

Closed: oberstet closed this issue 7 years ago

oberstet commented 12 years ago

Actually, we should distinguish the following behaviors:

  1. fail-per-message
  2. fail-per-frame
  3. fail-per-buffered
  4. fail-per-first-octet (aka fail-fast)

""" probably this is something which would be good reason to make that case section "informational" .. because the probably most sensible beh. (fail-per-buffered) is not canonical/well-defined anyway. mmh

it could be tested: start with N*valid char + invalid .. gradually increase N from 0 to 128k .. that should cover up most fail-per-buffered

actually: 6.4.3 should be extended so that it does that increasing N test. it would then report N*, the first N at which the peer detected invalid utf8.

for fail-fast: N* would be 0 for fail-per-buffer: N* would be the impl. buffer size

and it would fail if N* > N .. that is the impl. only does per-frame/per-msg val. """
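
A rough sketch of the probing payload described above (this is not the testsuite's actual code; make_probe_payload is a hypothetical helper, and 0xFF is just one choice of byte that can never occur in valid UTF-8):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the increasing-N probe: N valid single-byte UTF-8
// characters followed by one byte (0xFF) that is invalid in any UTF-8 stream.
// A testsuite would send this payload for growing N and record N*, the first
// N at which the peer fails the connection for invalid UTF-8.
std::vector<uint8_t> make_probe_payload(std::size_t n) {
    std::vector<uint8_t> payload(n, 'a'); // n valid ASCII characters
    payload.push_back(0xFF);              // never valid in UTF-8
    return payload;
}
```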

joakime commented 12 years ago

Note that Cases 6.4.3 and 6.4.2 (from the Autobahn WebSockets 0.5.1/0.5.5 Fuzzing Client) are based on chop sizes within a single frame/fragment. Due to the various network buffering in place between the fuzzing client and the server, these chops might NOT arrive at the destination server in the particular way you intend. In fact, for Java, those chops appear as a single read buffer.

I would hazard a guess that it is impossible for Java implementations to pass those tests.

oberstet commented 12 years ago

There is a 1s delay between the chops while sending. It is very unlikely that the OS TCP/IP stack on the receiving side buffers data for longer than 1s before forwarding it to the application. Switch off Nagle. I would take up the challenge of writing a Java program that is able to pass the above tests; actually, I'll probably do that for AutobahnAndroid. It's low priority for us, however, and currently I am totally overloaded.
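
For reference, switching off Nagle on a connected socket looks roughly like this with the plain BSD socket API (a sketch, error handling omitted; in Java the equivalent is Socket.setTcpNoDelay(true)):

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Sketch: disable Nagle's algorithm so small writes (the 1-byte chops) are
// pushed onto the wire immediately instead of being coalesced by the sender.
void disable_nagle(int fd) {
    int one = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}
```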

joakime commented 12 years ago

Verified with Wireshark that the behavior on "the wire" is as documented by the test case. While a standard blocking Socket implementation can see this nuance, a non-blocking implementation of the same server (NIO, NIO.2, AIO, or even with SSL in the mix) cannot see this behavior. Disabling Nagle has had no effect. Bumping the delay up to 10s does help, but not in 100% of test runs.

oberstet commented 12 years ago

To be honest, I don't believe that using NIO is the origin of the behavior you see from your code. That would mean NIO buffers up small amounts of received data, potentially for a long time. Think about it: the ping/pong tests, I assume, work fine, and those also send small amounts. Where is the difference? It seems very unlikely that NIO makes it impossible to implement, e.g., a protocol with single-byte messages. Anyway: AutobahnAndroid also uses NIO, but it does not currently implement streaming processing. As soon as we implement that there, I'll be able to check myself.

joakime commented 12 years ago

It's been a curious issue indeed. We've improved our implementation on the Jetty side; 9.0.0 (in active early development) is now far more compliant. Results -> http://joakim.erdfelt.com/jetty/autobahn/server-9.0.0-SNAPSHOT/ We are now in line spec-wise with the WebSocket++ implementation (it also does not pass those 2 specific test cases).

It is curious that the Autobahn server is the only server that can pass those 2 test cases. No other server implementation has yet passed those 2 tests.

We are investigating what the behavior could possibly be, and also expanding our tests of those 2 scenarios along the following lines:

  1. Different OSs? (been predominantly testing on Linux; will try OSX and Windows too)
  2. Different JVMs? (been predominantly testing on JDK 7u5 / 64-bit; will try 32-bit as well)
  3. Different network stacks? (been testing on Ubuntu/Linux with both IPv4 + IPv6 enabled; will try pure IPv4 and pure IPv6 as well)
  4. Different payload sizes? (we will modify the test case locally to have a larger initial payload, and still chop at the GOOD/BAD/GOOD boundaries for UTF-8 testing)

Note that the most basic level test, GOOD/BAD/GOOD UTF-8 message splits with early invalid-UTF-8 detection, does work, and we have a specific test case for that scenario, so we are rather confident that this is purely a network read issue at this point.

oberstet commented 12 years ago

That's not correct: WebSocket++ passes the complete suite fully green (with appropriate configuration of the server), including the chopped-up UTF-8 ones. I have contacted Peter; he will comment soon. I am not cheating! ;)

oberstet commented 12 years ago

Looked at your test reports: great! I'd be interested in comparing that latest Jetty against WebMQ (http://www.tavendo.de/webmq/getstarted/setup/virtualbox). Do you have a virtual appliance with Jetty in it where you have done all the OS tuning/optimization to support high performance and massive connection counts? Our WebMQ appliance is highly tuned, and for a fair comparison I think the vendor should do that tuning. We could then run Jetty and WebMQ in VMs with identical hardware (cores/RAM) and compare.

zaphoyd commented 12 years ago

Just to confirm: WebSocket++ 0.2.0dev can pass 100% of the Autobahn suite in strict mode, including the chopped UTF-8 ones. There is a setting in common.hpp (there will be an API for this at some point) that determines the processing buffer size. Streaming UTF-8 is validated every time this buffer fills. Setting that buffer to 1 will catch the very first invalid byte immediately and thus pass the chopped fail-fast AB tests. In real-world scenarios I would probably set that buffer somewhere in the ~512 byte range for best performance. I think catching UTF-8 errors to the nearest 512 bytes or so should be sufficient to stop large invalid messages from chewing up resources, without the overhead of spinning up the unmasker/UTF-8 validator for every byte.
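
As a rough illustration of that streaming approach (not WebSocket++'s actual code), a validator can carry its state across buffer fills, so an invalid byte is reported as soon as the chunk containing it is checked. The sketch below only verifies lead/continuation byte structure; a production validator would also reject overlong encodings and surrogate code points:

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of streaming UTF-8 validation: the validator remembers how many
// continuation bytes are still expected, so it can be fed one processing
// buffer at a time and reject bad input as soon as the offending chunk is
// checked. NOTE: structure-only check; overlongs/surrogates not rejected.
class StreamingUtf8Validator {
public:
    // Feed the next chunk; returns false as soon as an invalid byte is seen.
    bool feed(const uint8_t* data, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i) {
            const uint8_t b = data[i];
            if (remaining_ == 0) {
                if (b < 0x80) continue;                               // ASCII
                if ((b & 0xE0) == 0xC0) { remaining_ = 1; continue; } // 2-byte lead
                if ((b & 0xF0) == 0xE0) { remaining_ = 2; continue; } // 3-byte lead
                if ((b & 0xF8) == 0xF0) { remaining_ = 3; continue; } // 4-byte lead
                return false;                                         // invalid lead byte
            }
            if ((b & 0xC0) != 0x80) return false;                     // not a continuation byte
            --remaining_;
        }
        return true;  // valid so far (may end mid-sequence)
    }
    // True only if the stream ended on a character boundary.
    bool complete() const { return remaining_ == 0; }
private:
    int remaining_ = 0;
};
```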

I think a good informational test for AB would be a progressive series that probes for the threshold at which an implementation can fail-fast on UTF-8 validation, rather than a specific (especially single-byte) hard-coded threshold, unless there is a threshold that is obviously best and that all implementations should use.

joakime commented 12 years ago

Good to know it's a sane test. :-)

BTW, I was looking here: http://zaphoyd.com/websocketpp/autobahn-report/ <-- the WebSocket++/0.2.0 test report, linked from the WebSocket++ homepage, shows those 2 tests as not green (non-strict).

Tested with a ByteBuffer allocation/size of 1 for our internal NIO read buffers: it passes, but at a horrible cost in performance. That's not a great solution, though, and it just masks the problem with the test, namely that by the time NIO sees the channel has bytes available to read/fill, all 3 chops have been merged into 1 read operation for NIO.

Jetty 9.0.0 is not available in a VirtualBox VM yet (good idea tho)

zaphoyd commented 12 years ago

That report hasn't been updated in a while. I should fix that.

joakime commented 12 years ago

Added Issue #9 to request additional Informational content/flag on these 2 test cases. Thanks again for the sweet suite.

vinniefalco commented 8 years ago

I have an issue with this test as well. It depends on the test client sending a close message in response to the server, in the middle of the frame payload. I've posted on the Autobahn Google group; I will cross-post it here:


I'm working on a C++ implementation of websockets that is modeled after the boost::asio interfaces. I'm trying to get the Autobahn tests to pass (which is a wonderful software package by the way). I have concerns about test 6.4.3 "UTF8 Handling: Fail-fast on invalid UTF-8"

This is what the Autobahn fuzzingclient does:

  1. Send frame header, payload len = 21
  2. Send 11 bytes of payload
  3. Send 4 bytes of payload
  4. Receive close frame
  5. Send close frame
  6. Send remaining 6 bytes of payload

Other implementations pass this test, presumably by expecting the close from step 5 right away instead of eating up the 6 bytes remaining in the frame payload. For example, websocketpp passes the test. You can see the last 6 payload bytes getting sent after the close (Line 11 in the Wire Log): http://autobahn.websocketpp.org/servers/websocket___0_5_1_permessagedeflate_case_6_4_3.html

On my server I see these steps take place:

  1. Receive frame header, payload len = 21
  2. Receive 11 bytes of payload
  3. Receive 4 bytes of payload, utf8-check failure
  4. Send close frame
  5. Receive close frame

Here's where I think there's an issue. Passing the test means that the other end has to send a close message in the middle of a frame payload? And the server has to assume that the last 6 bytes of the expected frame payload are not going to be received? That can't be right. TCP/IP offers no guarantee about the grouping in which bytes arrive at their destination, only that they arrive in order. For all the server knows, in step 5 when it is expecting a close frame, those last 6 payload bytes could arrive late. And now the frame stream is desynced, with no way to do a clean shutdown.

It seems one of three statements must be true:

  1. The Autobahn test is wrong. It should never start a new frame while there are still payload bytes remaining in the current frame
  2. There's a flaw in the WebSocket specification where proper close handling could leave the frame stream desynced depending on network conditions.
  3. I simply don't understand how this thing works and there's something obvious I'm missing.

jchampio commented 8 years ago

You can see the last 6 payload bytes getting sent after the close (Line 11 in Wire Log)

I think Line 11 shows the octets of the transmitted Close frame, doesn't it?

Passing the test means that the other end has to send a close message in the middle of a frame payload? And the server has to assume that the last 6 bytes of the expected frame payload are not going to be received?

I don't think that's what has to happen to pass the test -- mod_websocket passes the test without making that assumption, so I think the only success criterion is that the server sends a fail-fast Close frame.

I do agree with you that the fuzzingclient seems to be destroying its own frame boundary, at least if I'm reading the log right. Whether or not that's incredibly important for the purposes of this automated test, I don't know, since part of the point of the fuzzingclient is to break the protocol and see what the server does. Is your implementation failing the test?

(I'm not affiliated with Autobahn, by the way -- just a fellow client.)

vinniefalco commented 8 years ago

@jchampio Thanks for responding. My original implementation, upon sending a close during fail-fast UTF-8 validation, was to read and discard the remaining bytes in the current frame payload, then go into a loop reading and discarding frames until getting a close frame. This implementation failed the Autobahn tests. On a lark I tried going straight to expecting the close frame, and it passed the test. Then I did a little digging and figured out that Autobahn is breaking a frame boundary.

I agree that the purpose of the test is to break the protocol, but I don't think it's correct behavior to say that when a client breaks the protocol, the server is expected to act a certain way. It should be possible for an implementation to assume that the remote will not start a new frame in the middle of the current frame's payload, without failing the Autobahn tests. It seems that passing the test relies on a certain behavior of TCP/IP which is not guaranteed.

In case anyone is wondering, this is what I did in my code: https://github.com/vinniefalco/rippled/blob/server/src/beast/beast/wsproto/impl/read_op.ipp#L206 I wanted to go into state "6", which finishes out the payload and discards frames, but that does not pass the tests. Going straight into "7", which expects a new frame, passes the test.

jchampio commented 8 years ago

I don't think it's correct behavior to say that when a client breaks the protocol, the server is expected to act a certain way.

I disagree; in most cases RFC 6455 lays out clear requirements for endpoint behavior when a peer violates the protocol. See below.

My original implementation, upon sending a close during fail-fast UTF-8 validation, was to read and discard the remaining bytes in the current frame payload, then go into a loop reading and discarding frames until getting a close frame. This implementation failed the Autobahn tests.

That's because this behavior violates the spec. From 8.1:

When an endpoint is to interpret a byte stream as UTF-8 but finds
that the byte stream is not, in fact, a valid UTF-8 stream, that
endpoint MUST _Fail the WebSocket Connection_.

And from the definition of Fail the WebSocket Connection in 7.1.7:

[...] An endpoint MUST NOT continue to attempt to
process data (including a responding Close frame) from the remote
endpoint after being instructed to _Fail the WebSocket Connection_.

So IMO it's fairly clear: a server should send an explanatory 1007 close code, then close the connection without continuing to read anything.
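
A sketch of what that fail-fast close could look like on the wire from the server side (server-to-client frames are unmasked; the frame layout follows RFC 6455 Section 5.2, and the helper name and reason text here are just illustrative):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Sketch: build an unmasked server-to-client Close frame carrying status code
// 1007 (invalid frame payload data), per RFC 6455 5.2 / 7.4.1. After writing
// this frame, the server stops reading from the peer and closes the TCP
// connection (7.1.7). The reason must keep the control-frame payload at or
// under 125 bytes and must itself be valid UTF-8.
std::vector<uint8_t> make_close_1007(const std::string& reason) {
    std::vector<uint8_t> frame;
    frame.push_back(0x88);                                    // FIN=1, opcode=0x8 (Close)
    frame.push_back(static_cast<uint8_t>(2 + reason.size())); // MASK=0, payload length
    frame.push_back(0x03);                                    // status 1007 = 0x03EF, high byte
    frame.push_back(0xEF);                                    // low byte
    frame.insert(frame.end(), reason.begin(), reason.end());  // optional UTF-8 reason
    return frame;
}
```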

vinniefalco commented 8 years ago

@jchampio That's exactly the clarity that I was looking for, thanks!

Update: 6.* all pass for me now that I'm following 7.1.7

oberstet commented 7 years ago

Regarding my original issue / the scope of this issue: won't fix. I don't think it's worth it, and moreover, so many implementations out there have been tested against and rely on the current behavior and test scope of this testsuite that I'd expect a lot of discussion. ;)