ericmoritz / wsdemo

A Cowboy Websocket demo

ws4py remarks #20

Open. Lawouach opened this issue 12 years ago

Lawouach commented 12 years ago

Hi there, ws4py's author here.

Thanks for the benchmarks. Though there can always be concerns over their design, environment, and execution, I find them useful and interesting nonetheless.

Just a couple of remarks for posterity:

Thanks,

jlouis commented 12 years ago

Do you have any idea why you are seeing all those connection timeouts? Perhaps it is the default TCP accept() backlog of 128 that is causing trouble here: if the server falls behind on the backlog even occasionally, connection timeouts can increase wildly.

Lawouach commented 12 years ago

I'll admit, I've never loaded ws4py that much, so these are only guesses, especially since I don't usually run the gevent implementation but rather the CherryPy/good ol' threads server. However, you are probably right: the backlog likely fills up quickly and I would definitely increase it. Skimming through gevent's code, the backlog seems to default to 50 on a stream server.

I would really need to profile ws4py to understand where it spends most of its time. I know that the (un)masking is actually heavy on the process all things considered, but here the data sent is so tiny it shouldn't hurt the results.

Looking at the reports I've linked above, I'd also be very interested to see the benchmark executed with PyPy. I have no doubt ws4py could do better if I could find the time to work on it more.

jlouis commented 12 years ago

For Python it can't be GC unless it is the cycle detector blocking the VM. So my guess is that the system can't keep up with the load, overflows the backlog queue and then stuff begins timing out. Increasing the queue will stop timeouts, but it will then also make latencies worse all over the place.

ericmoritz commented 12 years ago

I would really like to figure out why these servers drop connections the way they do, if for little else than to determine what the optimal configuration for each platform is.

Lawouach commented 12 years ago

You might want to start by increasing the socket backlog. In your ws4py runner, just add backlog=XYZ to the WebSocketServer(...) call:

server = WebSocketServer(('', 8000), backlog=128, websocket_class=EchoServer)
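
For reference, a fuller sketch of what that runner might look like. It assumes ws4py's gevent-backed WebSocketServer from ws4py.server.geventserver, which passes keyword arguments such as backlog through to gevent's StreamServer, and an EchoServer handler like the one in wsdemo; names and keyword support may differ between releases:

# Assumed API (ws4py ~0.2.x with gevent): a larger accept backlog lets the
# kernel queue bursts of incoming connections instead of dropping them.
# Note the effective value is still capped by net.core.somaxconn.
from ws4py.websocket import WebSocket
from ws4py.server.geventserver import WebSocketServer

class EchoServer(WebSocket):
    def received_message(self, message):
        # Echo the frame straight back to the client.
        self.send(message.data, message.is_binary)

if __name__ == '__main__':
    server = WebSocketServer(('', 8000), backlog=1024,
                             websocket_class=EchoServer)
    server.serve_forever()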

perone commented 12 years ago

I really think you should increase the net.core.somaxconn parameter of your setup; this could be the cause of the timeouts. It would also be nice to check your syslog to verify whether the kernel is sending TCP SYN cookies, since syncookies can cause timeouts and disconnections in benchmarks like this.
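
For reference, a small hypothetical check of the current values on a Linux host (the procfs paths are the standard locations; raising the settings requires sysctl and root privileges):

# Read the kernel settings mentioned above from Linux procfs.
def read_sysctl(path):
    with open(path) as f:
        return f.read().strip()

print("net.core.somaxconn      =", read_sysctl("/proc/sys/net/core/somaxconn"))
print("net.ipv4.tcp_syncookies =", read_sysctl("/proc/sys/net/ipv4/tcp_syncookies"))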

ericmoritz commented 12 years ago

@perone Excellent, I'll try that soon. It's not hard to get the timeouts on the other platforms; they occur very early in the test. I'll create a gist of the syslog for you to look at as well.

ericmoritz commented 12 years ago

I get this in the syslog on the server:

Jun 17 21:03:39 ip-10-36-118-97 kernel: [1125015.358550] TCP: Possible SYN flooding on port 8000. Sending cookies.  Check SNMP counters.
ericmoritz commented 12 years ago

@jlouis What puzzles me about the timeouts is that Erlang seems to be immune to them while the others are not. In fact, in the most recent benchmark, Go hit 10,000 clients as well. I haven't had a chance to summarize the event data yet, but the meminfo files have the connection counts for each server:

https://github.com/ericmoritz/wsdemo/tree/eleveldb-logging/results

Wouldn't an untuned TCP stack affect all the servers equally?

jlouis commented 12 years ago

You can set the backlog when you open the listen-socket, which is one thing to bear in mind. Another point is that if your Erlang code has plenty of available processes waiting in accept state, then there is no backlog introduced at all since there is an accepting process you can pair off the incoming connection with. I bet your code will spawn a new accepting process and that this process will call gen_tcp:accept(LSock) fairly quickly, thus establishing a 0 backlog scenario.

Say you start with 1000 of these processes. Then your backlog is, practically, 1000+D, where D is the default. If the Python system runs with a single accepting loop, say, then your backlog is at most 1+D. In effect, Erlang can tolerate a far higher number of quick connections since there is always a process ready to absorb one, whereas you will quickly see timeouts in Python because the only thing the kernel can do is drop connection attempts under the assumption that the Python process is under heavy pressure.

This also gives a plausible explanation as to why the behaviour is different. But you should really check my hypothesis by reading code :)
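
A hypothetical Python rendition of that hypothesis (not the actual Erlang code): with many threads parked in accept(), a burst of connects is absorbed immediately, whereas a single accept loop leaves everything beyond 1+D to the kernel queue.

# Illustration only: N blocked accept() calls stand in for Erlang's pool of
# waiting acceptor processes.
import socket
import threading

def acceptor(lsock):
    while True:
        conn, _addr = lsock.accept()  # immediately ready for the next connect
        conn.close()                  # real hand-off/echo logic would go here

lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
lsock.bind(('', 8000))
lsock.listen(128)                     # the kernel backlog, D above

for _ in range(1000):                 # the "1000 accepting processes" case
    threading.Thread(target=acceptor, args=(lsock,)).start()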

ericmoritz commented 12 years ago

Keep in mind that the kernel is not triggering this timeout; it is the TCP connect timeout in my Erlang client. I set the connection timeout to 2 seconds to determine whether the server was unavailable, since otherwise I had no way to tell whether a small number of successful clients was due to an error on my part or to the server becoming unavailable. There is no timeout once the TCP connection has been accepted.

jlouis commented 12 years ago

Ah! So it is a question of semantics then. The problem, to the best of my knowledge, is that the server can't keep up within the 2-second timeframe. This means it answers after more than 2000 ms, but at that point the client has already registered the connection as lost.

If we graph the kernel density of response times, we can see whether that could be the case.
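
For example, a minimal sketch of such a plot, assuming the per-connection handshake latencies (in milliseconds) have been extracted to a one-number-per-line text file; the file name is made up for the example:

# Sketch: kernel density estimate of connection/handshake latencies.
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

latencies_ms = np.loadtxt("handshake_latencies_ms.txt")

kde = gaussian_kde(latencies_ms)
xs = np.linspace(0, latencies_ms.max(), 500)

plt.plot(xs, kde(xs))
plt.axvline(2000, linestyle="--")   # the client's 2 s connect timeout
plt.xlabel("handshake latency (ms)")
plt.ylabel("estimated density")
plt.savefig("latency_kde.png")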

ericmoritz commented 12 years ago

Do you have enough data to graph the kernel density?

jlouis commented 12 years ago

More than! The problem is that I have too much :)

ericmoritz commented 12 years ago

@Lawouach I'm trying to run ws4py using PyPy. How did you run it? Did you use CherryPy or gevent?

Lawouach commented 12 years ago

Yes I did. Gevent doesn't run on PyPy IIRC. I used CherryPy 3.2.2 and PyPy 1.8.

ericmoritz commented 12 years ago

Someone just submitted code to run tornado under pypy. If you're still curious I'll write an implementation using ws4py and cherrypy.

I also wonder if anyone has written a ws server or a http server using pypy's native greenlets module. Perhaps gunicorn?

Lawouach commented 12 years ago

Not that I'm aware of but that'd be interesting indeed. Regarding CherryPy and ws4py, you may simply use this code:

https://github.com/Lawouach/WebSocket-for-Python/blob/master/test/autobahn_test_servers.py#L4

That worked just fine with CP 3.2.2 and PyPy 1.8 (I didn't try with more recent releases).

You may want to remove the two lines about logging (lines 28 and 29), which are not relevant to the test.

Also, you may want to add the following config settings to cherrypy.config.update(...):

'server.thread_pool': 128
'server.socket_queue_size': 128
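
Put together, a sketch of what such a server might look like, following the pattern in the linked autobahn_test_servers.py; the module paths assume ws4py's CherryPy integration in ws4py.server.cherrypyserver, and details may differ between releases:

import cherrypy
from ws4py.server.cherrypyserver import WebSocketPlugin, WebSocketTool
from ws4py.websocket import EchoWebSocket

cherrypy.config.update({
    'server.socket_host': '0.0.0.0',
    'server.socket_port': 8000,
    'server.thread_pool': 128,        # more worker threads for the handshakes
    'server.socket_queue_size': 128,  # CherryPy's listen() backlog
    'log.screen': False,              # keep logging out of the measurements
})

# Register the ws4py plugin and tool so requests can be upgraded to WebSockets.
WebSocketPlugin(cherrypy.engine).subscribe()
cherrypy.tools.websocket = WebSocketTool()

class Root(object):
    @cherrypy.expose
    def index(self):
        pass  # the websocket tool has already upgraded the connection here

cherrypy.quickstart(Root(), '/', config={
    '/': {
        'tools.websocket.on': True,
        'tools.websocket.handler_cls': EchoWebSocket,
    }
})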

Lawouach commented 12 years ago

I guess it'd be better if I could submit a pull request for it, but I won't have the time before tomorrow or even this weekend, unfortunately :/

ericmoritz commented 12 years ago

I'll write up a simple server and submit a pull request that you can take a glance at. I wrote one yesterday based on your echo server, but I think I deleted it. I know it didn't take very long.