clj-commons / aleph

Asynchronous streaming communication for Clojure - web server, web client, and raw TCP/UDP
http://aleph.io
MIT License

Connection Problems with Keep-Alive #248

Closed xsc closed 8 years ago

xsc commented 8 years ago

Hey there,

it seems to me that the HTTP keep-alive feature in aleph servers might be broken. I created a minimal project to illustrate the problem:

https://github.com/xsc/aleph-stream-testcase

I tested against 0.4.1 and 0.4.2-alpha4, by setting up a server that returns either an InputStream or a byte array as :body, then used Apache Bench with the -k option to reuse connections during the test run, e.g.:

$ ab -n 500 -c 4 -k http://localhost:9877/bytes
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)

Test aborted after 10 failures

Total of 12 requests completed

Note that a run without -k works perfectly fine:

$ ab -v1 -n 500 -c 4 http://localhost:9877/bytes
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests

Server Software:        Aleph/0.4.1
Server Hostname:        localhost
Server Port:            9877
...

For the InputStream version there are some exceptions, e.g. "error in manifold.utils/future-with java.io.IOException: Stream closed" and "error in message propagation java.io.IOException: Broken pipe". I can provide complete stack traces, but running the test project I linked should reproduce them.
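
For readers who do not have the test project at hand, a server of the kind described above could look roughly like the following sketch. It is not taken from the linked repository: the namespace, the payload size, and the handler layout are assumptions, and only the port (9877) and the /bytes and /stream paths come from the commands in this thread.

```clojure
(ns keepalive-sketch.core
  ;; Hypothetical namespace; requires the aleph dependency on the classpath.
  (:require [aleph.http :as http])
  (:import [java.io ByteArrayInputStream]))

;; Assumed payload size (64 KiB of zero bytes); the report does not state one.
(def ^bytes payload (byte-array (* 64 1024)))

(defn handler
  "Ring-style handler: /bytes returns a byte array, /stream an InputStream."
  [{:keys [uri]}]
  (case uri
    "/bytes"  {:status 200
               :headers {"content-type" "application/octet-stream"}
               :body payload}
    "/stream" {:status 200
               :headers {"content-type" "application/octet-stream"}
               ;; A fresh stream per request, since a stream can only be read once.
               :body (ByteArrayInputStream. payload)}
    {:status 404
     :headers {"content-type" "text/plain"}
     :body "not found"}))

(defn -main [& _]
  ;; start-server returns a java.io.Closeable; call .close on it to stop.
  (http/start-server handler {:port 9877}))
```

With something like this running, the ab invocations in this report can be pointed at either path to compare the two body types.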

If you need any more information, let me know!
Yannick


$ lein version
Leiningen 2.6.1 on Java 1.8.0_45 Java HotSpot(TM) 64-Bit Server VM

ztellman commented 8 years ago

I'm able to reproduce your failure, but this, for instance, works fine:

 httperf --num-conns=16 --rate=16 --num-calls=10000 --port=9877 --uri=/stream

Also, Aleph has successfully run similar tests on the TechEmpower benchmarks without issue. I noticed that the throughput of the tests improved significantly when I added this to the project.clj:

:jvm-opts ["-server" "-Xmx1g"]

I also noticed that when I returned a shorter byte array via /bytes, ApacheBench works fine. I'm tempted at this point to blame ApacheBench for handling large responses via keep-alive improperly. Thoughts?

xsc commented 8 years ago

We experienced problems (i.e. connections being closed before the data was fully transmitted) with Aleph behind proxies like CloudFront or HAProxy, which prompted me to investigate the matter, especially the keep-alive behaviour between Aleph and the proxy.

But, seeing as ab is seemingly drunk, those might just have been instances of #191, so I'll close the issue for now and see what I can gather over the course of next week.

Thanks for the quick response!