christianparpart / x0

Xzero HTTP Application Server
MIT License
110 stars · 16 forks

Performance regression? #53

Closed · byzhang closed this issue 11 years ago

byzhang commented 11 years ago

I just pulled the latest code and found that x0d is (much) slower than it was a month ago (probably I have some misconfiguration). I start x0d as:

echo 1024 > /proc/sys/net/core/somaxconn
build/src/x0d -f src/performance.conf -p x0d.pid -X

And I benchmark using weighttp -n 10000 -c 2 -t 1 -k "http://127.0.0.1:8080/100.html" (100.html is 100 bytes). It only yields ~50 QPS. It used to handle 200K QPS for 100.html (using weighttp -n 1000000 -c 64 -t 8 -k "http://127.0.0.1:8080/100.html", which now cannot complete).

I believe I'm doing something wrong, but I cannot figure it out.

ghost commented 11 years ago

+1, sometimes images don't load (the request stays pending)

christianparpart commented 11 years ago

Hey.

Many thanks for reporting. I will check as soon as possible.

Christian.

christianparpart commented 11 years ago

Okay, I must admit it definitely is because of the keep-alive option you have enabled (-k), and I need to finally make keep-alive stable.

For the time being you can disable keep-alive by setting the option max_keepalive_idle 0; in the setup handler, as sketched below.
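A minimal sketch of that workaround, assuming a performance.conf-style setup handler (the exact brace/semicolon syntax of your config may differ):

handler setup {
    # ... keep your existing listen/worker options ...
    max_keepalive_idle 0;   # workaround: disable HTTP keep-alive entirely
}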

Ideally I would like to first release 0.6.0 and then focus on new things, most importantly refactoring/fixing the keep-alive logic.

I will update this ticket as I make progress.

christianparpart commented 11 years ago

p.s.: with "qps" you mean "queries per second" (thus, requests per second) ? so you actually once reached 200 requests per second befor the regression came in ?

christianparpart commented 11 years ago

2.) Did you compile in debug or in release mode? (-DCMAKE_BUILD_TYPE=release)
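For reference, a release build would be configured roughly like this (a sketch; the in-source build directory shown here is just an example):

$ cmake -DCMAKE_BUILD_TYPE=release .
$ make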

byzhang commented 11 years ago

200K for the static file (100.html) before the regression.

Thanks, -B


byzhang commented 11 years ago

Release.

Thanks, -B


byzhang commented 11 years ago

Do you mean max_keepalive_idle 0 in the setup handler?

If I add it, then weighttp -n 1000 -c 64 -t 8 -k "http://127.0.0.1:8080/100.html" reports many errors like: error: connect() failed: Cannot assign requested address (99)

while a lower -n passes.

Thanks, -B

christianparpart commented 11 years ago

Hey @byzhang - regarding your "Cannot assign requested address" issue, that is caused solely by your local node configuration (see http://gwan.com/en_apachebench_httperf.html as a good starting point).

But that doesn't excuse the bug that caused the keep-alive issues. I will (hopefully today) bump'n'tag release 0.6.0 and then start working on this one. :)

Best regards, Christian.

byzhang commented 11 years ago

Same errors after removing -k.

Thanks, -B


christianparpart commented 11 years ago

Hey @byzhang

The reason you get these networking failures on the client side when removing -k is that your client needs more sockets, and your OS kernel does not allow this because you have only a limited number of source port numbers available (configurable, but with a hard cap of about 65k). Nor can you actually have that many connections open in parallel, because you still have to fight the TIME_WAIT state of already-closed TCP sessions. Again, these values are (kind of) tweakable, but that needs deep knowledge of what you're doing (you can potentially harm your network traffic by setting wrong values).
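For illustration, these are the standard Linux knobs usually involved on the benchmark client; treat the values as examples to experiment with, not recommendations:

# widen the ephemeral (source) port range available to the client
$ sysctl -w net.ipv4.ip_local_port_range="1024 65535"
# allow reusing sockets stuck in TIME_WAIT for new outgoing connections
$ sysctl -w net.ipv4.tcp_tw_reuse=1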

OTOH, I've fixed the issue you raised. That means: even though Apache Bench says -k enables keep-alive on connections, it also requires HTTP pipelining to function, which had a bug (see the referenced ticket above).

I've done some basic tests and it seems to work fine now.

Without keep-alive/pipelining I get up to 5 Gbit/s traffic throughput, and with keep-alive/pipelining enabled I get up to 12.7 Gbit/s (tested on a 100 KB binary file with -c100 -n1000000).

Edit: that benchmark was done with Apache Bench (ab). With weighttp I easily double it, up to 20.7 Gbit/s with 4 bench threads. I will compare this against other servers later (not part of this ticket, I think).
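For reference, the ab runs described above look roughly like this (the 100 KB file name and port are illustrative):

$ ab -c 100 -n 1000000 http://127.0.0.1:8080/100k.bin       # keep-alive off
$ ab -k -c 100 -n 1000000 http://127.0.0.1:8080/100k.bin    # keep-alive (and thus pipelining) on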

christianparpart commented 11 years ago

P.S.: there is a lot of room for more performance improvements, such as opportunistic write()s (currently disabled, but easy to re-enable) and opportunistic accept()s, and I am still doing way too much dynamic memory management in the hot path. But all of these need careful planning in future stories. :)

Cheers, Christian.

ghost commented 11 years ago

Thanks for the fix :)

byzhang commented 11 years ago

Thanks, but it still doesn't seem to work for me:

starting benchmark...
spawning thread #1: 8 concurrent requests, 12500 total requests
spawning thread #2: 8 concurrent requests, 12500 total requests
spawning thread #3: 8 concurrent requests, 12500 total requests
spawning thread #4: 8 concurrent requests, 12500 total requests
spawning thread #5: 8 concurrent requests, 12500 total requests
spawning thread #6: 8 concurrent requests, 12500 total requests
spawning thread #7: 8 concurrent requests, 12500 total requests
spawning thread #8: 8 concurrent requests, 12500 total requests
progress: 10% done progress: 20% done progress: 30% done progress: 40% done progress: 50% done progress: 60% done progress: 70% done progress: 80% done progress: 90% done progress: 100% done

finished in 61 sec, 304 millisec and 285 microsec, 1631 req/s, 625 kbyte/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 39294752 bytes total, 29294752 bytes http, 10000000 bytes data

Thanks, -B


christianparpart commented 11 years ago

Hey @byzhang

That is bad. But can you tell me approximately when you last got good results with weighttp?

Did it improve in any way with the recent changes now?

And please tell me which remote git URL you are pulling from.

Many thanks in advance, Christian.

byzhang commented 11 years ago

I saw 200K QPS before I went on vacation in February. After I came back, I saw the keep-alive issue I reported. Last night I pulled the new changes; that issue seems fixed, but performance is still not as good as before.

Here is my environment, in case it helps:

$ git remote show origin

$ git log
Fri Mar 1 01:27:53 2013 +0100 49099fc (HEAD, origin/master, origin/HEAD, master) [http] HttpConnection: Fixes pipelined request processing. refs #53 [Christian Parpart]

$ git st

On branch master

Changes not staged for commit:

modified: examples/app1.conf

modified: src/minimal.conf

modified: src/performance.conf

modified: src/x0d.conf-dist

#

Untracked files:

www/htdocs/100.html

no changes added to commit

$ git diff src/performance.conf
diff --git i/src/performance.conf w/src/performance.conf
index ce67075..63acbd1 100644
--- i/src/performance.conf
+++ w/src/performance.conf
@@ -21,7 +21,7 @@ handler setup
 etag.size true
 etag.inode false

$ make edit_cache
Running interactive CMake command-line interface...
Would you like to see advanced options? [No]:
Please wait while cmake processes CMakeLists.txt files....

Variable Name: BUILD_TESTS
Description: Build unit tests [default: off]
Current Value: ON
New Value (Enter to keep current value):

Variable Name: CMAKE_BUILD_TYPE
Description: Choose the type of build, options are: None(CMAKE_CXX_FLAGS or CMAKE_C_FLAGS used) Debug Release RelWithDebInfo MinSizeRel.
Current Value: Release
New Value (Enter to keep current value):

...

Thanks, -B


christianparpart commented 11 years ago

Hey @byzhang, I had hoped your git remote would point to the old (private) git URL (which I don't always update that frequently). But yours looks fine, you are running in "release" mode, and you use a basic performance config. That all looks just fine.

I will go through my commits from now back to February and hope to find something. It is really bad, since I am getting nginx-level performance results on my machine.

I hope I will not bother you too much with investigating, but:

  • would you mind trying another machine of yours (if possible)?
  • try rolling back to approximately the date before you went on vacation.

However, I will try to find it out myself, so please don't feel obliged or anything.

Your 100.html - does this mean 100 KByte or 100 Byte? And how many CPU cores does your hardware have? I'd like to reproduce your test as closely as possible.

Many thanks in advance (and for the patience), Christian.

byzhang commented 11 years ago

It's the only non-laptop machine. 100.html means 100 bytes (copied from gwan). And $ less /proc/cpuinfo shows it has 8 cores:

model           : 58
model name      : Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
stepping        : 9
microcode       : 0x15
cpu MHz         : 1600.000
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips        : 6999.61
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

I reset git HEAD to c526464 (from 01/17) but still saw the bad results:

$ weighttp -n 100000 -c 64 -t 8 -k "http://127.0.0.1:8080/100.html"
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 8 concurrent requests, 12500 total requests
spawning thread #2: 8 concurrent requests, 12500 total requests
spawning thread #3: 8 concurrent requests, 12500 total requests
spawning thread #4: 8 concurrent requests, 12500 total requests
spawning thread #5: 8 concurrent requests, 12500 total requests
spawning thread #6: 8 concurrent requests, 12500 total requests
spawning thread #7: 8 concurrent requests, 12500 total requests
spawning thread #8: 8 concurrent requests, 12500 total requests
progress: 10% done progress: 20% done progress: 30% done progress: 40% done progress: 50% done progress: 60% done progress: 70% done progress: 80% done progress: 90% done progress: 100% done

finished in 61 sec, 294 millisec and 623 microsec, 1631 req/s, 619 kbyte/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 38894752 bytes total, 28894752 bytes http, 10000000 bytes data

I guess something that changed in my environment is causing it. (I did run apt-get upgrade after I got back.)

Thanks, -B


byzhang commented 11 years ago

And if I use instant mode:

$ src/x0d --instant=../www/htdocs,8080

then weighttp -n 100000 -c 64 -t 8 -k "http://127.0.0.1:8080/100.html" gives:

weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 8 concurrent requests, 12500 total requests
spawning thread #2: 8 concurrent requests, 12500 total requests
spawning thread #3: 8 concurrent requests, 12500 total requests
spawning thread #4: 8 concurrent requests, 12500 total requests
spawning thread #5: 8 concurrent requests, 12500 total requests
spawning thread #6: 8 concurrent requests, 12500 total requests
spawning thread #7: 8 concurrent requests, 12500 total requests
spawning thread #8: 8 concurrent requests, 12500 total requests
progress: 10% done progress: 20% done progress: 30% done progress: 40% done progress: 50% done progress: 60% done progress: 70% done progress: 80% done progress: 90% done progress: 100% done

finished in 1 sec, 708 millisec and 296 microsec, 58537 req/s, 22234 kbyte/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 38894752 bytes total, 28894752 bytes http, 10000000 bytes data

It seems I used instant mode before when I saw 200K in January. (That number is still not reached, but this run is on c526464.)

Thanks, -B


byzhang commented 11 years ago

I randomly tried some commits after c526464 but don't see a big difference. (Instant mode is ~60K req/s, while performance.conf stays at 1631, every time. Something must be wrong.)

Thanks, -B


byzhang commented 11 years ago

$ g++ --version
g++ (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ uname -a
Linux server1 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:31:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

All other packages, except for the kernel, are up to date:

$ sudo apt-get upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages have been kept back:
  linux-headers-generic linux-image-generic
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.

Thanks, -B


christianparpart commented 11 years ago

Hey

Instant mode is basically just a builtin config [1] that is loaded instead of the config file passed on the command line. Since performance.conf is much slimmer in the hot path (the main handler), it makes absolutely no sense that instant mode, which does more within the main handler, is faster.

But I'll look into it.

[1] https://github.com/xzero/x0/blob/master/src/x0d.cpp#L723

christianparpart commented 11 years ago

Try lowering the backlog (I still don't think that'll fix it, but it's worth a try). Maybe Ubuntu did something to the sysctl (/proc/sys/) settings with the upgrade, but that's also unlikely...
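If you want to rule out the sysctl theory, these are standard knobs worth comparing against a known-good machine (just reads, nothing is changed here):

$ sysctl net.core.somaxconn
$ sysctl net.ipv4.tcp_max_syn_backlog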

christianparpart commented 11 years ago

Hey @byzhang

It seems I found the reason for the regression you hit. Commit 047fa0743a59b326995a09291c7896a9c03a88b7 introduced a fix to the tcp_cork configuration option; since then you have to set this value to true to actually make use of the TCP_CORK socket option.

This fix was introduced on December 12th, 2012.

Please update to the very latest origin/master, verify that tcp_cork is set to true there, then rerun your test and tell me how much that helped.

I really hope this actually fixes your regression; I will also update performance.conf to fix the default sample config file. And if it does, please re-run cmake and enable the options WITH_MULTI_ACCEPT (ae449c328c43527c63a536e47d161f8dcd190302) and WITH_OPPORTUNISTIC_WRITE (497242bac3c58a0f0aa5bc00c628510ea6bdf624), then modify your listen line as follows:

listen 'bind' => 0.0.0.0, 'port' => 8081, 'backlog' => 256, 'multi_accept' => 8

Try playing around with the value of multi_accept :)
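Putting that together, a sketch of the rebuild and the relevant setup section (option names and the listen line are taken from above; the exact Flow brace/semicolon syntax may differ slightly from your performance.conf):

$ cmake -DWITH_MULTI_ACCEPT=ON -DWITH_OPPORTUNISTIC_WRITE=ON . && make

handler setup {
    # ... existing setup options ...
    tcp_cork true;    # required since commit 047fa07 to actually use TCP_CORK
    listen 'bind' => 0.0.0.0, 'port' => 8081, 'backlog' => 256, 'multi_accept' => 8;
}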

byzhang commented 11 years ago

It's fixed with tcp_cork = true :)

$ weighttp -n 100000 -c 64 -t 8 -k "http://127.0.0.1:8080/100.html"
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 8 concurrent requests, 12500 total requests
spawning thread #2: 8 concurrent requests, 12500 total requests
spawning thread #3: 8 concurrent requests, 12500 total requests
spawning thread #4: 8 concurrent requests, 12500 total requests
spawning thread #5: 8 concurrent requests, 12500 total requests
spawning thread #6: 8 concurrent requests, 12500 total requests
spawning thread #7: 8 concurrent requests, 12500 total requests
spawning thread #8: 8 concurrent requests, 12500 total requests
progress: 10% done progress: 20% done progress: 30% done progress: 40% done progress: 50% done progress: 60% done progress: 70% done progress: 80% done progress: 90% done progress: 100% done

finished in 0 sec, 464 millisec and 117 microsec, 215462 req/s, 82681 kbyte/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 39294710 bytes total, 29294710 bytes http, 10000000 bytes data

Thanks, -B


byzhang commented 11 years ago

multi_accept doesn't seem to matter for 100.html. (I only tried 8 and 64.)

Thanks, -B


christianparpart commented 11 years ago

So the initial regression seems fixed, right?

And did you try enabling/disabling opportunistic writes?

P.S.: I like your hardware ;)

byzhang commented 11 years ago

Yes, the regression is fixed. And opportunistic writes were enabled after I pulled your commits. It's a sub-$1000 PC, but very powerful if you don't run Java on it :D

Thanks, -B


christianparpart commented 11 years ago

No, it is not enabled by default. Same for multi_accept. Unless I was too tired last night. :)


byzhang commented 11 years ago

Sorry, I meant I enabled them using make edit_cache after pulling the commits.

Thanks, -B


christianparpart commented 11 years ago

Ah, OK. Many thanks. Now it's clear. :)


christianparpart commented 11 years ago

P.S.: when compiling against LLVM 3.1 (instead of 3.0) you gain a little more performance. A second (bigger) performance gain can be had by adding -march=native -mtune=native to your release CFLAGS.
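For example, a sketch of passing those flags through CMake (standard CMake variables; adjust to your own checkout and build directory):

$ cmake -DCMAKE_BUILD_TYPE=release -DCMAKE_CXX_FLAGS="-march=native -mtune=native" .
$ make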