Doodle3D / WiFi-Box

The Doodle3D WiFi-Box for wireless 3D-printing
GNU General Public License v3.0

Doodle3D stresstest #6

Closed: olijf closed this issue 8 years ago

olijf commented 8 years ago

I am currently stress testing the wifibox to see how many users can concurrently connect to it.

First approach

  1. Open multiple tabs on different machines. Using my laptop and tablet I have around 20 tabs open.
  2. Start a print.
  3. Check memory and CPU usage on the wifibox.
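Step 3 can be made repeatable by sampling the memory and load figures at fixed intervals. A minimal Python sketch, assuming you capture the text of /proc/meminfo and /proc/loadavg from the box (for example with cat over ssh) and parse it on your own machine; parse_meminfo and parse_loadavg are hypothetical helper names, not part of the firmware:

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value}. Most values are in kB."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            # First token is the number; the optional second token is the unit ("kB").
            values[key.strip()] = int(fields[0])
    return values


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)
```

For example, parse_meminfo(text)["MemFree"] gives the free memory in kB for one sample; logging it next to the number of open tabs makes runs comparable.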

Result: this does not seem to be very stressful.

Second approach

  1. Use the Apache Benchmark tool (ab, available from the apache2-utils package).
  2. Start a benchmark: ab -n <number of requests> -c <number of concurrent connections> http://<ip of wifibox>/d3dapi/info/status
  3. Start a print.

Check usage on the wifibox whilst increasing the number of concurrent connections. Example (I use tee for easy logging):

ab -n 3000 -c 1 -e test.csv http://10.0.0.123/d3dapi/info/status 2>&1 | tee -a test1.txt

output:

This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 10.0.0.123 (be patient)
Completed 300 requests
Completed 600 requests
Completed 900 requests
Completed 1200 requests
Completed 1500 requests
Completed 1800 requests
Completed 2100 requests
Completed 2400 requests
Completed 2700 requests
Completed 3000 requests
Finished 3000 requests

Server Software:        
Server Hostname:        10.0.0.123
Server Port:            80

Document Path:          /d3dapi/info/status
Document Length:        214 bytes

Concurrency Level:      1
Time taken for tests:   268.633 seconds
Complete requests:      3000
Failed requests:        2590
   (Connect: 0, Receive: 0, Length: 2590, Exceptions: 0)
Total transferred:      1053797 bytes
HTML transferred:       666797 bytes
Requests per second:    11.17 [#/sec] (mean)
Time per request:       89.544 [ms] (mean)
Time per request:       89.544 [ms] (mean, across all concurrent requests)
Transfer rate:          3.83 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    5   5.2      4      60
Processing:    72   84  20.0     76     280
Waiting:       59   81  19.6     73     278
Total:         74   89  20.9     81     282

Percentage of the requests served within a certain time (ms)
  50%     81
  66%     85
  75%     90
  80%     95
  90%    121
  95%    136
  98%    151
  99%    165
 100%    282 (longest request)

It seems the wifibox can't handle more than 2 concurrent connections at a time, especially while sending a print. The wifibox starts up multiple instances of uhttpd, and they never get shut down properly if the client does not close the socket properly (e.g. when the client exceeds the timeout set up in ab).

I am unsure whether this is a realistic testing approach, as it stresses the wifibox enormously compared to the more realistic first approach.
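Stepping up the concurrency by hand is tedious, so the ab invocation above could be wrapped in a small sweep. A Python sketch, assuming ab is installed on the machine running it; build_ab_command and run_sweep are hypothetical helpers, and the per-level CSV file names are this sketch's own choice:

```python
import subprocess

STATUS_PATH = "/d3dapi/info/status"


def build_ab_command(host: str, requests: int, concurrency: int, csv_path: str) -> list[str]:
    """Argument list for ab: -n total requests, -c concurrent requests, -e CSV of percentiles."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), "-e", csv_path,
            f"http://{host}{STATUS_PATH}"]


def run_sweep(host: str, requests: int, levels: list[int], log_path: str) -> None:
    """Run ab once per concurrency level, appending stdout and stderr to log_path (like 2>&1 | tee -a)."""
    for concurrency in levels:
        cmd = build_ab_command(host, requests, concurrency, f"test_c{concurrency}.csv")
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
        with open(log_path, "a") as log:
            log.write(result.stdout)
        print(result.stdout)
```

For example, run_sweep("10.0.0.123", 3000, [1, 2, 4, 8, 16], "test1.txt") repeats the command above at each level; the list of levels is only an illustration.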

peteruithoven commented 8 years ago

It might be interesting to have something like an Apache access log, from which we could read back which requests came in and which response code (if any) each one returned. https://httpd.apache.org/docs/2.4/logs.html
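For reference, Apache's Common Log Format (described on the page linked above) records one line per request: client address, identity, user, timestamp, request line, status code and response size. A small Python sketch of formatting such a line; clf_line is a hypothetical helper, and writing "-" for a request that never received a status is this sketch's own convention, not part of the format:

```python
from datetime import datetime, timezone
from typing import Optional


def clf_line(host: str, when: datetime, method: str, path: str,
             status: Optional[int], size: Optional[int], protocol: str = "HTTP/1.1") -> str:
    """Format one request as: host ident authuser [timestamp] "request line" status bytes.

    `when` should be timezone-aware, otherwise %z renders as an empty string.
    """
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    status_field = "-" if status is None else str(status)  # sketch convention: no response sent
    size_field = "-" if not size else str(size)             # CLF writes "-" instead of 0 bytes
    return f'{host} - - [{timestamp}] "{method} {path} {protocol}" {status_field} {size_field}'


# The client address here is only an example.
print(clf_line("10.0.0.5", datetime.now(timezone.utc), "GET", "/d3dapi/info/status", 200, 214))
```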

I would zoom in on the issue that instances don't get shut down. This might be easier after adding a test API endpoint that just keeps executing (a while loop, for example) for quite a while. This excludes issues caused by communication with print3d, for example.
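Such a test could be driven from a laptop by a small client that fires many requests in parallel and tallies what came back: a status code, or the kind of error when no response arrived. A Python sketch using only the standard library; http_fetch and run_concurrent are hypothetical helpers, and since the slow test endpoint does not exist yet, any URL you pass in is a placeholder:

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def http_fetch(url: str, timeout: float = 30.0) -> str:
    """GET url; return the status code as text, or the exception name if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # drain the body so the connection is closed cleanly
            return str(response.status)
    except urllib.error.HTTPError as error:  # non-2xx responses still carry a status code
        return str(error.code)
    except OSError as error:  # URLError, timeouts, connection resets
        return type(error).__name__


def run_concurrent(fetch: Callable[[], str], n_requests: int, concurrency: int) -> Counter:
    """Call fetch n_requests times with at most `concurrency` calls in flight; count the outcomes."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(lambda _: fetch(), range(n_requests)))
```

For example, run_concurrent(lambda: http_fetch("http://10.0.0.123/d3dapi/<test endpoint>"), 100, 10) would report how many of 100 requests got a 200, another status, or an error such as a timeout.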

olijf commented 8 years ago

I looked into why the wifibox is not able to handle more than 2 concurrent connections. It seems this is a config issue: I tinkered with several values for max_connections (see below) in the uhttpd config, and it seems that whilst idling the wifibox is able to handle around 20 concurrent connections, and whilst printing it is better to allow only 10.

When more than 10 uhttpd processes are running at the same time (so more than 10 concurrent connections), the wifibox gets sluggish, and when it runs out of memory (at around 16 instances, with print3d and logrotate taking up part of the memory as well) the Linux OOM (out-of-memory) killer starts killing processes.
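To see when the box crosses that threshold, the number of running uhttpd instances can be counted from /proc. A Python sketch, assuming a Python interpreter is available where /proc is read (the stock WiFi-Box image may not ship one); process_names and count_named are hypothetical helpers, and the threshold constant is the observation from this comment:

```python
import os
from typing import Iterable, Iterator

SLUGGISH_THRESHOLD = 10  # observed in this thread: more than 10 uhttpd instances made the box sluggish


def process_names(proc_root: str = "/proc") -> Iterator[tuple[int, str]]:
    """Yield (pid, command name) for every process directory under proc_root."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # the process exited between listdir() and open()
        yield int(entry), name


def count_named(processes: Iterable[tuple[int, str]], name: str) -> int:
    """Count processes whose command name equals `name`."""
    return sum(1 for _pid, comm in processes if comm == name)
```

On the box, count_named(process_names(), "uhttpd") > SLUGGISH_THRESHOLD would flag the situation described above.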

OOM

The OOM killer ranks the processes and tries to kill as few as possible while freeing as much memory as possible. If print3d is eating up resources, it is an easy candidate to kill...

The OOM score can be manipulated by giving the process a negative kill score in /proc/{pid of print3d}/oom_score_adj (See http://backdrift.org/oom-killer-how-to-create-oom-exclusions-in-linux)

This makes sure the print3d driver is never killed to free up resources. The penalty is that uhttpd instances are killed sooner (which means you can sometimes miss a few connections). That is nothing to worry about, as it does not interfere with the printing process. (Side note: whilst sending a large print this might be a minor problem, as sometimes the answer to a POST request is not sent because the uhttpd instance is killed and the socket is closed. There is a slim chance that part of your print is not sent correctly, although I never observed this.)
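The oom_score_adj adjustment described above amounts to writing a number between -1000 and 1000 to /proc/<pid>/oom_score_adj; -1000 exempts the process from the OOM killer entirely, and lowering the value normally requires root. A minimal Python sketch, assuming the print3d pid is already known (for example from pidof); set_oom_score_adj is a hypothetical helper, not the code from the print3d pull request:

```python
import os

OOM_SCORE_ADJ_MIN = -1000  # kernel lower bound: the process is exempt from the OOM killer
OOM_SCORE_ADJ_MAX = 1000   # kernel upper bound: the process is strongly preferred as a victim


def oom_score_adj_path(pid: int, proc_root: str = "/proc") -> str:
    return os.path.join(proc_root, str(pid), "oom_score_adj")


def set_oom_score_adj(pid: int, value: int, proc_root: str = "/proc") -> None:
    """Write value to /proc/<pid>/oom_score_adj after checking the kernel's allowed range."""
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(
            f"oom_score_adj must be between {OOM_SCORE_ADJ_MIN} and {OOM_SCORE_ADJ_MAX}, got {value}")
    with open(oom_score_adj_path(pid, proc_root), "w") as f:
        f.write(f"{value}\n")
```

Calling set_oom_score_adj(<print3d pid>, -1000) as root corresponds to the setting described in this section; the value has to be reapplied whenever print3d is restarted, because it belongs to the process, not the program.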

uhttpd config

Changing max_requests to 10 (maximum number of script calls, default 3) and max_connections to 10 (maximum number of simultaneous connections, default 100; if more users connect, they are queued) allows for more connections per user...

See: https://dev.openwrt.org/browser/trunk/package/network/services/uhttpd/files/uhttpd.config?rev=36932

As you can see, these are concurrent connections. In a real-world environment, where users poll the wifibox every few seconds, this fix would probably allow for a lot more concurrent users.
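The point about polling users can be made concrete with a back-of-the-envelope estimate. If every user polls once every T seconds and the box sustains X requests per second, it can keep up with roughly X·T users; Little's law (mean requests in flight = arrival rate × mean response time) then gives how many connections are busy on average. A Python sketch; the 5 second poll interval is a hypothetical value, and the 11.17 req/s and 89.544 ms figures come from the single-connection ab run earlier in this issue, so throughput at other concurrency levels may differ in either direction:

```python
import math


def max_polling_users(throughput_rps: float, poll_interval_s: float) -> int:
    """Users the box can keep up with if each polls once per interval and polls are spread evenly."""
    return math.floor(throughput_rps * poll_interval_s)


def mean_in_flight(arrival_rps: float, response_time_s: float) -> float:
    """Little's law: mean number of requests in the system = arrival rate x mean time in the system."""
    return arrival_rps * response_time_s


users = max_polling_users(11.17, 5.0)          # 11.17 req/s measured; 5 s poll interval assumed
busy = mean_in_flight(users / 5.0, 0.089544)   # 89.544 ms mean time per request measured
print(users, round(busy, 2))
```

With these inputs the estimate is 55 users keeping roughly one connection busy on average, well below a limit of 10. These are idealised numbers: they assume evenly spread polls and that the measured single-connection throughput holds.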

All this was done on the 0.10.10-c beta. Log level: bulk
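To try the values from this comment on a box, the options can be set with OpenWrt's uci tool and uhttpd restarted. A Python sketch that only builds the shell commands; it assumes the uhttpd section is named main, as in OpenWrt's default /etc/config/uhttpd (check the WiFi-Box's own config), and uci_commands is a hypothetical helper:

```python
def uci_commands(options: dict[str, int], config: str = "uhttpd", section: str = "main") -> list[str]:
    """Shell commands that set each option, commit the config and restart the service."""
    commands = [f"uci set {config}.{section}.{name}={value}" for name, value in options.items()]
    commands.append(f"uci commit {config}")
    commands.append(f"/etc/init.d/{config} restart")  # assumes the init script is named after the config
    return commands


for line in uci_commands({"max_requests": 10, "max_connections": 10}):
    print(line)
```

The resulting commands can be joined with " && " and run in a single ssh session on the box.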

olijf commented 8 years ago

In case anyone is interested in seeing some visuals of how this uhttpd config increases reliability, below are the diagrams I put in my report.

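[Diagrams (results): uhttpd max_requests settings compared by number of concurrent connections, idle and while printing]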

peteruithoven commented 8 years ago

@olijf interesting! What kind of requests are you doing here concurrently? So both diagrams visualise the same, except that in the second one you're also printing? So besides a tiny increase in duration when printing with many concurrent requests, there is no downside to allowing 16 concurrent requests? Do you perhaps have data on the impact on the available RAM / CPU? Did it influence print3d (did it stutter while printing)?

olijf commented 8 years ago

What kind of requests are you doing here concurrently?

I am testing the Doodle3D api page GET /info/status using the Apache Benchmark tool

So both diagrams visualise the same, except that in the second one you're also printing?

Yes

So besides a tiny increase in duration when printing with many concurrent requests, there is no downside to allowing 16 concurrent requests?

It looks as though it is better to stay below this value, otherwise you will experience a lot of "lost connection" errors. That's why I chose 10: it improves a lot of things compared to 3 concurrent connections and leaves some headroom. The box does handle even more, but at ~7 sec on average it is inadvisable to let large groups use the wifibox. (In the diagram you can see that, while printing, the max_requests=3 line ends at around 4 max concurrent connections; after that nothing comes through.)

Do you perhaps have data on the impact on the available ram / cpu? Did it influence print3d (did it stutter while printing)?

I did see some stuttering while printing with max_requests set to 16, and at log level bulk you see it far more often. I also observed the usability of the Doodle3D app with the different settings, and it seems that at more than 12-15 concurrent connections the web app gets very unresponsive (above this value it is almost impossible to send a POST request).

peteruithoven commented 8 years ago

This test resulted in two pull requests: https://github.com/Doodle3D/doodle3d-firmware/pull/61 and https://github.com/Doodle3D/print3d/pull/46. Could we close this issue?