Closed olijf closed 8 years ago
It might be interesting to have something like an Apache access log, in which we could read back which requests came in and which response code (if any) each returned. https://httpd.apache.org/docs/2.4/logs.html
I would zoom in on the issue that instances don't get shut down. This might be easier after adding a test API endpoint that just keeps executing (a while loop, for example) for quite a while. This excludes issues caused by, for example, communication with print3d.
I looked into why the wifibox is not able to handle more than 2 concurrent connections. It seems this is a config issue. I tinkered with several values for max_connections (see below) in the uhttpd config, and it seems that whilst idling the wifibox is able to handle around 20 concurrent connections, and whilst printing it is better to allow only 10.
When more than 10 uhttpd processes are running at the same time (so more than 10 concurrent connections) the wifibox gets sluggish. It runs out of memory at around 16 instances, with print3d and logrotate eating away part of that as well, at which point the Linux OOM (out of memory) killer starts killing processes.
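To watch this happen during a test, the number of uhttpd instances and the free memory can be logged side by side on the wifibox. The following is only a sketch: the helper names mem_free_kb and count_uhttpd are made up for illustration, and it assumes the box provides awk and pidof (as BusyBox builds usually do).

```shell
#!/bin/sh
# Print the MemFree value (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo; the optional argument only exists to ease testing.
mem_free_kb() {
    awk '/^MemFree:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Count running uhttpd processes by counting the PIDs that pidof prints.
# Prints 0 when no process with that name exists.
count_uhttpd() {
    set -- $(pidof uhttpd)
    echo "$#"
}

# Usage on the wifibox (Ctrl-C to stop):
# while true; do
#     echo "$(date +%T) uhttpd=$(count_uhttpd) free=$(mem_free_kb) kB"
#     sleep 1
# done
```

Running the loop in one SSH session while a benchmark runs from another machine should show roughly where the instance count stops growing and free memory bottoms out.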
OOM ranks the processes and always tries to kill the least possible while freeing the maximum amount possible. If print3d is eating up resources this is an easy candidate to kill...
The OOM score can be manipulated by giving the process a negative kill score in /proc/{pid of print3d}/oom_score_adj (See http://backdrift.org/oom-killer-how-to-create-oom-exclusions-in-linux)
This will make sure the print3d driver is not killed to free up resources. The penalty is that uhttpd instances are killed sooner, which means you can sometimes miss a few connections. That is nothing to worry about, as it does not interfere with the printing process. (Side note: whilst sending a large print this might be a minor problem, as sometimes the answer to a POST request is not sent because the uhttpd instance is killed and the socket is closed. There is a slim chance that part of your print is not sent out correctly, although I never observed this.)
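As a rough sketch of the oom_score_adj adjustment described above: the function below writes a score to a process's oom_score_adj file. The function name protect_from_oom and the PROC_ROOT override are hypothetical (the override is only there so the function can be tried against a scratch directory), and the sketch assumes the driver's process is called print3d. On Linux a value of -1000 exempts a process from the OOM killer entirely; writing negative values requires root.

```shell
#!/bin/sh
# Write an OOM score adjustment for one process.
#   $1 = pid, $2 = adjustment (defaults to -1000, which disables OOM kills for it)
# PROC_ROOT is a hypothetical override for trying this outside of /proc.
protect_from_oom() {
    pid="$1"
    adj="${2:--1000}"
    target="${PROC_ROOT:-/proc}/$pid/oom_score_adj"
    if [ ! -e "$target" ]; then
        echo "no such file: $target" >&2
        return 1
    fi
    echo "$adj" > "$target"
}

# Usage on the wifibox (as root; assumes the process name is print3d):
# for pid in $(pidof print3d); do protect_from_oom "$pid"; done
```

The adjustment is per process and is lost when print3d restarts, so it would have to be reapplied after each start, for example from whatever script launches the driver.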
Changing max_requests to 10 (maximum number of script calls, default 3) and max_connections to 10 (maximum number of simultaneous connections, default 100; if more users are connected they are queued) allows for more connections per user.
See: https://dev.openwrt.org/browser/trunk/package/network/services/uhttpd/files/uhttpd.config?rev=36932
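For illustration, the two changed options would look roughly like this in /etc/config/uhttpd. This is a fragment, not a complete file: it assumes the stock section name main from the OpenWrt default config and leaves out all other options in that section.

```
config uhttpd main
	# Maximum number of script calls handled at once (default 3)
	option max_requests 10
	# Maximum number of simultaneous connections (default 100)
	option max_connections 10
```

uhttpd has to be restarted before the new limits take effect.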
As you can see, these are concurrent connections. In a real-world environment, where users poll the wifibox every few seconds, this fix would probably allow for a lot more concurrent users.
All this was done on the 0.10.10-c beta. Log level: bulk
In case anyone is interested in seeing some visuals on how this uhttpd config increases reliability below are the diagrams I put in my report.
@olijf interesting! What kind of requests are you doing here concurrently? So both diagrams visualise the same, except that in the second one you're also printing? So besides a tiny duration increase when printing and having many concurrent requests, there is no downside to allowing 16 concurrent requests? Do you perhaps have data on the impact on the available ram / cpu? Did it influence print3d (did it stutter while printing)?
What kind of requests are you doing here concurrently?
I am testing the Doodle3D API page GET /info/status using the ApacheBench tool (ab).
So both diagrams visualise the same, except that in the second one you're also printing?
Yes
So besides a tiny duration increase when printing and having many concurrent requests, there is no downside to allowing 16 concurrent requests?
It looks as though it is better to stay below this value, otherwise you will experience a lot of "lost connection" errors. That's why I chose 10: it improves a lot compared to 3 concurrent connections and leaves some headroom, as the wifibox does handle even more. At ~7 sec on average, though, it is inadvisable to use the wifibox with large groups. (You can see in this diagram that the line for max_requests=3 ends, while printing, at around 4 max concurrent connections; after that nothing comes through.)
Do you perhaps have data on the impact on the available ram / cpu? Did it influence print3d (did it stutter while printing)?
I did see some stuttering while printing with max_requests set to 16, and on log level bulk you see it far more often. I also observed the usability of the Doodle3D app with the different settings, and it seems that above 12-15 concurrent connections the web app gets very unresponsive (above this value it is almost impossible to send a POST request).
This test resulted in two pull requests: https://github.com/Doodle3D/doodle3d-firmware/pull/61 and https://github.com/Doodle3D/print3d/pull/46. Could we close this issue?
I am currently stress testing the wifibox to see how many users can concurrently connect to it.
First approach
Result: this does not seem to be very stressful.
Second approach
Using ab (available from the apache2-utils package):
ab -n <number of requests> -c <number of concurrent connections> http://<ip of wifibox>/d3dapi/info/status
Check usage on the wifibox whilst increasing the number of concurrent connections. Example:
ab -n 3000 -c 1 -e test.csv http://10.0.0.123/d3dapi/info/status 2>&1 | tee -a test1.txt
(I use tee for easy logging.)
Output:
It seems the wifibox can't handle more than 2 concurrent connections at a time. Especially while sending a print this seems to be an issue. The wifibox starts up multiple instances of uhttpd, and they never get shut down properly if the socket is not closed properly by the client (e.g. when the client exceeds the timeout set in ab). I am unsure if this is a realistic testing approach, as it stresses the wifibox enormously compared to the more realistic first approach.
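The ab runs from the second approach can be scripted so that each concurrency level gets its own CSV and log file. This is a minimal sketch rather than the script used for the numbers in this issue: the function name sweep and the test_c<N> file naming are made up for illustration, and it assumes ab is installed on the machine running the test.

```shell
#!/bin/sh
# Run ab once per concurrency level against the same URL.
#   $1 = URL, $2 = number of requests per run, remaining args = concurrency levels
# Each run writes its percentile table to test_c<N>.csv (ab's -e option)
# and appends ab's console output to test_c<N>.txt via tee.
sweep() {
    url="$1"
    requests="$2"
    shift 2
    for c in "$@"; do
        ab -n "$requests" -c "$c" -e "test_c${c}.csv" "$url" 2>&1 | tee -a "test_c${c}.txt"
    done
}

# Example, using the address from this issue:
# sweep http://10.0.0.123/d3dapi/info/status 3000 1 2 4 8 16
```

Keeping one file pair per level makes it easier to compare runs afterwards, for instance an idle box against one that is printing.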