AndrewDryga opened 7 years ago
Might benefit from using: https://github.com/zachdaniel/spandex
Related to: #209
Writing logs to PostgreSQL under load is obviously inefficient. By default, logs are written from an async process, but its message queue gets backed up pretty fast.
Even worse, because the queue is so long, processes that send to that queue are penalized by the BEAM's backpressure mechanism (the number of reductions charged for sending a message is proportional to the receiver's queue length). So latency deviation for incoming connections grows, because the processes handling them get suspended. I believe upcoming OTP versions even plan to remove this backpressure mechanism.
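To make the async/sync distinction concrete, here is a minimal sketch of a GenServer-backed log writer (all names are hypothetical, and the actual DB insert is stubbed out): the `cast` path returns immediately and lets the writer's mailbox grow under load, while the `call` path blocks the caller until the write completes, keeping the mailbox near zero.

```elixir
defmodule LogWriter do
  use GenServer

  def start_link(opts \\ []),
    do: GenServer.start_link(__MODULE__, :ok, Keyword.put_new(opts, :name, __MODULE__))

  # Async: returns immediately; under load the writer's mailbox grows,
  # and every sender is charged extra reductions proportional to its length.
  def log_async(entry), do: GenServer.cast(__MODULE__, {:write, entry})

  # Sync: the caller blocks until the write finishes, so the mailbox
  # stays near zero and backpressure is applied at the call site instead.
  def log_sync(entry), do: GenServer.call(__MODULE__, {:write, entry})

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_cast({:write, entry}, state) do
    write_to_db(entry)
    {:noreply, state}
  end

  @impl true
  def handle_call({:write, entry}, _from, state) do
    {:reply, write_to_db(entry), state}
  end

  # Stand-in for the real Ecto insert.
  defp write_to_db(_entry), do: :ok
end
```

The sync variant trades per-request latency for bounded memory, which is exactly the trade-off the benchmarks below measure.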
Some data (with async log writer):
```
ab -p ./post_loc.txt -l -n 100000 -c 100 http://127.0.0.1:4000/httpbin/post
This is ApacheBench, Version 2.3 <$Revision: 1796539 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests

Server Software:        gunicorn/19.6.0
Server Hostname:        127.0.0.1
Server Port:            4000

Document Path:          /httpbin/post
Document Length:        Variable

Concurrency Level:      100
Time taken for tests:   475.071 seconds
Complete requests:      100000
Failed requests:        0
Non-2xx responses:      621
Total transferred:      1104451167 bytes
Total body sent:        406500000
HTML transferred:       1069272902 bytes
Requests per second:    210.49 [#/sec] (mean)
Time per request:       475.071 [ms] (mean)
Time per request:       4.751 [ms] (mean, across all concurrent requests)
Transfer rate:          2270.32 [Kbytes/sec] received
                        835.61 kb/s sent
                        3105.93 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   36  205.5      0    7569
Processing:     7  438  204.1    433    4249
Waiting:        6  437  204.0    432    4248
Total:          7  475  266.8    448    7604

Percentage of the requests served within a certain time (ms)
  50%    448
  66%    501
  75%    540
  80%    569
  90%    679
  95%    881
  98%   1186
  99%   1505
 100%   7604 (longest request)
```
With the sync log writer:

```elixir
config :annon_api, :plugin_pipeline,
  default_features: [:log_consistency]
```

Message queues stay empty (almost always 0) and memory consumption stays under 100 MB.
```
ab -p ./post_loc.txt -l -n 100000 -c 100 http://127.0.0.1:4000/httpbin/post
This is ApacheBench, Version 2.3 <$Revision: 1796539 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests

Server Software:        gunicorn/19.6.0
Server Hostname:        127.0.0.1
Server Port:            4000

Document Path:          /httpbin/post
Document Length:        Variable

Concurrency Level:      100
Time taken for tests:   487.047 seconds
Complete requests:      100000
Failed requests:        0
Non-2xx responses:      443
Total transferred:      1106356543 bytes
Total body sent:        406500000
HTML transferred:       1071172048 bytes
Requests per second:    205.32 [#/sec] (mean)
Time per request:       487.047 [ms] (mean)
Time per request:       4.870 [ms] (mean, across all concurrent requests)
Transfer rate:          2218.32 [Kbytes/sec] received
                        815.06 kb/s sent
                        3033.38 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   24  151.0      0    2265
Processing:    15  463  188.1    444    2339
Waiting:       15  460  187.4    442    2321
Total:         16  487  224.0    450    3230

Percentage of the requests served within a certain time (ms)
  50%    450
  66%    502
  75%    548
  80%    584
  90%    699
  95%    870
  98%   1158
  99%   1361
 100%   3230 (longest request)
```
Even tail latency is better now (3230 ms vs. 7604 ms for the longest request).
Logging has a few issues, and that is starting to worry me:
I guess it is time to reconsider this part of Annon as a global, adapter-based tracing module. It could keep working as it does now (which is okay for small deployments) while gaining many more features for production systems: sending data to SaaS services, to other data stores (Elasticsearch or Cassandra), and to tracing systems (DataDog APM, New Relic APM, AppSignal, or Scout).
The adapter API would be large and demanding, so not every service could be integrated, but that is okay, since most of the issues above would then have a path to a solution.
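One possible shape for such an adapter contract, as an Elixir behaviour (all names here are hypothetical, loosely modeled on how libraries like Spandex define backend adapters; the real API would need to cover far more surface):

```elixir
defmodule Annon.Tracing.Adapter do
  @moduledoc """
  Hypothetical behaviour a tracing/logging backend would implement.
  The current PostgreSQL writer, an Elasticsearch sink, or a DataDog APM
  client could each be a separate implementation of these callbacks.
  """

  # Record a structured log entry (the current behavior, kept as one adapter).
  @callback log(level :: atom(), message :: iodata(), metadata :: map()) :: :ok

  # Open a span for a request passing through the pipeline.
  @callback start_span(name :: String.t(), metadata :: map()) ::
              {:ok, span :: term()} | {:error, term()}

  # Close the span and ship it to the backend.
  @callback finish_span(span :: term(), metadata :: map()) :: :ok
end
```

Small deployments would keep the PostgreSQL adapter as the default, while production systems could swap in a backend-specific implementation via config.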