rohitjoshi opened this issue 4 years ago (Open)
@rohitjoshi thanks for the information. I haven't tried including a custom allocator for it, but I'm glad to give it a try. I'm not very familiar with this, though; do you have any suggestions on how to achieve it?
Let me submit a PR. It can be included either in a makefile or even as a startup parameter.
Thank you so much.
See oatpp
@rohitjoshi I've submitted a PR to tfb, thanks a lot.
Great, did you update the other docker files as well? I am not sure which one is getting used.
Yes, I updated the docker files. Unfortunately, it cannot be installed on Ubuntu with apt, so I had to build it from source. Thanks.
Any measurable performance improvement? It would be good to compare snmalloc vs mimalloc. Also, I see many other frameworks cheating to improve performance, e.g. using a hardcoded content length.
On my local computer, it gives a 10%-15% QPS improvement, which is exciting. I know some frameworks' tests are written in strange ways, and there are always loopholes in the rules, but I hope the code in the tfb tests looks like code in normal production systems. Thanks for the information.
In the recent benchmark, drogon-core is in 2nd place. Hopefully, with a 10% increase, it will be #1.
Cool, it's pretty exciting to see the next TFB results.
40% higher throughput compared to 3rd place (may-minihttp) in the multiple queries category.
If you do get a comparison between mimalloc and snmalloc, I would be very interested to see the results. Also, if you have issues with using snmalloc, please post to our GitHub.
@mjp41, thank you very much. I have done some tests with the memory allocators. The test results are as follows:
1. normal malloc
wrk -c512 -d15 -t8 http://localhost:8088/plaintext -s pipeline.lua -- 64
Running 15s test @ http://localhost:8088/plaintext
8 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 4.38ms 8.19ms 348.77ms 96.26%
Req/Sec 555.07k 48.57k 1.09M 84.18%
66323456 requests in 15.10s, 7.84GB read
Requests/sec: 4392607.08
Transfer/sec: 532.02MB
2. snmalloc
wrk -c512 -d15 -t8 http://localhost:8088/plaintext -s pipeline.lua -- 64
Running 15s test @ http://localhost:8088/plaintext
8 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.45ms 3.43ms 224.28ms 87.17%
Req/Sec 646.79k 40.61k 1.26M 85.02%
77325440 requests in 15.10s, 9.15GB read
Requests/sec: 5120800.20
Transfer/sec: 620.21MB
3. mimalloc
wrk -c512 -d15 -t8 http://localhost:8088/plaintext -s pipeline.lua -- 64
Running 15s test @ http://localhost:8088/plaintext
8 threads and 512 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.51ms 3.79ms 214.67ms 88.30%
Req/Sec 645.94k 42.95k 702.71k 84.33%
77111168 requests in 15.02s, 9.12GB read
Requests/sec: 5132390.37
Transfer/sec: 621.62MB
I can say that in this test, snmalloc and mimalloc have similar performance improvements.
Interesting. The thread stats for snmalloc look fractionally better, but the overall stats for mimalloc look fractionally better. I could well imagine the noise in the benchmark is greater than the differences. Thanks for sharing the results.
Yes, I agree with you, thanks for your excellent work.
These preliminary benchmarks look really great. Thanks for the contributions and the awesome work which made the framework even better.
Not winning by 10%, though; still good to see the results. I was guessing the gap for the ORM version would have tightened.
I don't want to open a new issue so I'll post it here. Drogon has won first place in the Composite framework score section of Round 19 of the TechEmpower benchmarks, posted two days ago.
Congratulations!
That’s some awesome news! 🎉
Thanks for sharing @vedranmiletic.
Thanks to @an-tao and all other contributors for the great work and reaching this awesome milestone 🙂
Out of curiosity, @an-tao do you have any interest in writing a blogpost or something about how you achieved this performance? Seems like it could be very helpful for people.
Great work!
@deklanw Thanks for your attention. I think drogon's high performance benefits from the following:
Completely non-blocking programming: drogon provides asynchronous interfaces for users to handle HTTP requests and non-blocking asynchronous database interfaces to access the database. Therefore, users can use a small number of threads (usually the number of CPU cores) to process a very large number of concurrent requests. The only blocking point in each thread is the epoll_wait() call; when there is really nothing to do, the thread simply parks there. (See the handler sketch after this list.)
Lock-free: It goes without saying that critical sections protected by a global mutex are harmful to the concurrent performance of a program. I spent a lot of time removing locks from the framework, including by using lock-free queues, FastDbClient, etc. As a result, drogon's execution path has almost no locks, which means that each thread can run at full speed without waiting for the others.
Batch-mode of libpq: there is a batch-mode patch for libpq which can pipeline SQL queries into asynchronous batches on the same connection; this is very helpful for increasing the utilization of database connections. See here for more details. (A sketch of the equivalent upstream pipeline API follows below.)
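To make the first two points a little more concrete, here is a minimal sketch, not taken from the benchmark sources, of the style described above: a handler that stays fully asynchronous and hands its query to a FastDbClient running on the same event loop. The route, SQL, and column names are illustrative assumptions, it presumes a PostgreSQL client configured as a FastDbClient in the app config, and exact signatures may differ slightly between Drogon versions.

```cpp
#include <drogon/drogon.h>
#include <json/json.h>
#include <thread>

int main()
{
    // Register a fully asynchronous handler; the lambda runs on an IO loop
    // thread and must never block it.
    drogon::app().registerHandler(
        "/db",
        [](const drogon::HttpRequestPtr &,
           std::function<void(const drogon::HttpResponsePtr &)> &&callback) {
            // FastDbClient shares the handler's event loop, so the query is
            // issued and its result delivered without locks or thread hops.
            auto client = drogon::app().getFastDbClient();
            client->execSqlAsync(
                "select randomNumber from World where id=$1",
                [callback](const drogon::orm::Result &r) {
                    Json::Value json;
                    json["randomNumber"] = r[0]["randomnumber"].as<int>();
                    callback(drogon::HttpResponse::newHttpJsonResponse(json));
                },
                [callback](const drogon::orm::DrogonDbException &) {
                    callback(drogon::HttpResponse::newHttpResponse());
                },
                1);  // bound value for the $1 placeholder
        },
        {drogon::Get});

    // One IO thread per core; an idle thread simply parks in epoll_wait().
    drogon::app()
        .setThreadNum(std::thread::hardware_concurrency())
        .addListener("0.0.0.0", 8088)
        .run();
}
```

Since neither the handler nor the database callback ever blocks, a handful of IO threads can keep thousands of requests in flight at once.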
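For the third point, the batch-mode patch referenced above later landed upstream as libpq's pipeline mode (PostgreSQL 14). The following is only a rough illustration of that upstream API, not Drogon's internal code: the connection string and queries are placeholders, error handling is pared down, and production code would use a non-blocking socket to avoid deadlocks on large batches.

```cpp
#include <libpq-fe.h>
#include <cstdio>

int main()
{
    PGconn *conn = PQconnectdb("dbname=hello_world");
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    PQenterPipelineMode(conn);  // start pipelining on this connection

    // Queue several queries without waiting for any result.
    const char *sql = "SELECT randomNumber FROM World WHERE id = $1";
    const char *ids[] = {"1", "2", "3"};
    for (const char *id : ids)
        PQsendQueryParams(conn, sql, 1, nullptr, &id, nullptr, nullptr, 0);

    PQpipelineSync(conn);  // end the batch and flush the send buffer

    // Drain results: each query yields one PGresult followed by a NULL,
    // and the whole batch is terminated by a PGRES_PIPELINE_SYNC marker.
    for (;;)
    {
        PGresult *res = PQgetResult(conn);
        if (res == nullptr)
            continue;  // separator between one query's results and the next
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            std::printf("randomNumber = %s\n", PQgetvalue(res, 0, 0));
        const bool done = (PQresultStatus(res) == PGRES_PIPELINE_SYNC);
        PQclear(res);
        if (done)
            break;
    }

    PQexitPipelineMode(conn);
    PQfinish(conn);
    return 0;
}
```

The gain comes from keeping the connection busy: several queries are on the wire before the first result comes back, instead of one round trip per query.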
Basically, the techniques mentioned above are used in almost all the top TFB frameworks. I think the gap between the top frameworks is the result of the accumulation of many implementation details, and of course of the differences between the programming languages.
I see the Rust-based frameworks actix and may-minihttp both using the snmalloc and mimalloc allocators, which show significant improvement. Have you tried including a custom allocator for the Drogon benchmarks?