Open albertxavier100 opened 3 weeks ago
I'm measuring approximately 300K requests per second using the http_server_async example with 4 threads (as stated in the provided link). However, if you create one io_context per core and use individual acceptors (using the `SO_REUSEPORT` option), you can expect around 150K additional requests per second for each core you add, as they work independently.
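To make the `SO_REUSEPORT` idea concrete, here is a minimal sketch using plain POSIX sockets rather than Asio, assuming Linux (or another platform that defines `SO_REUSEPORT`). The helper names `make_reuseport_listener` and `bound_port` are made up for illustration; they are not part of Asio or Beast.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>

// Hypothetical helper: open a listening TCP socket on loopback:`port` with
// SO_REUSEPORT set, so several independent listeners can share one port.
// Returns the file descriptor, or -1 on failure.
int make_reuseport_listener(std::uint16_t port) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    // The option must be set before bind() on every socket sharing the port.
    if (::setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof on) != 0 ||
        ::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0 ||
        ::listen(fd, SOMAXCONN) != 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// Hypothetical helper: the port a listener is actually bound to
// (useful when it was created with port 0), or 0 on failure.
std::uint16_t bound_port(int fd) {
    sockaddr_in addr{};
    socklen_t len = sizeof addr;
    if (::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

In a per-core setup, each worker thread would own one such listener (with Asio, one acceptor per io_context), and the kernel distributes incoming connections among them.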
Regardless, I doubt this benchmark accurately reflects the usage of these libraries in real-world scenarios:
So, what exactly are these benchmarks testing? Simple HTTP servers that respond to GET requests with a constant, 17-byte-long message.
This is comparing apples to pears.
`curl`. Even if you kept it alive, at some point the client might have filled the entire TCP receive buffer and will effectively block on sending the request. -> You need to properly process requests.
The C++ version used a synchronous accept loop
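As a rough illustration of what "properly process requests" means at minimum, the sketch below drains complete request heads from a receive buffer so the client is never left blocked on a full buffer. It assumes body-less GET requests terminated by a blank line; `consume_complete_requests` is a hypothetical helper, not a Beast or Asio function.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: remove every complete request head (terminated by
// "\r\n\r\n") from the front of `buffer` and return how many were removed.
// Any trailing partial request stays in the buffer for the next read.
std::size_t consume_complete_requests(std::string& buffer) {
    constexpr std::string_view delimiter = "\r\n\r\n";
    std::size_t count = 0;
    std::size_t consumed = 0;
    for (;;) {
        std::size_t end = buffer.find(delimiter, consumed);
        if (end == std::string::npos)
            break;
        consumed = end + delimiter.size();
        ++count;
    }
    buffer.erase(0, consumed);
    return count;
}
```

A server would send one response per counted request. This does no validation at all, so it is only suitable for a benchmark, not for real traffic.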
I compared with my own C++ version that fixes these. I have two conditional defines (`TRUE_HTTP` and `CACHED_RESPONSE`) that you can play around with. The code comments explain their (observed) effect.
```cpp
#include <boost/asio.hpp>
#include <boost/asio/experimental/awaitable_operators.hpp>
#include <boost/beast.hpp>
#include <boost/lexical_cast.hpp>
#include <fstream>
#include <iostream>
#include <syncstream>

namespace beast = boost::beast;
namespace http  = beast::http;
namespace net   = boost::asio;
using tcp       = net::ip::tcp;
using namespace net::experimental::awaitable_operators;

using executor_t = net::thread_pool::executor_type;
using acceptor_t = net::deferred_t::as_default_on_t<net::basic_socket_acceptor<tcp, executor_t>>;
using socket_t   = net::deferred_t::as_default_on_t<net::basic_stream_socket<tcp, executor_t>>;

[[maybe_unused]] static std::string read_html_file(std::string const& file_path) {
    std::ifstream file(file_path, std::ios::binary);
    return {std::istreambuf_iterator<char>(file), {}};
}

static auto const make_response_message = [](bool keep_alive) {
    // Construct an HTTP response with the HTML content
    std::string_view msg = "Hello from C++!!!"; // or read_html_file("hello.html");

    http::response<http::span_body<char const>> res{http::status::ok, 11, msg};
    res.set(http::field::server, "C++ Server");
    res.set(http::field::content_type, "text/html");
    res.keep_alive(keep_alive);
    res.prepare_payload();
    return res;
};

static auto const s_cooked_response = [] {
    static auto const text = boost::lexical_cast<std::string>(make_response_message(true));
    return net::buffer(text);
}();

net::awaitable<void, executor_t> handle_client_async(socket_t socket) try {
    socket.set_option(tcp::no_delay(true)); // no difference observed in benchmark

#ifdef TRUE_HTTP // This affects throughput by only about -10%
    beast::flat_buffer buf;
    for (http::request<http::empty_body> req;; req.clear()) {
        auto [ec, _] = co_await async_read(socket, buf, req, as_tuple(net::deferred));
        if (ec)
            break;
#ifdef CACHED_RESPONSE
        // emulate caching server
        co_await async_write(socket, s_cooked_response);
#else
        // This is a more realistic way but probably NOT what Kestrel is doing for the static route
        // It affects throughput by about -25%
        co_await async_write(socket, make_response_message(req.keep_alive()));
#endif
        if (!req.keep_alive())
            break;
    }
#else
    // Since we're ignoring the requests, we might as well assume they're correct. (INSECURE)
    for (beast::flat_buffer buf;;) {
        auto [ec, n] = co_await async_read_until(socket, buf, "\r\n\r\n", as_tuple(net::deferred));
        if (ec)
            break;
        buf.consume(n);
        co_await async_write(socket, s_cooked_response);
    }
#endif
} catch (beast::system_error const& e) {
    std::osyncstream(std::cerr) << "handle_client_async error: " << e.code().message() << std::endl;
}

net::awaitable<void, executor_t> server(uint16_t port) {
    auto ex = co_await net::this_coro::executor;

    for (acceptor_t acceptor(ex, {{}, port});;)
        co_spawn(ex,                                                    //
                 handle_client_async(co_await acceptor.async_accept()), //
                 net::detached);
}

int main() try {
    // Create a thread pool
    net::thread_pool pool(4);
    executor_t       ex = pool.get_executor();

    // Create and bind the acceptor
    co_spawn(ex, server(8070), net::detached);

    std::cout << "Server listening on port 8070..." << std::endl;

    pool.join();
} catch (std::exception const& e) {
    std::cerr << "Main error: " << e.what() << std::endl;
}
```
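The idea behind `CACHED_RESPONSE` can be shown without Boost: serialize the response once and hand out the same bytes for every request. The sketch below is a standard-library approximation, assuming a fixed body and header set; `serialize_response` and `cached_response` are hypothetical names, and the exact bytes Beast emits may differ.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: build the full response text for `body`.
// The header values mirror the ones set in the code above.
std::string serialize_response(std::string_view body) {
    std::string out = "HTTP/1.1 200 OK\r\n";
    out += "Server: C++ Server\r\n";
    out += "Content-Type: text/html\r\n";
    out += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    out += "\r\n";
    out += body;
    return out;
}

// Serialized exactly once on first use; every later call returns a view
// of the same storage, so no per-request formatting or allocation happens.
std::string_view cached_response() {
    static const std::string text = serialize_response("Hello from C++!!!");
    return text;
}
```

The trade-off is that a cached response cannot reflect per-request state such as the keep-alive setting, which is why the non-cached path rebuilds the message each time.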
On my system:
no defines, fast route:
`TRUE_HTTP` and `CACHED_RESPONSE`:
`TRUE_HTTP` only:
As you can see, I focused on throughput numbers (with a fixed response size, throughput is proportional to req/s). As such, `TCP_NODELAY` has no tangible effect. You may want to play around to see what the latencies are like. You can see my work here: https://github.com/zigzap/zap/pull/110
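The relation between throughput and req/s for a fixed response size is a simple division, sketched below. The function name is hypothetical, and any numbers plugged in are placeholders, not measurements from this thread.

```cpp
// With a constant number of bytes per response, requests per second is
// just byte throughput divided by that size, so the two metrics scale together.
double requests_per_second(double bytes_per_second, double bytes_per_response) {
    return bytes_per_second / bytes_per_response;
}
```

Doubling the throughput at the same response size doubles the req/s figure, which is why either metric can be used to compare the variants.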
_Enabling `BOOST_ASIO_HAS_IO_URING` and `BOOST_ASIO_DISABLE_EPOLL` did not result in improvements._
> However, if you create one io_context per core and use individual acceptors (using `SO_REUSEPORT` option)
Thanks @ashtum. I'll try that and compare to C# again.
Thanks @sehe, I'll try your GREAT work and compare to other languages.
Version of Beast
347
Steps necessary to reproduce the problem
1. `sudo snap install zig --beta --classic`
2. Check the `zig` version; it should be 0.12 now.
3. `git clone https://github.com/zigzap/zap.git`
4. `cd zap`
5. `git checkout -b perf tags/v0.7.0`
6. In `wrk/csharp/Program.cs`, change line 6 to `app.MapGet("/", () => "Hello from C#1234");`, which makes the response body size (17 bytes) the same between the C# and C++ test code (little impact on the result).
7. `./wrk/measure_all.sh "csharp cpp-beast"`

For more details, see https://github.com/zigzap/zap/blob/master/blazingly-fast.md#how
All relevant compiler information
Result
The perf test result: https://github.com/zigzap/zap/blob/master/blazingly-fast.md#the-computer-makes-the-difference

![result](https://raw.githubusercontent.com/zigzap/zap/master/wrk/samples/req_per_sec_graph.png)
Beast's test code
https://github.com/zigzap/zap/blob/master/wrk/cpp/main.cpp
I like Beast, and it's hard to believe its performance is so poor. Can anyone point out what's wrong with Beast's test code, or improve it? I tried modifying Beast's HTTP server example to return a string-body response for the perf test, but the result got worse. 😭