boostorg / beast

HTTP and WebSocket built on Boost.Asio in C++11
http://www.boost.org/libs/beast
Boost Software License 1.0

Performance issue: Beast's requests per second (RPS) are half of C#'s #2883

Open albertxavier100 opened 3 weeks ago

albertxavier100 commented 3 weeks ago

Version of Beast

347

Steps necessary to reproduce the problem

  1. Install dotnet https://learn.microsoft.com/en-us/dotnet/core/install/linux
  2. Install zig: sudo snap install zig --beta --classic
  3. Check your zig version; it should be 0.12 now
  4. git clone https://github.com/zigzap/zap.git
  5. cd zap
  6. git checkout -b perf tags/v0.7.0
  7. In wrk/csharp/Program.cs, change line 6 to app.MapGet("/", () => "Hello from C#1234"); so that the response body size (17 bytes) matches between the C# and C++ test code (this has little impact on the result).
  8. ./wrk/measure_all.sh "csharp cpp-beast"
  9. The perf results are printed to the console; you will see that cpp-beast performs far worse than csharp.

for more details, see https://github.com/zigzap/zap/blob/master/blazingly-fast.md#how

All relevant compiler information

Result

The perf test results: https://github.com/zigzap/zap/blob/master/blazingly-fast.md#the-computer-makes-the-difference

Beast's test code

https://github.com/zigzap/zap/blob/master/wrk/cpp/main.cpp


I like Beast, and it's hard to believe its performance is so poor. Can anyone point out what's wrong, or improve Beast's test code? I tried modifying Beast's http server example to return a string-body response for the perf test, but the result got worse. 😭
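
For reference, the change I tried was roughly of this shape (a sketch from memory, not the exact code; make_hello is a made-up helper name, not part of the example):

#include <boost/beast/http.hpp>

namespace http = boost::beast::http;

// Sketch of the string-body response swapped into the example's request
// handler (make_hello is a hypothetical helper, not the example's code).
template <class Body, class Allocator>
http::response<http::string_body>
make_hello(http::request<Body, http::basic_fields<Allocator>> const& req) {
    http::response<http::string_body> res{http::status::ok, req.version()};
    res.set(http::field::server, "Beast");
    res.set(http::field::content_type, "text/plain");
    res.keep_alive(req.keep_alive());
    res.body() = "Hello from C++!!!"; // 17 bytes, matching the C# payload
    res.prepare_payload();
    return res;
}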

ashtum commented 3 weeks ago

I'm measuring approximately 300K requests per second using the http_server_async example with 4 threads (as stated in the provided link). However, if you create one io_context per core and use individual acceptors (with the SO_REUSEPORT option), you can expect roughly 150K additional requests per second for each core you add to your setup (since they work independently).
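
Not something I benchmarked here, but a minimal sketch of that per-core layout could look like the following (note that reuse_port reaches into Asio's detail namespace, which is an assumption of convenience; a hand-written SettableSocketOption would do the same):

#include <boost/asio.hpp>
#include <sys/socket.h> // SOL_SOCKET, SO_REUSEPORT (Linux)
#include <thread>
#include <vector>

namespace net = boost::asio;
using tcp = net::ip::tcp;

// SO_REUSEPORT lets several acceptors bind the same port; the kernel
// then spreads incoming connections across them.
using reuse_port = net::detail::socket_option::boolean<SOL_SOCKET, SO_REUSEPORT>;

int main() {
    unsigned const cores = std::thread::hardware_concurrency();
    std::vector<std::thread> workers;

    for (unsigned i = 0; i < cores; ++i) {
        workers.emplace_back([] {
            net::io_context ioc{1}; // one single-threaded io_context per core
            tcp::acceptor acc(ioc);
            acc.open(tcp::v4());
            acc.set_option(reuse_port(true));
            acc.bind({{}, 8070});
            acc.listen();
            // ... run the usual Beast accept/read/write loop on acc ...
            ioc.run();
        });
    }

    for (auto& t : workers)
        t.join();
}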

Regardless, I doubt this benchmark accurately reflects the usage of these libraries in real-world scenarios:

So, what exactly are these benchmarks testing? Simple HTTP servers that respond to GET requests with a constant, 17-byte-long message.

sehe commented 3 weeks ago

This is comparing apples to pears.

#include <boost/asio.hpp>
#include <boost/asio/experimental/awaitable_operators.hpp>
#include <boost/beast.hpp>
#include <boost/lexical_cast.hpp>
#include <fstream>
#include <iostream>
#include <syncstream>

namespace beast = boost::beast;
namespace http  = beast::http;
namespace net   = boost::asio;
using tcp     = net::ip::tcp;
using namespace net::experimental::awaitable_operators;
using executor_t = net::thread_pool::executor_type;
using acceptor_t = net::deferred_t::as_default_on_t<net::basic_socket_acceptor<tcp, executor_t>>;
using socket_t   = net::deferred_t::as_default_on_t<net::basic_stream_socket<tcp, executor_t>>;

[[maybe_unused]] static std::string read_html_file(std::string const& file_path) {
    std::ifstream file(file_path, std::ios::binary);
    return {std::istreambuf_iterator<char>(file), {}};
}

static auto const make_response_message = [](bool keep_alive) {
    // Construct an HTTP response with the HTML content
    std::string_view msg = "Hello from C++!!!"; // or read_html_file("hello.html");

    http::response<http::span_body<char const>> res{http::status::ok, 11, msg};
    res.set(http::field::server, "C++ Server");
    res.set(http::field::content_type, "text/html");
    res.keep_alive(keep_alive);
    res.prepare_payload();
    return res;
};

static auto const s_cooked_response = [] {
    static auto const text = boost::lexical_cast<std::string>(make_response_message(true));
    return net::buffer(text);
}();

net::awaitable<void, executor_t> handle_client_async(socket_t socket) try {
    socket.set_option(tcp::no_delay(true)); // no difference observed in benchmark

#ifdef TRUE_HTTP // This affects throughput by only about -10%
    beast::flat_buffer buf;
    for (http::request<http::empty_body> req;; req.clear()) {
        auto [ec, _] = co_await async_read(socket, buf, req, as_tuple(net::deferred));
        if (ec)
            break;
    #ifdef CACHED_RESPONSE
        // emulate caching server
        co_await async_write(socket, s_cooked_response);
    #else
        // This is a more realistic way but probably NOT what Kestrel is doing for the static route
        // It affects throughput by about -25%
        co_await async_write(socket, make_response_message(req.keep_alive()));
    #endif
        if (!req.keep_alive())
            break;
    }
#else
    // Since we're ignoring the requests, we might as well assume they're correct. (INSECURE)
    for (beast::flat_buffer buf;;) {
        auto [ec, n] = co_await async_read_until(socket, buf, "\r\n\r\n", as_tuple(net::deferred));
        if (ec)
            break;
        buf.consume(n);
        co_await async_write(socket, s_cooked_response);
    }
#endif
} catch (beast::system_error const& e) {
    std::osyncstream(std::cerr) << "handle_client_async error: " << e.code().message() << std::endl;
}

net::awaitable<void, executor_t> server(uint16_t port) {
    auto ex = co_await net::this_coro::executor;

    for (acceptor_t acceptor(ex, {{}, port});;)
        co_spawn(ex,                                                    //
                 handle_client_async(co_await acceptor.async_accept()), //
                 net::detached);
}

int main() try {
    // Create a thread pool
    net::thread_pool pool(4);
    executor_t ex = pool.get_executor();

    // Create and bind the acceptor
    co_spawn(ex, server(8070), net::detached);

    std::cout << "Server listening on port 8070..." << std::endl;

    pool.join();
} catch (std::exception const& e) {
    std::cerr << "Main error: " << e.what() << std::endl;
}

On my system:

As you can see, I focused on throughput numbers (with a fixed response size, throughput is directly proportional to requests per second). TCP_NODELAY had no tangible effect. You may want to play around to see what the latencies are like. You can see my work here: https://github.com/zigzap/zap/pull/110

_Enabling BOOST_ASIO_HAS_IO_URING and BOOST_ASIO_DISABLE_EPOLL did not result in improvements._
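
(For context, those macros are set on the compiler command line, roughly like so; this build line is assumed, and Asio's io_uring backend additionally links against liburing:)

g++ -std=c++20 -O3 -DBOOST_ASIO_HAS_IO_URING -DBOOST_ASIO_DISABLE_EPOLL \
    main.cpp -o server -luring -pthread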

albertxavier100 commented 3 weeks ago

However, if you create one io_context per core and use individual acceptors (using SO_REUSEPORT option)

Thanks @ashtum . I'll try that and compare to C# again.

albertxavier100 commented 3 weeks ago

Thanks @sehe , I'll try your GREAT work and compare to other languages.