Karlson2k / libmicrohttpd

GNU libmicrohttpd repository unofficial mirror on GitHub
https://www.gnu.org/software/libmicrohttpd/

Performance Issue / Configuration Question #20

Closed kujukuju closed 8 months ago

kujukuju commented 1 year ago

Hi,

I'm attempting to use libmicrohttpd for a production server for data processing for a website.

I was originally using mongoose, but wary of high-load issues I'd heard about with mongoose, I switched to libmicrohttpd.

However, I'm experiencing some weird latency issues. With MHD I'm seeing a consistent ~500 ms of latency to localhost using Postman. Using just my browser, each page request very consistently alternates between ~8 ms and ~300 ms, switching back and forth on every single page refresh. Postman, however, stays at ~500 ms without fail.

I went back and checked out my mongoose commit, and I see a consistent ~8 ms of latency in both Postman and the browser.

So this ruled out what I thought might be an issue with Windows Defender or the firewall.

I assumed the server might be doing something different with regard to waking up on incoming messages; Windows is notoriously bad at that. So I set up my project in Windows Subsystem for Linux, and I see the exact same issue and latencies reproduced.

So it's not a Windows event-handling issue.
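Since the server-side callbacks fire late too, it can help to split the round trip into connect time vs. time to first byte; curl's `-w` timing variables do exactly that. A minimal sketch, using a throwaway Python HTTP server as a stand-in for the real one (port 8099 and the availability of python3/curl are assumptions):

```shell
# Stand-in backend: a trivial Python HTTP server (substitute the real server).
python3 -m http.server 8099 --bind 127.0.0.1 >/dev/null 2>&1 &
SRV=$!
# Wait until the server accepts connections.
for i in 1 2 3 4 5; do curl -s -o /dev/null http://127.0.0.1:8099/ && break; sleep 1; done
# Split the round trip: a large connect time with a small (ttfb - connect)
# gap points at the client or OS network stack, not the request handler.
timing=$(curl -s -o /dev/null \
    -w 'connect=%{time_connect}s ttfb=%{time_starttransfer}s' \
    http://127.0.0.1:8099/)
echo "$timing"
kill $SRV
```

If the connect phase carries almost all of the delay, the problem is in connection setup rather than in the server's request handling.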

To run my code: the project is actually written in Jai, and I generated DLL/LIB files (built with O3 optimizations) which I load from my Jai program. (This is also what I do for mongoose.)

Is this a known thing with libmicrohttpd? Or do I have some configuration settings wrong?

The code is split between a few files but here's the important parts:

    server := New(HttpServer);
    server.port = port;
    server.timeout_seconds = timeout_seconds;
    server.thread_count = thread_count;

    // thread_count is ~48 (twice my cores)
    // timeout_seconds is 60

    server.daemon = MHD_start_daemon(
        cast(u32) (MHD_FLAG.MHD_USE_INTERNAL_POLLING_THREAD | .USE_AUTO),
        server.port,
        null,
        null,
        server_respond,
        cast(*void) server,
        MHD_OPTION.MHD_OPTION_THREAD_POOL_SIZE, server.thread_count,
        MHD_OPTION.MHD_OPTION_CONNECTION_TIMEOUT, server.timeout_seconds,
        MHD_OPTION.MHD_OPTION_END);

    print("Listening on port %...\n", server.port);

    // pause indefinitely
    semaphore: Semaphore;
    init(*semaphore);
    wait_for(*semaphore);

    server_respond :: (cls: *void, connection: *MHD_Connection, url: *u8, method: *u8, version: *u8, upload_data: *u8, upload_data_size: *u64, req_cls: **void) -> MHD_Result #c_call {
        request_time := get_unix_time();
        // ... print request_time just for debugging
        // request_time is just about ~290 milliseconds after the browser sends the request
        // the browser also tracks the roundtrip time at about ~300 milliseconds
    }

I've tried going back to INTERNAL_THREAD and just doing a basic while true { MHD_run(server.daemon); } loop, but this reproduced the exact same issue.

I've also tried MHD_run_timeout, still with the same issue. In both INTERNAL mode and INTERNAL_POLLING_THREAD mode.

Looking through the examples, most seem to be about as simple as my code here, just relying on command-line input to pause the server instead of a semaphore.

Any advice for what to do at this point? I'm at a complete loss as to how to improve performance. I can only imagine it's a configuration issue.

kujukuju commented 1 year ago

Also, I'd like to add that I've checked the system time at the start of the MHD_AcceptPolicyCallback function and the MHD_AccessHandlerCallback function. Both of these happen at almost exactly the same time, and both fire off about ~290 ms after the webpage has made its request. So it seems to be something internal that's causing the latency.

kujukuju commented 1 year ago

Also adding that I found mongoose actually has a ~300 ms web round trip, and a ~500 ms Postman round trip, when it's queried for the first time in a while.

Since these timings match, this "cold start" issue seems related. Any idea what might be happening in the inner workings of the server to cause this?

Karlson2k commented 1 year ago

Just a few days ago another user reported that MHD's performance is lower than some other library's; however, according to his measurements the average response time is ~9 ms. See https://lists.gnu.org/archive/html/libmicrohttpd/2023-06/msg00005.html

Please provide additional information:

Please clarify: what is the INTERNAL mode you mentioned? MHD can run its own thread(s) with the MHD_USE_INTERNAL_POLLING_THREAD flag, or use external polling mode. MHD_run() can be used only with external polling (i.e. without the MHD_USE_INTERNAL_POLLING_THREAD flag).

Why did you use double the number of your cores for the thread pool size? Your CPU cannot execute more threads than it has cores. When the thread pool size is equal to (or lower than) the number of CPU cores, each thread runs on its own core. When it is higher than the number of available cores, the OS periodically stops some threads to run others, then switches back again; that is very suboptimal. Try numbers lower than the number of your CPU cores, or even no thread pool at all.
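As a sketch of that sizing advice (nproc and the getconf fallback are assumptions; the point is to derive the pool size from the machine rather than guessing):

```shell
# One MHD worker per core; oversubscribing (e.g. 2x cores) only adds
# context switches, since the kernel cannot run more threads than there
# are cores at any one time.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
pool=$cores
echo "thread pool size: $pool"
```

The resulting number would then be passed as the MHD_OPTION_THREAD_POOL_SIZE value instead of a hard-coded constant.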

MHD_AcceptPolicyCallback is called by MHD when a new connection is accepted, before any other processing, so this may indicate that the connection is being delayed by the OS. I checked a W32 build with src/examples/minimal_example and a browser. The browser reports a 307-315 ms delay when the connection is not yet established, while subsequent requests over an already-open connection take only 0-1 ms. When running the same browser on W32 with MHD on another machine under Debian, the delay for the first request is 8 ms and 1 ms for the following requests.

To me it looks like Windows delays opening the connection.

Karlson2k commented 1 year ago

I've dug deeper and found the following:

It looks like the problem is with Chrome on Windows when it connects to localhost. As far as I know, Postman is based on the Chrome engine, so it suffers from the same problem.

Chrome is slow when opening a new connection to localhost. If the connection is kept alive, the subsequent requests are fast. As soon as the connection is closed, the next request is delayed again.

You could check the open connections on the system.
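This keep-alive effect can be reproduced without a browser: request the same URL twice in a single curl invocation and print `%{num_connects}` (new connections made per transfer). A sketch against a throwaway HTTP/1.1 Python server (port 8100, python3, and curl are assumptions):

```shell
# HTTP/1.1 stand-in server, so the connection stays open between requests.
python3 - <<'EOF' >/dev/null 2>&1 &
from http.server import HTTPServer, SimpleHTTPRequestHandler
SimpleHTTPRequestHandler.protocol_version = "HTTP/1.1"
HTTPServer(("127.0.0.1", 8100), SimpleHTTPRequestHandler).serve_forever()
EOF
SRV=$!
# Wait until the server accepts connections.
for i in 1 2 3 4 5; do curl -s -o /dev/null http://127.0.0.1:8100/ && break; sleep 1; done
# Two URLs in one invocation: curl reuses the first connection for the
# second request, so the second %{num_connects} line is "0".
connects=$(curl -s -o /dev/null -o /dev/null -w '%{num_connects}\n' \
    http://127.0.0.1:8100/ http://127.0.0.1:8100/)
echo "$connects"
kill $SRV
```

A fresh curl process per request (one connection each) models the slow browser path; the reused connection models the fast one.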

Firefox and libcurl are free of this problem, on both Windows and Linux, whether they connect to localhost or to a remote host.

kujukuju commented 1 year ago

Thanks for your detailed response!

I'm on Windows. I built git master about 2-3 days ago.

Here's the entirety of the server_respond function:

    server_respond :: (cls: *void, connection: *MHD_Connection, url: *u8, method: *u8, version: *u8, upload_data: *u8, upload_data_size: *u64, req_cls: **void) -> MHD_Result #c_call {
        request_time := get_unix_time();
        server := cast(*HttpServer) cls;
        thread_context := get_thread_context(server);

        push_context thread_context {
            url_string := normalize_url(to_string(url));
            method_string := to_string(method);

            http_request := HttpRequest.{};
            http_response := HttpResponse.{
                server,
                connection,
                server.extra_headers,
            };

            // need to remove all duplicate slashes and trailing slashes, and maybe store each / part as its own thing so I can match wildcards

            http_method := get_http_method(method_string);
            handler := table_find_pointer(*server.handlers[cast(u16) http_method], url_string);
            if !handler {
                handler = table_find_pointer(*server.handlers[cast(u16) http_method], "*");
            }
            if !handler {
                handler = table_find_pointer(*server.handlers[cast(u16) HttpMethod.Fallback], url_string);
            }
            if !handler {
                handler = table_find_pointer(*server.handlers[cast(u16) HttpMethod.Fallback], "*");
            }
            if !handler {
                print("No valid fallback found. Begrudgingly returning 404.\n");
                handler = *default_not_found_handler;
            }

            http_result := handler.*(*http_request, *http_response);

            reset_temporary_storage();

            return http_result.result;
        }
    }

get_thread_context is a function that uses the OS-specific thread ID to look up a Context struct, which is a Jai-specific thing that basically manages your stack settings and your heap/temporary memory allocators.

By INTERNAL mode I actually meant EXTERNAL: the one without a thread pool that requires you to call MHD_run. I thought a busy continuous loop might fix whatever callback latency issue there was, but that evidently wasn't the cause.

"Why did you use double the number of your cores for the thread pool size?" There's no real reason; I arbitrarily picked a number. My thought process was to have backups available for the OS to swap to if some thread hangs for some reason. It makes sense that one thread per core is more optimal.

I was basically able to reproduce most of your findings. Firefox has a consistent ~50 ms round trip; higher than I think it should be, but not totally unreasonable like 300 ms. I'm still getting the 300 ms -> 8 ms swap when using WSL Ubuntu with Chrome, but maybe that's because it's just a subsystem; I wouldn't be surprised if it's still going through Windows somehow. I also set up an Ubuntu VPS in San Francisco, while I'm in Seattle (the closest I could easily find). The round-trip time on Chrome was a consistent ~120 ms, while on Firefox it was a consistent ~80 ms. Both of these seem reasonable, I think.

It seems like most of the issues can be explained by Chrome being bad. The remaining concern is that, for whatever reason, the mongoose embedded server only reproduces this 300 ms lag when running on Windows, loading in Chrome, for the first request made after being cold for ~5-10 minutes. Do you know of any inner workings or Windows-related things that could be cold-starting every other request? It might not be worth bothering with this issue, since Windows is a B-tier OS for server software.

Karlson2k commented 8 months ago

The issue seems to be on the client side on Windows. I don't know the precise reason; it may be the Windows network stack implementation or a browser problem (both Chrome and Firefox were ported to Windows from other platforms).