stress test prints "Cannot allocate memory"

mikehb commented 10 months ago

homa.tar.gz

To reproduce, comment out the sleep_for statement in homa_test.cpp:

#include <chrono>
#include <string>   // std::to_string
#include <thread>   // std::thread, std::this_thread
#include <string.h>
#include <libgen.h>
#include <gflags/gflags.h>

#include "grpc_rtk.h"

DEFINE_bool(homa, false, "use HOMA instead of HTTP2 for grpc transport");

int main(int argc, char *argv[]) {
    gflags::ParseCommandLineFlags(&argc, &argv, true);
    if (strcmp(basename(argv[0]), "client") == 0) {
        GrpcRtkClient *rtk_client;
        if (argc > 1) {
            rtk_client = new GrpcRtkClient(argv[1]);
        } else {
            rtk_client = new GrpcRtkClient("127.0.0.1:4000");
        }
        rtk_client->Start();
        rtk_client->WatchTextMessage();
        while (true) {
            std::this_thread::sleep_for(std::chrono::seconds(5));
        }
    } else {
        auto rtk_service = new GrpcRtkService();
        if (argc > 1) {
            std::thread(&GrpcRtkService::Run, rtk_service, argv[1]).detach();
        } else {
            std::thread(&GrpcRtkService::Run, rtk_service, "127.0.0.1:4000").detach();
        }
        int count = 0;
        TextMessage text_message;
        char buf[1000];
        memset(buf, 0x5A, sizeof(buf)-1);
        buf[sizeof(buf)-1] = '\0';
        text_message.set_msg(buf);
        while (true) {
            text_message.set_msg(std::to_string(count++));
            rtk_service->PublishTextMessage(text_message);
            //std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }
}

[mikehuang@bogon homa]$ ./bazel-bin/server  -homa
register text message client 127.0.0.1:32776
E1016 14:20:18.209892639    5684 homa_stream.cc:201]                   Couldn't send Homa request: Cannot allocate memory
E1016 14:20:18.210584533    5684 homa_stream.cc:201]                   Couldn't send Homa request: Cannot allocate memory
E1016 14:20:18.211588648    5684 homa_stream.cc:201]                   Couldn't send Homa request: Cannot allocate memory
E1016 14:20:18.213500457    5684 homa_stream.cc:201]                   Couldn't send Homa request: Cannot allocate memory
E1016 14:20:18.214512034    5684 homa_stream.cc:201]                   Couldn't send Homa request: Cannot allocate memory

[mikehuang@bogon homa]$ ./bazel-bin/client -homa
E1016 14:20:18.209865928    5727 homa_stream.cc:454]                   Couldn't send dummy Homa response: Cannot allocate memory
E1016 14:20:18.210101982    5727 homa_stream.cc:454]                   Couldn't send dummy Homa response: Cannot allocate memory
E1016 14:20:18.210622059    5727 homa_stream.cc:454]                   Couldn't send dummy Homa response: Cannot allocate memory
E1016 14:20:18.211993914    5727 homa_stream.cc:454]                   Couldn't send dummy Homa response: Cannot allocate memory
E1016 14:20:18.213110114    5727 homa_stream.cc:454]                   Couldn't send dummy Homa response: Cannot allocate memory
E1016 14:20:18.213391033    5727 homa_stream.cc:454]                   Couldn't send dummy Homa response: Cannot allocate memory
E1016 14:20:18.213857290    5727 homa_stream.cc:454]                   Couldn't send dummy Homa response: Cannot allocate memory
E1016 14:20:18.214898479    5727 homa_stream.cc:454]                   Couldn't send dummy Homa response: Cannot allocate memory

mikehb commented 10 months ago

It works fine if the sleep_for call is left in.

johnousterhout commented 10 months ago

I am not seeing memory allocation failures when I run the test with sleep_for commented out, but I am seeing strange behavior. Things seem to work until I terminate the client program, but at that point the server seems to go into an infinite loop, sending requests but never actually attempting to read responses. Thus, it doesn't detect that the client has ended, so it sends yet more requests. Could commenting out the sleep_for result in this infinite sending behavior? Is there any limit on the number of outstanding requests that can exist on the server at once? If this number gets high enough, it's possible that kernel resources will get exhausted, which could explain the memory allocation failures you are seeing (the errors in your logs are coming from the kernel, not from grpc_homa).

-John-
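One way to put a limit on outstanding requests, as asked about above, is to have the sender block once a fixed number of requests are in flight. The following is only a minimal sketch under stated assumptions: `InFlightLimiter` and `SimulateSends` are hypothetical names that exist in neither grpc_homa nor the test program, and the "completion" here is simulated by a thread rather than by a real response arriving.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of grpc_homa or gRPC): caps the number of
// requests that may be outstanding at once. `limit` must be greater than 0.
class InFlightLimiter {
public:
    explicit InFlightLimiter(std::size_t limit) : limit_(limit) {}

    // Blocks the caller until fewer than `limit` requests are outstanding,
    // then counts the new request as outstanding.
    void Acquire() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return in_flight_ < limit_; });
        ++in_flight_;
        peak_ = std::max(peak_, in_flight_);
    }

    // Call once per request when it completes (successfully or not).
    void Release() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            --in_flight_;
        }
        cv_.notify_one();
    }

    // Highest number of simultaneously outstanding requests seen so far.
    std::size_t peak() const {
        std::lock_guard<std::mutex> lock(mu_);
        return peak_;
    }

private:
    mutable std::mutex mu_;
    std::condition_variable cv_;
    const std::size_t limit_;
    std::size_t in_flight_ = 0;
    std::size_t peak_ = 0;
};

// Simulates `total` sends. Each simulated completion runs on its own thread
// and releases one slot; a real program would call Release() from whatever
// code path observes the response.
std::size_t SimulateSends(std::size_t limit, std::size_t total) {
    InFlightLimiter limiter(limit);
    std::vector<std::thread> completions;
    completions.reserve(total);
    for (std::size_t i = 0; i < total; ++i) {
        limiter.Acquire();  // the sender stalls here once the cap is reached
        completions.emplace_back([&limiter] { limiter.Release(); });
    }
    for (auto& t : completions) {
        t.join();
    }
    return limiter.peak();
}
```

In the test program's publish loop, the idea would be to call `Acquire()` before each `PublishTextMessage` and `Release()` wherever the corresponding response is observed. Whether the application can observe that point depends on how the streaming RPC is written, so treat this as a pattern rather than a drop-in fix.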


mikehb commented 10 months ago

I just wrote this test program to see whether there are any issues with homa and grpc_homa. My application won't send requests in an infinite loop, but a burst of requests may happen in the real world. My Linux host gets stuck while the infinite loop is running, and I have to reboot it. This issue didn't happen when I switched to TCP.

johnousterhout commented 10 months ago

There are a few places where poor application behavior can cause Homa to exhaust kernel resources. This is on my list of things to fix, eventually.

-John-
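Until that is addressed, an application that calls a send function directly can treat "Cannot allocate memory" (ENOMEM on Linux) as a signal to slow down rather than retrying immediately. The sketch below is an illustration only: `SendWithBackoff` and `try_send` are hypothetical names, and the code assumes the caller can see the failing call's return value and errno. That is not the case for the errors in this thread, which are logged inside grpc_homa.

```cpp
#include <cerrno>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper. `try_send` stands in for whatever call performs the
// send; it is assumed to return 0 on success, or -1 with errno set.
// Returns the number of attempts used on success, or -1 if the send failed
// with an error other than ENOMEM or if `max_attempts` was exhausted.
int SendWithBackoff(const std::function<int()>& try_send,
                    int max_attempts,
                    std::chrono::microseconds initial_delay) {
    auto delay = initial_delay;
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_send() == 0) {
            return attempt;
        }
        if (errno != ENOMEM) {
            return -1;  // not resource exhaustion, so retrying is unlikely to help
        }
        if (attempt < max_attempts) {
            // Give the kernel time to free resources, doubling the wait each time.
            std::this_thread::sleep_for(delay);
            delay *= 2;
        }
    }
    return -1;
}
```

Backing off only reduces pressure from the sender's side. It does not bound how many requests are already queued, so it works best together with a cap on outstanding requests.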

PlatformLab / grpc_homa

stress test prints "Cannot allocate memory" #14