STEllAR-GROUP / hpx

The C++ Standard Library for Parallelism and Concurrency
https://hpx.stellar-group.org
Boost Software License 1.0

Using the parcelport TCP causes the memory usage to continuously increase, and I'm not sure if it is a memory leak #6574

Open phil-skillwon opened 3 hours ago

phil-skillwon commented 3 hours ago

I've recently been evaluating HPX as the foundation for our team's signal-processing algorithm development. During testing, I noticed what appears to be a memory leak in HPX.

To isolate it, I wrote a separate test program that exchanges data between multiple nodes over the TCP parcelport, and it also shows what looks like a memory leak. I'm not entirely sure, though, so I'm asking for help here.

I used two nodes, running on two different hosts (Ubuntu 22.04 LTS), and the test code is as follows:

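#include <hpx/hpx.hpp>
#include <hpx/hpx_init.hpp>

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Returns a buffer of sz bytes, each set to 0xFF; invoked remotely as GetDataAction.
static std::vector<std::byte> getData(const size_t sz)
{
    std::vector<std::byte> data(sz, std::byte{0xFF});

    return data;
}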

HPX_PLAIN_ACTION(getData, GetDataAction);

int hpx_main(int argc, char* argv[])
{
    hpx::error_code ec = hpx::make_success_code();
    std::vector<hpx::id_type> localities = hpx::find_all_localities(ec);
    if (hpx::error::success != ec.value()) 
    {
        printf("find_all_localities failed: %s\n", ec.get_message().c_str());
        return -1;
    }

    if (localities.size() < 2) 
    {
        printf("this program requires at least 2 localities\n");
        return -2;
    }

    // size() returns size_t, so print it with %zu.
    printf("num of localities: %zu\n", localities.size());
    for (const auto& loc : localities) 
    {
        hpx::naming::gid_type gid = loc.get_gid();
        std::string address = hpx::get_locality_name(loc).get();
        std::uint32_t localityId = hpx::naming::get_locality_id_from_gid(gid);

        // localityId is unsigned, so print it with %u.
        printf("locality id: %u\n", localityId);
        printf("locality name: %s, id: %08X\n", address.c_str(), localityId);
    }

    size_t dataSize = 960256;
    // Runs until the process is killed: fetch one buffer from each locality per iteration.
    while (true) 
    {
        hpx::this_thread::sleep_for(std::chrono::microseconds(1400));

        auto dataNode0 = hpx::sync<GetDataAction>(localities[0], dataSize);
        auto dataNode1 = hpx::sync<GetDataAction>(localities[1], dataSize);

        printf("node0, data size: %zu, node1, data size: %zu\n", dataNode0.size(), dataNode1.size());
    }

    return hpx::finalize();
}

Node 0 is the root node. I observed the memory usage on both Node 0 and Node 1. Both hosts have 8GB of physical memory.

When the test program started, the memory usage on both nodes was about 0.4%. But after 1 hour, the memory usage on Node 0 increased to 0.7%, while the memory usage on Node 1 remained at 0.4%.

After about 24 hours, the memory usage on Node 0 reached 1.9%, while the memory usage on Node 1 remained at 0.4%. This looks like a memory leak.
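
For scale, the percentages above can be converted into absolute numbers. The following is only a rough sketch: it assumes the readings are fractions of total physical RAM (as in the %MEM column of top) and that "8GB" means 8 GiB. The names `percent_to_bytes` and `growth_bytes_per_second` are made up for this illustration.

```cpp
#include <cassert>

// Assumption: percentages are relative to 8 GiB of physical memory.
constexpr double kTotalBytes = 8.0 * 1024 * 1024 * 1024;

// Convert a %MEM reading into bytes.
constexpr double percent_to_bytes(double percent)
{
    return kTotalBytes * percent / 100.0;
}

// Average growth in bytes per second between two %MEM readings
// taken `seconds` apart.
double growth_bytes_per_second(double start_percent, double end_percent, double seconds)
{
    assert(seconds > 0.0);
    return (percent_to_bytes(end_percent) - percent_to_bytes(start_percent)) / seconds;
}
```

Under these assumptions, Node 0 grew from roughly 33 MiB to roughly 156 MiB. `growth_bytes_per_second(0.4, 0.7, 3600.0)` gives about 7 KB/s for the first hour, while `growth_bytes_per_second(0.4, 1.9, 86400.0)` gives about 1.5 KB/s averaged over the full day, so the growth appears to slow down over time. The readings are rounded to one decimal place, so these figures are only approximate.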

Later, I modified the test code to run a single process on one host, and there was no increase in memory usage.

However, my test code is extremely simple, so it’s unlikely that the issue is due to my code. Could you help analyze this problem?
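
To get finer-grained data than a percentage reading, the process could log its own resident set size from inside the test loop. Below is a minimal sketch, assuming Linux and the `VmRSS:` line of `/proc/self/status`. The helper names `parse_vm_rss_kib` and `current_rss_kib` are hypothetical and not part of HPX.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Scan a /proc/<pid>/status style stream for the "VmRSS:" line and
// return the resident set size in KiB, or -1 if it cannot be read.
long parse_vm_rss_kib(std::istream& status)
{
    std::string line;
    while (std::getline(status, line))
    {
        // rfind(prefix, 0) == 0 is a "starts with" check.
        if (line.rfind("VmRSS:", 0) == 0)
        {
            std::istringstream fields(line.substr(6));
            long kib = -1;
            fields >> kib;
            return fields ? kib : -1;
        }
    }
    return -1;
}

// Resident set size of the calling process in KiB (Linux only),
// or -1 if /proc/self/status is unavailable.
long current_rss_kib()
{
    std::ifstream status("/proc/self/status");
    return parse_vm_rss_kib(status);
}
```

Calling `current_rss_kib()` every few thousand iterations of the `while (true)` loop and printing the result alongside the iteration count would show whether the growth is steady, stepwise, or levelling off.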

hkaiser commented 2 hours ago

@phil-skillwon Could you compile your test code (and possibly HPX) with -DHPX_WITH_SANITIZERS=On, please? This should report memory leaks, if any. I'd be more than happy to assist in diagnosing and fixing those.