danvratil / qcoro

C++ Coroutines for Qt
https://qcoro.dvratil.cz
MIT License
331 stars 53 forks source link

Memory leak using headless QCoro::Task #152

Open jmiturasolutions opened 1 year ago

jmiturasolutions commented 1 year ago

Headless coroutine causes memory leak. Minimal reproducible example:

#include <QCoreApplication>
#include <QTimer>

#include <QCoro/QCoroTask>
#include <QCoro/QCoroTimer>

QCoro::Task<void> someAsyncAction()
{
    QTimer timer;
    timer.start(1);
    co_await timer;
}

QCoro::Task<void> reloadValues()
{
    std::vector<int> vec;
    vec.resize(1e5); // allocate some memory owned by RAII in coroutine.
    co_await someAsyncAction(); // make sure it gets suspended
    co_return;
}

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    QTimer timer;
    constexpr std::chrono::milliseconds timeout = std::chrono::milliseconds(100);
    timer.setInterval(timeout.count());
    QObject::connect(&timer, &QTimer::timeout, []() {
        reloadValues();
    });
    timer.start();
    return app.exec();
}

I am running this on Raspbian Bullseye (RPi Zero W) with cross-compiled Qt, but it most likely is consistent across platforms. Here is how it grows: image

Edit: I am using qcoro version 0.8.0. Checked with reloadValues().then([](){}); and still the same issue.

danvratil commented 1 year ago

Can you please share Qt version and possibly configuration with which you built it?

jmiturasolutions commented 1 year ago

Sure. I was using cross-pi-gcc-10.2.0-0 with -fcoroutines -lpthread and Qt 5.15. Only Qt5::Core and QCoro5Core were linked into the minimal example.

danvratil commented 1 year ago

I've tried to reproduce this with Qt 5.15 on x86_64, but I couldn't reproduce any leak.

I tried with glibc's default allocator as well as jemalloc to try get more details regarding memory allocations during the testing, but in both cases the RSS remained stable.

I don't have any (free) RPi at hand where I could try this. I spent some time getting Qt cross-compiled for arm, but failed to get QCoro cross-compiled (do you have a CMake toolchain file at hand by any chance)?

If you are cross-compiling QCoro and the app yourself, could you please try to build QCoro with -DQCORO_ENABLE_ASAN=ON and the reproducer app with -fsanitize=address to get some hints from ASAN where the memory leak might be coming from? You may need to modify the reproducer to quit clearly after some time in order to get a full report from ASAN.

Kasurus commented 1 month ago

@danvratil i was searching for a potential memory leak in my own program that the sanitizer was complaining about when i came across this issue. It looks like i could reproduce the issue on Ubuntu 22.04 and 24.04. To me this was not reproducable on Windows.

I set up a repo with the example above slightly modified to exit after 10s: https://gitlab.com/Kasurus/qcoro-leak

In the CI console you can see the output of the sanitizer: https://gitlab.com/Kasurus/qcoro-leak/-/jobs/7430726464

I used Qt 6.7.2 and the current main branch of qcoro.

Let me know if there is something else i can do to help diagnose this issue.

danvratil commented 1 month ago

Thanks for the additional info. What's visible from the ASAN is that always only a single vector leaks (40'000 bytes, which matches the 10e5 * sizeof(int) bytes allocated for the vector.

This kind of memory leak happens when the coroutine is supended but is not resumed (so it cannot clean up its stack). The reason why the reloadValues() coroutine does not get resumed is because there is a race between the QTimers. What can happen is that while someAsyncAction() is suspended, the QTimer in main() fires and terminates the event loop by calling QCoreApplication::quit(). And without an event loop, there's no-one to trigger the timer from someAsyncAction(), wake it up etc.

The example can be simplified like this:

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    QTimer::singleShot(0, []() -> QCoro::Task<> {
        std::vector<int> vec;
        vec.resize(1e5);

        QTimer timer;
        timer.start(1000);
        co_await timer;
    });

    QTimer::singleShot(100, &app, [&app]() {
        app.quit();
    });
    return app.exec();
}

Notice that the timer in the lambda is 1 second, while we quit the app in 100ms. This reproduces the leak reliably. When using the repeated timers, unless you randomize them or do other stuff, there's a big chance that the timers will tick in the right order. However, the timers will eventually drift enough so that the "quit" timer fires while reloadValues() is suspended.

I don't think this is a QCoro bug, I think it's rather a programmer error -IMO it's equivalent to allocating a heap memory and scheduling a timer to free the memory after some time. If the event loop is terminated before that timer fires, the allocated memory will leak, since the timer will never trigger to free it, and it's a bug in the user program. The only difference is that in case of heap-allocated memory, the memory could be freed in some cleanup code in the application. Unfortunately it's not possible to do this with coroutines.

In any case, what I will do on QCoro side is printing a qWarning() when we detect that the event loop has been terminated while there are still some unfinished coroutines. Hopefully that might help users with debugging the memory leak.

Kasurus commented 1 month ago

Interesting. Your explanation was very helpful! I was able to find the leak in my own code. I was not properly finishing a coroutine in the cleanup code of a test. So the exact same issue. For me that was not very obvious from the output of the sanitizer in my code. So a qWarning() would have been helpful to find the issue more quickly.