arvidn / libtorrent

an efficient feature complete C++ bittorrent implementation
http://libtorrent.org

memory leak when calling handle.torrent_file() from read_piece_alert #7273

Open Lwxiang opened 1 year ago

Lwxiang commented 1 year ago

libtorrent version (or branch): 1.2.18

platform/architecture: x86_64 centos7.2

compiler and compiler version: gcc7.3.1

I add a big file to an lt session, and file checking lasts a while. During the file checking, if I call torrent_file() on the read_piece_alert's handle, some memory is leaked.

The case can easily be reproduced with the following code:

#include <cstdio>     // std::scanf
#include <functional>
#include <future>
#include <iostream>
#include <memory>
#include <vector>

#include <unistd.h>   // sleep, usleep

// lib torrent
#include "libtorrent/alert_types.hpp"
#include "libtorrent/create_torrent.hpp"
#include "libtorrent/session.hpp"
#include "libtorrent/settings_pack.hpp"
#include "libtorrent/socket.hpp"
#include "libtorrent/torrent_flags.hpp"
#include "libtorrent/torrent_handle.hpp"
#include "libtorrent/torrent_info.hpp"
#include "libtorrent/torrent_status.hpp"

const lt::alert_category_t alertMask = lt::alert_category::status |
                                       lt::alert_category::file_progress |
                                       lt::alert_category::piece_progress |
                                       lt::alert_category::incoming_request |
                                       lt::alert_category::performance_warning |
                                       lt::alert_category::tracker |
                                       lt::alert_category::connect |
                                       lt::alert_category::peer;

void alert_handler(lt::session* s)
{
    while (true)
    {
        std::vector<lt::alert*> alerts;
        s->pop_alerts(&alerts);

        for (const auto alert : alerts)
        {
            switch (alert->type())
            {
            case lt::add_torrent_alert::alert_type:
            {
                std::cout << "torrent added" << std::endl;
                break;
            }
            case lt::read_piece_alert::alert_type:
            {
                lt::alert_cast<lt::read_piece_alert>(alert)->handle.torrent_file();
                break;
            }
            case lt::piece_finished_alert::alert_type:
            {
                lt::piece_finished_alert* a = lt::alert_cast<lt::piece_finished_alert>(alert);
                a->handle.read_piece(a->piece_index);
                break;
            }
            default:
                break;
            }
        }

        usleep(100000);
    }
}

int main()
{
    lt::settings_pack sp;
    sp.set_str(lt::settings_pack::listen_interfaces, "127.0.0.1:60020");
    sp.set_int(lt::settings_pack::alert_mask, alertMask);
    sp.set_int(lt::settings_pack::aio_threads, 16);
    sp.set_int(lt::settings_pack::alert_queue_size, 10000);

    auto s = lt::session(sp);
    sleep(1);

    auto f = std::async(std::launch::async, std::bind(&alert_handler, &s));
    sleep(1);

    lt::file_storage fs;
    lt::add_files(fs, "./testfile");
    lt::create_torrent torrent(fs);
    lt::set_piece_hashes(torrent, lt::parent_path("."));

    std::vector<char> tmpBuf;
    lt::bencode(std::back_inserter(tmpBuf), torrent.generate());

    lt::add_torrent_params p;
    p.ti = std::make_shared<lt::torrent_info>(tmpBuf.data(), tmpBuf.size());
    p.save_path = ".";
    p.flags = lt::torrent_flags::update_subscribe |
              lt::torrent_flags::need_save_resume;
    auto handle = s.add_torrent(p);
    sleep(1);

    // wait for the user to end
    char a;
    int ret = std::scanf("%c\n", &a);
    (void)ret; // ignore
    return 0;
}

This code simply:

  1. creates a torrent from ./testfile and adds it to the session,
  2. calls read_piece() when a piece_finished_alert arrives,
  3. calls torrent_file() on the handle when a read_piece_alert arrives.

I create a big file as the test file, e.g.:

truncate -s 50G testfile

then just run the test code and the memory leak occurs. After file checking completes, however, memory no longer leaks when I keep calling read_piece().

[screenshot: memory usage after running the test]

arvidn commented 1 year ago

Do you know which object is leaking?

Lwxiang commented 1 year ago

I haven't found the leaking object yet. Besides torrent_file(), calling handle.status() from the read_piece_alert's handle also causes the leak.

glassez commented 1 year ago

@Lwxiang How do you interpret "memory leak"?

I haven't found the leaking object yet. Besides torrent_file(), calling handle.status() from the read_piece_alert's handle also causes the leak.

So each call to torrent_file() within the loop allocates some additional memory and never frees it? So if you (for example) purposely skip this call on some iterations, does it allocate less memory?

Lwxiang commented 1 year ago

@glassez

So each call to torrent_file() within the loop allocates some additional memory and never frees it?

yes

So if you (for example) purposely skip this call on some iterations, does it allocate less memory?

With the test code above, I add a 50 GB file with 12800 pieces (4096 KiB per piece). Calling torrent_file() about 2000 times (imprecisely) brings the memory usage to about 20.4 GB RES (after all pieces are finished, there are no more read alerts, and the memory is never released):

[screenshot: ~20.4 GB RES]

while calling torrent_file() about 8000 times brings the memory usage to 37.1 GB RES:

[screenshot: ~37.1 GB RES]

The call counts are not very precise, but you can see that the more torrent_file() calls I make, the more memory remains after all actions.

glassez commented 1 year ago

you can see that the more torrent_file() calls I make, the more memory remains after all actions.

It looks strange, considering that torrent_handle::torrent_file() isn't supposed to make a copy of the data, but only returns a shared pointer to the existing one.

glassez commented 1 year ago

With the test code above, I add a 50 GB file with 12800 pieces (4096 KiB per piece). Calling torrent_file() about 2000 times (imprecisely) brings the memory usage to about 20.4 GB RES

Could you provide the modified code (with the limit on torrent_file() calls) that you used to test it?

Lwxiang commented 1 year ago

Could you provide the modified code (with the limit on torrent_file() calls) that you used to test it?

@glassez the limit can be changed via max_count:

#include <atomic>
#include <cstdio>     // std::scanf
#include <functional>
#include <future>
#include <iostream>
#include <memory>
#include <vector>

#include <unistd.h>   // sleep, usleep

// lib torrent
#include "libtorrent/alert_types.hpp"
#include "libtorrent/create_torrent.hpp"
#include "libtorrent/session.hpp"
#include "libtorrent/settings_pack.hpp"
#include "libtorrent/socket.hpp"
#include "libtorrent/torrent_flags.hpp"
#include "libtorrent/torrent_handle.hpp"
#include "libtorrent/torrent_info.hpp"
#include "libtorrent/torrent_status.hpp"

const lt::alert_category_t alertMask = lt::alert_category::status |
                                       lt::alert_category::file_progress |
                                       lt::alert_category::piece_progress |
                                       lt::alert_category::incoming_request |
                                       lt::alert_category::performance_warning |
                                       lt::alert_category::tracker |
                                       lt::alert_category::connect |
                                       lt::alert_category::peer;

std::atomic<int> count{0};
int const max_count = 2000;

void alert_handler(lt::session* s)
{
    while (true)
    {
        std::vector<lt::alert*> alerts;
        s->pop_alerts(&alerts);

        for (const auto alert : alerts)
        {
            switch (alert->type())
            {
            case lt::add_torrent_alert::alert_type:
            {
                std::cout << "torrent added" << std::endl;
                break;
            }
            case lt::read_piece_alert::alert_type:
            {
                int const current = ++count; // atomic increment

                if (current >= max_count)
                {
                    break;
                }

                lt::alert_cast<lt::read_piece_alert>(alert)->handle.torrent_file();
                break;
            }
            case lt::piece_finished_alert::alert_type:
            {
                lt::piece_finished_alert* a = lt::alert_cast<lt::piece_finished_alert>(alert);
                a->handle.read_piece(a->piece_index);
                break;
            }
            default:
                break;
            }
        }

        usleep(100000);
    }
}

int main()
{
    lt::settings_pack sp;
    sp.set_str(lt::settings_pack::listen_interfaces, "127.0.0.1:60020");
    sp.set_int(lt::settings_pack::alert_mask, alertMask);
    sp.set_int(lt::settings_pack::aio_threads, 16);
    sp.set_int(lt::settings_pack::alert_queue_size, 10000);

    auto s = lt::session(sp);
    sleep(1);

    auto f = std::async(std::launch::async, std::bind(&alert_handler, &s));
    sleep(1);

    lt::file_storage fs;
    lt::add_files(fs, "./testfile");
    lt::create_torrent torrent(fs);
    lt::set_piece_hashes(torrent, lt::parent_path("."));

    std::vector<char> tmpBuf;
    lt::bencode(std::back_inserter(tmpBuf), torrent.generate());

    lt::add_torrent_params p;
    p.ti = std::make_shared<lt::torrent_info>(tmpBuf.data(), tmpBuf.size());
    p.save_path = ".";
    p.flags = lt::torrent_flags::update_subscribe |
              lt::torrent_flags::need_save_resume;
    auto handle = s.add_torrent(p);
    sleep(1);

    // wait for the user to end
    char a;
    int ret = std::scanf("%c\n", &a);
    (void)ret; // ignore
    return 0;
}

Lwxiang commented 1 year ago

I tested this a few more times and found that the amount of leaked memory does not necessarily correlate positively with the number of calls. Maybe there is something wrong with my test code. But it is true that some memory cannot be released after the calls.

glassez commented 1 year ago

Maybe it's read_piece() that causes the problem?

arvidn commented 1 year ago

I strongly suspect that the memory being used is because of the (unbounded) number of calls to read_piece(). That call will cause libtorrent to allocate the size of one piece on the heap, copy that piece data into it and pass it back to you in an alert. Since you don't limit the number of simultaneous read_piece_alerts in the queue (i.e. it's unbounded) it's all a race to destruct those alert objects and free the piece memory. The faster you can free the alerts, the fewer of them will exist simultaneously.

i.e. you could most likely put any code there (e.g. sleeping for 1ms) and you'll see the memory usage increase.
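
For illustration, here is a minimal sketch of one way to bound the number of read_piece() calls in flight, assuming a single alert-handling thread. The names request_piece, on_read_piece_done, max_in_flight and pending are illustrative, not libtorrent API:

#include <vector>
#include "libtorrent/torrent_handle.hpp"

// illustrative cap on simultaneous read_piece() requests
constexpr int max_in_flight = 4;
int in_flight = 0;
std::vector<lt::piece_index_t> pending; // pieces waiting to be read

void request_piece(lt::torrent_handle& h, lt::piece_index_t idx)
{
    if (in_flight < max_in_flight) { ++in_flight; h.read_piece(idx); }
    else pending.push_back(idx); // defer until an outstanding read completes
}

// call this from the read_piece_alert case
void on_read_piece_done(lt::torrent_handle& h)
{
    --in_flight; // this alert's piece buffer is freed on the next pop_alerts()
    if (!pending.empty())
    {
        ++in_flight;
        h.read_piece(pending.back());
        pending.pop_back();
    }
}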

glassez commented 1 year ago

Since you don't limit the number of simultaneous read_piece_alerts in the queue (i.e. it's unbounded) it's all a race to destruct those alert objects and free the piece memory. The faster you can free the alerts, the fewer of them will exist simultaneously.

But doesn't @Lwxiang claim that the allocated memory remains in use even after all read_piece_alerts are processed (and it is supposed to be freed)? Or did I misinterpret it?

Lwxiang commented 1 year ago

But doesn't @Lwxiang claim that the allocated memory remains in use even after all read_piece_alerts are processed (and it is supposed to be freed)? Or did I misinterpret it?

Yes. If the memory were just allocated and freed later, that would be fine. But after all read_piece_alerts are processed, the memory remains and is never freed.

Lwxiang commented 1 year ago

[screenshot: resident memory still high after processing finished]

This is not the memory usage during processing; it is after all read_piece_alerts have been processed and after waiting about half an hour.

I am pretty sure that the memory leaks, but I don't know why.

arvidn commented 1 year ago

can you build with address sanitizer?

Lwxiang commented 1 year ago

can you build with address sanitizer?

I have tried; no leak errors show up.

Lwxiang commented 1 year ago

i.e. you could most likely put any code there (e.g. sleeping for 1ms) and you'll see the memory usage increase.

YES, I have tried what you said.

The point is not torrent_file() or status(); the point is the processing time. The longer the processing takes, the more memory exists simultaneously. So a plain usleep(10000) can also reproduce the case.

So maybe the unbounded number of read_piece_alerts entering the queue prevents the old alerts from being released?

Lwxiang commented 1 year ago

If I call torrent_file() or other high-cost procedures in the read_piece_alert handler, the current generation of alerts lives longer (before being cleaned up) than the previous one. That in turn makes the next generation contain more read_piece_alerts (since alerts arrive at the same rate), so the next generation stays alive even longer.

As in the following picture:

[screenshot: successive alert batches ("generations") growing in count and processing time]

The memory usage climbs higher and higher during processing because the alerts live longer and longer. But notice that the last generation takes only 2 seconds to process 5k alerts; that is because file checking finished and the torrent status changed to seeding.

So the questions become:

  1. Why does file checking slow down the processing speed of torrent_file() and other procedures?
  2. Why does the high memory usage remain after everything is done?

glassez commented 1 year ago

  1. Why does file checking slow down the processing speed of torrent_file() and other procedures?

#6996

libtorrent doesn't use a mutex-based approach to obtain data from the working thread; it queues "requests" to be invoked in that thread, so you have to wait until all current jobs are done.
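
As an aside, the documented non-blocking pattern for status is to have the session post updates as alerts instead of queuing a synchronous job per call. A sketch using session::post_torrent_updates() and state_update_alert (both are real libtorrent APIs; the surrounding alert loop is assumed from the test code above):

// ask the session to post status snapshots as alerts,
// instead of blocking in handle.status()
s.post_torrent_updates();

// later, in the alert loop:
if (auto* upd = lt::alert_cast<lt::state_update_alert>(alert))
{
    for (lt::torrent_status const& st : upd->status)
        std::cout << st.name << " state: " << static_cast<int>(st.state) << std::endl;
}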

Lwxiang commented 1 year ago

For now, I just forbid read-piece actions while the torrent is in the checking-files state, to avoid the problem.
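
A sketch of how that guard could look without extra blocking calls, tracking the state from state_changed_alert (the `checking` flag is illustrative, not part of the original code):

bool checking = false; // illustrative flag, updated from alerts only

// in the alert loop, before the switch:
if (auto* sc = lt::alert_cast<lt::state_changed_alert>(alert))
    checking = (sc->state == lt::torrent_status::checking_files);

// in the piece_finished_alert case, only read while not checking:
if (!checking)
    a->handle.read_piece(a->piece_index);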

arvidn commented 1 year ago

@Lwxiang it seems to me that the problem with unbounded memory usage is fundamentally caused by you not having a limit on the number of pieces you keep in RAM at a time.

Is there a reason you cannot have a cap on that? It doesn't seem very robust to assume that you can handle alerts at a high enough rate to stay within some reasonable memory usage. It might work under the right circumstances on your computer, but perhaps not on others' computers.

Lwxiang commented 1 year ago

I see. The more pieces I keep in RAM at a time, the more memory it takes at a time; that is right. What I don't understand is why, after all read_piece_alerts are cleaned up, the memory usage is still high and never comes down. (I don't keep any of the piece buffer data.)

What I expect to see is the memory usage going up, maybe very high, but finally falling back down after all actions are done:

[sketch: expected memory curve rising, then falling back down]

But what I actually see is that if I call read_piece() many times during the checking-files state, the memory looks like the following, even after I remove the torrent:

[screenshot: memory stays high even after the torrent is removed]

arvidn commented 1 year ago

Understanding that requires first understanding what you mean by "memory usage". There are at least 3 levels involved:

  1. The program calling malloc() and free()
  2. The runtime library implementing malloc() and free()
  3. The kernel allocating and deallocating address space for the process

I expect that you are looking at resident pages, which is in the realm of (3), whereas libtorrent operates at (1). Perhaps the runtime library is predicting that the process may have another spike in memory allocations at any moment, and defers returning parts of its address space to the kernel. Or perhaps the memory ended up fragmented, making it difficult for the runtime library to return it.

To know whose behavior you need to understand, it would make sense to also profile the memory usage at the malloc() and free() level, with a heap profiler for instance.
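
On glibc, a quick experiment that separates (2) from (3) is malloc_trim(), which asks the allocator to return free heap pages to the kernel. A sketch, assuming Linux/glibc (this is a diagnostic, not a fix):

#include <malloc.h> // glibc-specific

// after all read_piece_alerts have been processed:
malloc_trim(0); // return as much free heap to the kernel as possible
// if RES drops noticeably here, the memory was retained by the
// allocator (level 2), not leaked by the program (level 1)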

arvidn commented 1 year ago

Note that the alerts from the last pop_alerts() are not freed until you call pop_alerts() again.
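
So after the last interesting alert, draining the queue once more releases that final batch; a minimal sketch:

// the previous batch (including read_piece buffers) is only freed
// by the next call to pop_alerts()
std::vector<lt::alert*> drain;
s.pop_alerts(&drain);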

Lwxiang commented 1 year ago

I tried to look into this problem again recently.

I found that if I do lots of read_piece() calls during file checking, the read_lru1 list in block_cache grows pretty large and is not released after all actions are done.

I think maybe_free_piece() skips some evictions: looking at the function below, we skip the eviction if pe->marked_for_eviction is false.

But shouldn't the piece be evicted with allow_ghost? The subsequent call uses marked_for_deletion to decide the parameters of evict_piece(). Maybe we should delete the line || !pe->marked_for_eviction?

bool block_cache::maybe_free_piece(cached_piece_entry* pe)
{
    if (!pe->ok_to_evict()
        || !pe->marked_for_eviction
        || !pe->jobs.empty())
        return false;

    DLOG(stderr, "[%p] block_cache maybe_free_piece "
        "piece: %d refcount: %d marked_for_eviction: %d\n"
        , static_cast<void*>(this)
        , int(pe->piece), int(pe->refcount), int(pe->marked_for_eviction));

    tailqueue<disk_io_job> jobs;
    bool removed = evict_piece(pe, jobs
        , pe->marked_for_deletion ? disallow_ghost : allow_ghost);
    TORRENT_UNUSED(removed); // suppress warning
    TORRENT_PIECE_ASSERT(removed, pe);
    TORRENT_PIECE_ASSERT(jobs.empty(), pe);

    return true;
}

By the way, is there a block cache in version 2.0.x? I cannot find it.

arvidn commented 1 year ago

I found that if I do lots of read_piece() calls during file checking, the read_lru1 list in block_cache grows pretty large and is not released after all actions are done.

It's only released if the cache size is near its max size. Is "pretty large" greater than your cache size?

It seems pretty obvious that the memory is used up by read_piece_alerts, each holding a buffer for a whole piece. Can you confirm that that's not the case?
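
One way to check (a sketch; the counter is illustrative) is to sum read_piece_alert::size over each popped batch, which measures the piece-buffer memory held by the alerts at that moment:

std::int64_t held = 0; // bytes held by this batch's read_piece_alerts
for (lt::alert* a : alerts)
    if (auto* rp = lt::alert_cast<lt::read_piece_alert>(a))
        held += rp->size;
std::cout << "piece buffers held by this batch: " << held << " bytes" << std::endl;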

by the way, is there any block cache in version 2.0.x? I can not find it.

No, there's not. It uses the page cache (via memory-mapped files) instead.

Lwxiang commented 1 year ago

Two parts: the read_piece_alerts hold some memory, and the block_cache holds another part.

According to the code, it seems only the ghost lists (which pieces get moved to) have a max size; the lru1 list does not.

The || !pe->marked_for_eviction condition means the evict_piece() call below never runs with allow_ghost.

Lwxiang commented 1 year ago

I monitored the cache list sizes during the test.

By default the list size is 32, but the lru1 size increases to thousands and is not released after all actions are done:

[screenshot: read_lru1 size in the thousands while the default cache size is 32]

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ValeZAA commented 1 week ago

I monitored the cache list sizes during the test.

By default the list size is 32, but the lru1 size increases to thousands and is not released after all actions are done.

This looks like this commit: https://github.com/arvidn/libtorrent/commit/e5fe95f92273bd775961e2f1f3b4ec45f8012d9e

Please test.