dermesser / leveldb-rs

A reimplementation of LevelDB in Rust (no bindings).
Other
521 stars 59 forks source link

Possible memory leak on large data #11

Open dractw opened 2 years ago

dractw commented 2 years ago

I found strange behaviour with this method https://github.com/dermesser/leveldb-rs/blob/master/src/table_reader.rs#L42 Probably ref cycle?

Code sample on repr:

 let mut block_index = Vec::with_capacity(800000);
    let mut db = DB::open(path, Options::default())?;
    let mut iter = db.new_iter()?;
    let (mut k, mut v) = (Vec::with_capacity(800000), Vec::with_capacity(800000));

    while iter.advance() {
        iter.current(&mut k, &mut v);
        if is_block_index_record(&k) {
            let record = BlockIndexRecord::from(&k[1..], &v)?;
            if record.status & (BLOCK_VALID_CHAIN | BLOCK_HAVE_DATA | BLOCK_VALID_CHAIN) > 0 {
                block_index.push(record);
            }
        }
    }

 Ok(block_index)

Related valgrind output:

8,409,399 bytes in 2,018 blocks are possibly lost in loss record 1,309 of 1,311
==2643397==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2643397==    by 0x20FAB7: rusty_leveldb::table_block::read_table_block 
==2643397==    by 0x2129EE: rusty_leveldb::table_reader::TableIterator::load_block 
==2643397==    by 0x212722: rusty_leveldb::table_reader::TableIterator::skip_to_next_entry 
==2643397==    by 0x2125C0: <rusty_leveldb::table_reader::TableIterator as rusty_leveldb::types::LdbIterator>::advance 
==2643397==    by 0x2158B2: <rusty_leveldb::version::VersionIter as rusty_leveldb::types::LdbIterator>::advance 
==2643397==    by 0x20C64B: <rusty_leveldb::merging_iter::MergingIter as rusty_leveldb::types::LdbIterator>::advance 
==2643397==    by 0x23A84B: rusty_leveldb::db_iter::DBIterator::find_next_user_entry
==2643397==    by 0x15BE54: rusty_blockparser::blockchain::parser::index::get_block_index 
==2643397==    by 0x1564C9: rusty_blockparser::blockchain::parser::chain::ChainStorage::new 
==2643397==    by 0x176E6B: rusty_blockparser::main 
==2643397==    by 0x18AF62: std::sys_common::backtrace::__rust_begin_short_backtrace 
==2643397== 
==2643397== 16,801,974 bytes in 4,032 blocks are possibly lost in loss record 1,310 of 1,311
==2643397==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2643397==    by 0x20FAB7: rusty_leveldb::table_block::read_table_block (
==2643397==    by 0x2129EE: rusty_leveldb::table_reader::TableIterator::load_block 
==2643397==    by 0x212722: rusty_leveldb::table_reader::TableIterator::skip_to_next_entry 
==2643397==    by 0x2125C0: <rusty_leveldb::table_reader::TableIterator as rusty_leveldb::types::LdbIterator>::advance 
==2643397==    by 0x2158B2: <rusty_leveldb::version::VersionIter as rusty_leveldb::types::LdbIterator>::advance
==2643397==    by 0x20C64B: <rusty_leveldb::merging_iter::MergingIter as rusty_leveldb::types::LdbIterator>::advance 
==2643397==    by 0x23A84B: rusty_leveldb::db_iter::DBIterator::find_next_user_entry
==2643397==    by 0x15BE54: rusty_blockparser::blockchain::parser::index::get_block_index 
==2643397==    by 0x17AF6D: rusty_blockparser::main 
==2643397==    by 0x18AF62: std::sys_common::backtrace::__rust_begin_short_backtrace (
==2643397==    by 0x18E9A0: main

Any thoughts?

dermesser commented 2 years ago

Thank you for the report! It looks to me that the function in question (read_table_block()) essentially allocates memory for any new block read from disk, thus showing up as original allocator. This means that the bug, unfortunately, could be anywhere for now.

Have you checked if this happens for other uses of the database, too? What if the database object is dropped explicitly?

Other than that, I don't have a good clue right away. This looks like it could get interesting

dractw commented 2 years ago

@dermesser Could you describe a little more about database object is dropped explicitly? How should I try to do this?

dermesser commented 2 years ago

Just an idea, you can do this by creating the database object in its own function or scope, inside of where you create it currently. This is unlikely to change anything, though - just a shot in the dark.

dractw commented 2 years ago

@dermesser Anyways will try, thanks!

upd: same behaviour ;(

dractw commented 2 years ago

@dermesser additional valgrind sample on debug build

Each outer iteration of calling db.iter causing doubling leaked memory, at least at my leveldb data. (fully synced bitcoin leveldb)

==2713031== 8,582,786 bytes in 2,044 blocks are possibly lost in loss record 1,251 of 1,252
==2713031==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2713031==    by 0x3844CB: alloc::alloc::alloc_zeroed (alloc.rs:158)
==2713031==    by 0x38456C: alloc::alloc::Global::alloc_impl (alloc.rs:169)
==2713031==    by 0x384B2C: <alloc::alloc::Global as core::alloc::Allocator>::allocate_zeroed (alloc.rs:234)
==2713031==    by 0x37E3C1: alloc::raw_vec::RawVec<T,A>::allocate_in (raw_vec.rs:186)
==2713031==    by 0x37DF0C: alloc::raw_vec::RawVec<T,A>::with_capacity_zeroed_in (raw_vec.rs:140)
==2713031==    by 0x38D001: <u8 as alloc::vec::spec_from_elem::SpecFromElem>::from_elem (spec_from_elem.rs:39)
==2713031==    by 0x38DC8F: alloc::vec::from_elem (mod.rs:2352)
==2713031==    by 0x30C2AB: rusty_leveldb::table_block::read_bytes (table_block.rs:18)
==2713031==    by 0x30C81C: rusty_leveldb::table_block::read_table_block (table_block.rs:49)
==2713031==    by 0x31624A: rusty_leveldb::table_reader::Table::read_block (table_reader.rs:124)
==2713031==    by 0x31699C: rusty_leveldb::table_reader::TableIterator::load_block (table_reader.rs:240)
dractw commented 2 years ago

@dermesser Unlikely #12 is related to current issue, leak is still there

Prob related https://github.com/rust-lang/rust/issues/60114

dermesser commented 2 years ago

yeah I didn't think it was, anyway. I guess at the time (a few years back) I relied too much on safety promises etc. that I didn't bother using memcheck enough :)

dermesser commented 2 years ago

I also opened #13 for this. Given that memory leaks can end up having unforeseen areas of impact, it may (hopefully) occur that one I fix there will solve this issue. In the meantime, if you find some time on your hands, please feel free to search as well :) Although I understand if you don't want to dive in a deep unfamiliar codebase

dractw commented 2 years ago

@dermesser I've already done some researches, firstly trying to use jemalloc instead default allocator, my thoughts was like maybe system allocator didn't returns freed allocated memory to system, after read_at called and filled vec was dropped. But unfortunately it does not significantly helps, so i will try to dig deeper in codebase. Jfyi

dermesser commented 2 years ago

I see - thank you for investing such an effort here! I've also checked some basics; after removing one faulty (and unneeded) Drop implementation yesterday, there are only two implementations left, and one of them seems completely harmless.

As you've noticed before, there is a lot of Rc use going on. I wouldn't be surprised if there are circular references here, but I need to think of a good way of debugging these.

dractw commented 2 years ago

@dermesser I found few possible circular references, but idk yet how to fix it properly. https://github.com/dermesser/leveldb-rs/blob/master/src/table_reader.rs#L32 https://github.com/dermesser/leveldb-rs/blob/master/src/table_reader.rs#L101 Rc<Box<Rc<Box<Rc<Box>>>>?

dermesser commented 2 years ago
  1. do you mean the file field? What do you think about it?
  2. this will for sure have a few levels of nesting, but that is intentional... it shouldn't be much deeper than two or three (the options object comes from outside after all). Or do I miss something here? I don't see the circular reference, the existing Rc is just wrapped inside a new one.
dractw commented 2 years ago
  1. do you mean the file field? What do you think about it?
  2. this will for sure have a few levels of nesting, but that is intentional... it shouldn't be much deeper than two or three (the options object comes from outside after all). Or do I miss something here? I don't see the circular reference, the existing Rc is just wrapped inside a new one.

Well, i don't quite understand how it works for now (and obv reasons) just because im not so strong with rust, but very curious. I think straight-forwarding investigation with Rc::strong_count will makes some clearence, what do u think?