dermesser / leveldb-rs

A reimplementation of LevelDB in Rust (no bindings).
Other
515 stars 60 forks source link

PANIC: could not remove_last(); bug! #1

Open bekh6ex opened 4 years ago

bekh6ex commented 4 years ago

Not sure how useful is this report, but anyway...

I was playing with example write-a-lot, adjusting size of the data and got this error: could not remove_last(); bug!

Now the application won't start. Fails with:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `[71, 219, 128, 139, 36, 117, 71, 219]`,
 right: `[87, 251, 128, 139, 36, 117, 71, 219]`',  .../rusty-leveldb-0.3.0/src/table_builder.rs:50:9

Current code:


use rand::Rng;
use rusty_leveldb::CompressionType;
use rusty_leveldb::Options;
use rusty_leveldb::DB;

use std::error::Error;
use std::iter::FromIterator;

const KEY_LEN: usize = 16;
const VAL_LEN: usize = 400_000;

fn gen_string(len: usize) -> String {
    let mut rng = rand::thread_rng();
    String::from_iter(rng.gen_ascii_chars().take(len))
}
fn gen_val(len: usize) -> String {
    let mut rng = rand::thread_rng();
    let s = String::from_iter(rng.gen_ascii_chars().take(10));
    s.repeat(len / 10)
}

fn fill_db(db: &mut DB, entries: usize) -> Result<(), Box<Error>> {
    for i in 0..entries {
        println!("{}", i);
        let (k, v) = (gen_string(KEY_LEN), gen_val(VAL_LEN));
        db.put(k.as_bytes(), v.as_bytes())?;

        if i % 100 == 0 {
            db.flush()?;
        }
    }
    Ok(())
}

fn main() {
    println!("Start");
    let mut opt = Options::default();
    opt.compression_type = CompressionType::CompressionSnappy;
    let mut db = DB::open("/tmp/leveldb-test1", opt).unwrap();
    println!("DB opened");

    fill_db(&mut db, 32768).unwrap();
}

Rust version: rustc 1.41.0 (5e1a79984 2020-01-27)

DB state: https://send.firefox.com/download/7281446ca733dbcd/#lVmzPpIyJVg1JrmGt7hMCw

dermesser commented 4 years ago

Thank you for the report! I have been a bit out of touch with my codebase, so I probably cannot debug it in an instant, but I will take a look!

dermesser commented 4 years ago

The error you see when trying to run (open the database) is due to header corruption... probably due to the failed write.

The original error is from cache.rs:208:

203     pub fn insert(&mut self, key: &CacheKey, elem: T) {
204         if self.list.count() >= self.cap {
205             if let Some(removed_key) = self.list.remove_last() {
206                 assert!(self.map.remove(&removed_key).is_some()); 
207             } else {
208                 panic!("could not remove_last(); bug!");
209             }
210         }
211 
212         let lru_handle = self.list.insert(key.clone());
213         self.map.insert(key.clone(), (elem, lru_handle));
214     }

There must be something wrong in the cache logic that it is both full, but the last element in the LRU list isn't present. The tests are supposed to cover this, but apparently don't :)

Do you know if it's deterministically reproducible?

dermesser commented 4 years ago

interestingly, on my machine it crashes with a different error:

thread 'main' panicked at 'attempt to subtract with overflow', src/log.rs:152:16
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

indicating that the table reader is trying to read more from a block than the 32768 bytes in one. This points to some corruption, which is usually very difficult to fix. We should probably focus on the original bug that caused the crashing.

dermesser commented 4 years ago

thread 'main' panicked at 'assertion failed: (left == right) left: [71, 219, 128, 139, 36, 117, 71, 219], right: [87, 251, 128, 139, 36, 117, 71, 219]', .../rusty-leveldb-0.3.0/src/table_builder.rs:50:9

note that for the first numbers:

71: 1 00 01 11
87: 1 01 01 11

219: 11 01 10 11
251: 11 11 10 11

Both differences are just one flipped bit; both times in the third position of the byte. Possibly this originates from some corruption? I know both from experience and literature that LevelDB's design doesn't cope very well with corruption (after all, we already ignore blocks with bad checksum, and if the data layout is wrong we're just hopelessly lost at the moment).

dermesser commented 4 years ago

I'm actually wrong, the first time it's at the fourth position, then at the third.

l4l commented 2 years ago

I also encountered the same issue as the author. Though only able to reproduce the subtract with overflow and get the backtrace of the process that corrupted the state, i.e the one that was killed. Hope it helps:

#0  0x000055555563af1a in core::ptr::const_ptr::{impl#0}::guaranteed_eq<u8> (self=0x55556dcbd772, other=0x0)
    at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/ptr/const_ptr.rs:422
#1  0x000055555563af4d in core::ptr::const_ptr::{impl#0}::is_null<u8> (self=0x55556dcbd772)
    at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/ptr/const_ptr.rs:40
#2  0x000055555563b4e5 in core::slice::iter::{impl#173}::next<u8> (self=0x7fffffffc5e0)
    at /rustc/02072b482a8b5357f7fb5e5637444ae30e423c40/library/core/src/slice/iter/macros.rs:144
#3  0x000055555563afec in crc::crc32::update (value=2473163403, table=0x7fffffffd358, bytes=...)
    at /home/kitsu/.cargo/registry/src/github.com-1ecc6299db9ec823/crc-1.8.1/src/crc32.rs:24
#4  0x000055555563b13c in crc::crc32::{impl#1}::write (self=0x7fffffffd358, bytes=...)
    at /home/kitsu/.cargo/registry/src/github.com-1ecc6299db9ec823/crc-1.8.1/src/crc32.rs:65
#5  0x0000555555586eaf in rusty_leveldb::log::LogWriter<std::io::buffered::bufwriter::BufWriter<alloc::boxed::Box<dyn std::io::Write, alloc::alloc::Global>>>::emit_record<std::io::buffered::bufwriter::BufWriter<alloc::boxed::Box<dyn std::io::Write, alloc::alloc::Global>>> (self=0x7fffffffd318, 
    t=rusty_leveldb::log::RecordType::Middle, data=..., len=32761) at src/log.rs:99
#6  0x0000555555586c0b in rusty_leveldb::log::LogWriter<std::io::buffered::bufwriter::BufWriter<alloc::boxed::Box<dyn std::io::Write, alloc::alloc::Global>>>::add_record<std::io::buffered::bufwriter::BufWriter<alloc::boxed::Box<dyn std::io::Write, alloc::alloc::Global>>> (self=0x7fffffffd318, r=...) at src/log.rs:87
#7  0x00005555555f6114 in rusty_leveldb::db_impl::DB::write (self=0x7fffffffd258, batch=..., sync=false)
    at src/db_impl.rs:400
#8  0x00005555555f5cd6 in rusty_leveldb::db_impl::DB::put (self=0x7fffffffd258, k=..., v=...)
    at src/db_impl.rs:377
#9  0x0000555555576d75 in rusty_leveldb::fill_db (db=0x7fffffffd258, entries=32768) at src/main.rs:25
#10 0x00005555555770c4 in rusty_leveldb::main () at src/main.rs:41

DB dump

I might think of wrong operation ordering so the partial execution makes the state invalid.

dermesser commented 2 years ago

Thank you for the information! I will try to dive down into the code base again at some point, it's been some time :)