> To prove my point: after waiting for everything to be written and replacing db.last() with db.get(LAST_ITEM_ID), reads are 20x faster.

Ah yes, the implementation needs to be optimized. Creating a range does two lookups (one for the beginning and one for the end of the range), which is why it's less efficient than get.
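For context, here is a minimal sketch of the two access paths being compared, assuming a hypothetical u64 → u64 table named "items" that was populated beforehand (the table name, file name, and key constant are made up for illustration; the two-descent behaviour of last() is per the comment above):

```rust
use redb::{Database, ReadableTable, TableDefinition};

// Hypothetical table layout, for illustration only.
const TABLE: TableDefinition<u64, u64> = TableDefinition::new("items");
const LAST_ITEM_ID: u64 = 249_999;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumes the database file exists and the table is already populated.
    let db = Database::open("bench.redb")?;
    let txn = db.begin_read()?;
    let table = txn.open_table(TABLE)?;

    // Range-based: resolving the range locates both of its ends,
    // i.e. two B-tree descents (as described above).
    let _last = table.last()?;

    // Point lookup: a single descent to the leaf holding the key.
    let _direct = table.get(&LAST_ITEM_ID)?;

    Ok(())
}
```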
Did you close the database and re-open it when you ran this test? And if so, did you do that for both last() and get()? I wasn't able to reproduce your 20x number when I compare them.

I think there are two issues. One is cache management, but it affects both last() and get(): when the database is closed and re-opened, the cache is cleared, and that would explain most of the 20x difference you observed.

The second issue is that last() does two lookups, like I mentioned. This one I have a fix for.
> Did you close the database and re-open it when you ran this test? And if so, did you do that for both last() and get()? I wasn't able to reproduce your 20x number when I compare them.
Clean database, though this seems to depend heavily on how you read. If you read while inserting, the performance cliff appears. If you let it write all 250k items first, then do first(), last(), get(0), get(LAST), it performs much better (a sketch of both measurement modes follows the numbers below):
Read while insert:
first(): 240k reads per second after all inserts are done
last(): 220k reads per second after all inserts are done
get(0): 540k reads per second after all inserts are done
get(LAST): 420k reads per second after all inserts are done
Only read after insert:
first(): 890k reads per second
last(): 900k reads per second
get(0): 1.1M reads per second
get(LAST): 2.4M reads per second
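For reference, here is roughly what the two modes measure, as a simplified sketch of the linked reproduction (the table definition, item count, and per-item write transactions are assumptions, not the gist's exact code):

```rust
use std::time::Instant;

use redb::{Database, ReadableTable, TableDefinition};

// Hypothetical table layout, for illustration only.
const TABLE: TableDefinition<u64, u64> = TableDefinition::new("items");
const ITEMS: u64 = 250_000;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db = Database::create("bench.redb")?;

    // "Read while insert": interleave a last() read with every write.
    for i in 0..ITEMS {
        let txn = db.begin_write()?;
        txn.open_table(TABLE)?.insert(&i, &i)?;
        txn.commit()?;

        let read_txn = db.begin_read()?;
        let _ = read_txn.open_table(TABLE)?.last()?;
    }

    // "Only read after insert": measure once all writes are done.
    let read_txn = db.begin_read()?;
    let table = read_txn.open_table(TABLE)?;
    let reads = 1_000_000u32;
    let start = Instant::now();
    for _ in 0..reads {
        let _ = table.last()?;
    }
    let rps = f64::from(reads) / start.elapsed().as_secs_f64();
    println!("last(): {rps:.0} reads per second");

    Ok(())
}
```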
Ah, I found the issue. There was a race in the tracking of the remaining read cache space. Thanks for the report!
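The actual fix isn't shown in this thread. Purely to illustrate the class of bug, here is a hypothetical check-then-update race on a "remaining read cache space" counter, plus an atomic variant; all names are invented and this is not redb's real code:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical cache-budget tracker, for illustration only.
struct ReadCacheBudget {
    remaining: AtomicUsize,
}

impl ReadCacheBudget {
    // Racy: two readers can both pass the check before either subtracts,
    // so `remaining` can underflow (wrap) and drift away from the real
    // cache usage, after which admission/eviction decisions go wrong.
    fn try_reserve_racy(&self, bytes: usize) -> bool {
        if self.remaining.load(Ordering::Relaxed) >= bytes {
            self.remaining.fetch_sub(bytes, Ordering::Relaxed);
            true
        } else {
            false
        }
    }

    // Check and subtract in one atomic step: fetch_update retries the
    // closure in a compare-exchange loop until it applies cleanly.
    fn try_reserve(&self, bytes: usize) -> bool {
        self.remaining
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |r| {
                r.checked_sub(bytes)
            })
            .is_ok()
    }
}

fn main() {
    let budget = ReadCacheBudget {
        remaining: AtomicUsize::new(4 * 1024 * 1024), // 4 MB, as in the report
    };
    assert!(budget.try_reserve(4096));
    assert!(budget.try_reserve_racy(4096));
}
```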
I don't know if this is expected performance behaviour, but I found this during benchmarking, and it may have pretty serious implications for the data-to-cache ratio and read performance of specific workloads.
Basically, the benchmark just writes small items in monotonically increasing key order and reads the last one (Table::last). The same will probably happen with other operations such as ::first, too.
Depending on the item count & cache size, at some point the read performance will absolutely fall off a cliff.
The cache is set to 4 MB (pretty low), and after 140k (small!) items, the read performance tanks by 6x.
Even after all (250k) items were written, the read performance does not recover, even though only the very last part of the keyspace is read. I wouldn't really expect that: per the Db stats, the tree height is only 4, so realistically only a couple of 4K pages are needed to get to any leaf page, right? To prove my point: after waiting for everything to be written and replacing db.last() with db.get(LAST_ITEM_ID), reads are 20x faster. So it kind of confirms my theory that the cache size is high enough to handle the tree traversal quickly, but specifically range operations seem to degrade somehow.

If you increase the cache to 8 MB, it can maybe pull off 400k items; with 16 MB it can do 800k, etc. Even though the read pattern is not random at all.
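For what it's worth, here are the back-of-envelope numbers behind that expectation, using the 4K page size and height-4 tree from the Db stats above (a rough sketch that ignores cache metadata and block granularity):

```rust
// Back-of-envelope check: the hot root-to-leaf path is tiny vs. the cache.
fn main() {
    let page_size = 4096usize; // 4K pages, per the Db stats
    let tree_height = 4usize; // per the Db stats
    let cache_size = 4 * 1024 * 1024usize; // the 4 MB cache from the report

    // A point lookup touches at most one page per level, and repeated
    // last()/get(LAST) reads keep hitting the same rightmost path.
    let hot_path = tree_height * page_size; // 16 KiB
    println!(
        "hot path: {} KiB of {} KiB cache ({:.2}%)",
        hot_path / 1024,
        cache_size / 1024,
        100.0 * hot_path as f64 / cache_size as f64
    );
}
```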
And here's a comparative benchmark, even though it's hard to compare (I ran redb multiple times just to be sure)... Sled uses tons of memory, as it doesn't seem to obey the configured cache size. LMDB just caches in memory, so it's basically cheating too. Fjall is an LSM-tree and can jump to the correct block very easily, so its cache size can be very low (probably <1 MB) in this scenario for an effectively unlimited number of items.
Here's a reproduction script: https://gist.github.com/marvin-j97/8f3ec281bfeeebec2532580a51d741b3