facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0
28.39k stars 6.29k forks

Track Number of bytes written/read to/from disk #12841

Closed zaidoon1 closed 1 month ago

zaidoon1 commented 3 months ago

I'm trying to track the number of bytes written/read by RocksDB per second. I've been looking at https://github.com/facebook/rocksdb/blob/b6c3495a7183f01901d3be01dc68f7e40a1a2e9b/monitoring/statistics.cc#L22 and it's not clear to me which metrics I need to pull in to get this data. Is it even possible to track RocksDB's total writes/reads to disk?

My thinking for tracking writes:

rocksdb.wal.bytes + rocksdb.compact.write.bytes + rocksdb.flush.write.bytes

Also, where does rocksdb.bytes.written fit into this?

and for reads:

rocksdb.compact.read.bytes + rocksdb.last.level.read.bytes + rocksdb.non.last.level.read.bytes

But it feels like this is not enough to track total disk writes/reads?
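To turn the sums proposed above into a per-second rate, one option is to sample the cumulative tickers periodically and divide the delta by the elapsed time. The following is a minimal sketch, not a confirmed recipe: the ticker names are the ones listed in this post, the snapshots are assumed to be plain dicts of ticker name to cumulative value (however they were obtained), and `total` and `bytes_per_second` are hypothetical helper names. Whether these sums cover all disk I/O is the open question of this issue.

```python
# Ticker names as proposed in the post; completeness is NOT confirmed.
WRITE_TICKERS = (
    "rocksdb.wal.bytes",
    "rocksdb.compact.write.bytes",
    "rocksdb.flush.write.bytes",
)
READ_TICKERS = (
    "rocksdb.compact.read.bytes",
    "rocksdb.last.level.read.bytes",
    "rocksdb.non.last.level.read.bytes",
)


def total(snapshot, names):
    """Sum the named cumulative counters; a missing counter counts as 0."""
    return sum(snapshot.get(name, 0) for name in names)


def bytes_per_second(prev, curr, names, elapsed_seconds):
    """Average byte rate between two snapshots of cumulative counters."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = total(curr, names) - total(prev, names)
    if delta < 0:
        # Assumption: counters went backwards because they were reset
        # (for example a process restart), so count from zero again.
        delta = total(curr, names)
    return delta / elapsed_seconds
```

For example, if the three write tickers sum to 350 in one snapshot and 1350 in a snapshot taken 10 seconds later, `bytes_per_second(prev, curr, WRITE_TICKERS, 10)` returns `100.0`.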

alanpaxton commented 2 months ago

Hi @zaidoon1, could you clarify what you want from "number of bytes written (or read)"?

If you just want to monitor the number of bytes in keys + values, rocksdb.bytes.written appears to be close to this. But there are always overheads in things like Bloom filters and SST file headers that are shared between writes and are not recorded as part of that.

I would think that "rocksdb.wal.bytes + rocksdb.compact.write.bytes + rocksdb.flush.write.bytes" is closer to measuring a lower level disk value (how much gets

Do you want to think about compression?

Maybe another option for writes would be to look at the SST files themselves in the DB directory? If you monitor those regularly, outside of RocksDB itself, you would see what is ending up on the filesystem.
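As a rough illustration of monitoring the DB directory from outside RocksDB, the sketch below totals file sizes by suffix. It assumes SST files end in `.sst` and WAL files end in `.log`, and that both live directly in the directory passed in (a separate WAL directory would need its own call). `on_disk_bytes` is a hypothetical helper name. Note that this measures what is currently on disk, not cumulative bytes written, since compaction deletes files.

```python
import os


def on_disk_bytes(db_dir, suffixes=(".sst", ".log")):
    """Return {suffix: total size in bytes} for files directly in db_dir."""
    totals = {suffix: 0 for suffix in suffixes}
    with os.scandir(db_dir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            for suffix in suffixes:
                if entry.name.endswith(suffix):
                    try:
                        totals[suffix] += entry.stat().st_size
                    except FileNotFoundError:
                        # The file was removed (for example by compaction)
                        # between listing and stat; skip it.
                        pass
                    break
    return totals
```

Sampling this at a fixed interval and comparing successive results gives a view of growth on the filesystem that is independent of RocksDB's own counters.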

There is also the ability to implement user-defined statistics (see https://github.com/facebook/rocksdb/wiki/Statistics). I'm not sure what you can do with it, but it might help you.
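If the statistics are only available as a text dump, the ticker values can be pulled out with a small parser. This is a sketch under an assumption: that each ticker appears on its own line in the form `<name> COUNT : <value>`, which is how such dumps usually look; check it against your own output before relying on it. `parse_tickers` is a hypothetical helper name.

```python
import re

# Assumed ticker line format: "rocksdb.some.ticker COUNT : 12345".
# Histogram lines carry percentile fields after the name and do not match.
_TICKER_LINE = re.compile(r"^\s*(rocksdb\.[\w.\-]+)\s+COUNT\s*:\s*(\d+)\s*$")


def parse_tickers(dump_text):
    """Extract {ticker name: cumulative value} from a statistics text dump."""
    counters = {}
    for line in dump_text.splitlines():
        match = _TICKER_LINE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters
```

The resulting dict can be snapshotted at intervals to compute deltas between any tickers of interest.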

zaidoon1 commented 2 months ago

Hi @alanpaxton, I'm trying to track load on disk, so what I'm interested in is how much I/O RocksDB is doing at any given time. I'm not interested in how much data is written in memory, as I have other ways to track memory usage. Given that I'm looking at bytes written to disk, this would be after compression.

> I would think that "rocksdb.wal.bytes + rocksdb.compact.write.bytes + rocksdb.flush.write.bytes" is closer to measuring a lower level disk value (how much gets

I believe that's also everything that I need, but I wanted to make sure. After adding the metrics, I see that rocksdb.bytes.written == rocksdb.wal.bytes. I take it this is expected, because this tracks writes to the WAL when the WAL is enabled and writes to disk when the WAL is not enabled?

alanpaxton commented 2 months ago

I would think that while rocksdb.bytes.written == rocksdb.wal.bytes CAN be true, it is NOT always true. And how does rocksdb.write.wal differ? I think you need to take care over which stats are affected when SST files are compacted, and sanity-check your results against the file sizes of the SST and log files.

adamretter commented 1 month ago

I suspect @alanpaxton has resolved this, so I am closing the issue due to a lack of follow-up from @zaidoon1