aiidateam / disk-objectstore

An implementation of an efficient "object store" (actually, a key-value store) writing files on disk and not requiring a running server
https://disk-objectstore.readthedocs.io
MIT License

VACUUMing the DB and performance on listing via the index #94

Closed giovannipizzi closed 3 years ago

giovannipizzi commented 4 years ago
  1. When a DB is created a bit at a time, the index pages end up scattered across the file. On a big DB, if the DB file is not in the OS disk cache, this means a huge performance hit (~1000 it/s instead of ~600000 it/s) for any listing that uses the covering index on the hash key (e.g. SELECT hashkey FROM db_object ORDER BY hashkey). For example, perform the query above right after flushing the caches with sudo su -c "echo 3 > /proc/sys/vm/drop_caches":

    from disk_objectstore import Container
    import tqdm

    container = Container('/scratch/TEST-DISK-OBJECTSTORE/test-newrepo-sdb/')
    # Use the container's internal cached SQLAlchemy session to run raw SQL
    session = container._get_cached_session()

    # Iterate over all hash keys via the covering index; tqdm reports the rate
    hashkeys_q = session.execute("SELECT hashkey FROM db_object ORDER BY hashkey")
    list(tqdm.tqdm(hashkeys_q, total=6714808))
  2. If the DB file fits in memory, just running cat packs.idx > /dev/null makes all subsequent operations (like listing ordered by hash key) fast again (600000 it/s vs 1000 it/s).
  3. A better long-term solution is to call VACUUM on the DB, which defragments the DB and its indexes (note: this reorganizes the content of the SQLite file, not how the file is laid out on the filesystem). VACUUM is also needed to reclaim space after deleting entries. Note that the first run of SELECT hashkey FROM db_object ORDER BY hashkey will go at ~270000 it/s instead of the ~600000 it/s of later runs (with caching), but this is already fast enough. Also, looking at how much data ends up in the OS disk cache, it seems that only ~550 MB of the 1.2 GB SQLite file actually needs to be read and kept in the cache (probably the size of the index on hashkey).
  4. SELECT count(*) FROM db_object decides to use the index on the hash key, which, if the DB is 'fragmented', is very slow as above (it is much faster to run SELECT id FROM db_object ORDER BY id and count the results in Python; see the sketch after this list). Therefore we can:
    • change how we count objects
    • have a trick to call cat on the whole file to pre-fetch the data (but only if it all fits in RAM!)
    • have a maintenance operation that calls VACUUM (probably the better choice, to combine with other maintenance operations)
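
As a rough illustration of the counting alternative in point 4, here is a minimal sketch, reusing the same container path and the internal _get_cached_session() API as above (the printed timings are only indicative):

    import time
    from disk_objectstore import Container

    container = Container('/scratch/TEST-DISK-OBJECTSTORE/test-newrepo-sdb/')
    session = container._get_cached_session()

    # count(*) is answered via the (possibly fragmented) covering index on hashkey
    start = time.time()
    num_objects = session.execute("SELECT COUNT(*) FROM db_object").scalar()
    print(f"count(*): {num_objects} objects in {time.time() - start:.1f} s")

    # Fetching the integer primary key scans the table in rowid order instead
    start = time.time()
    num_objects = len(session.execute("SELECT id FROM db_object ORDER BY id").fetchall())
    print(f"len(ids): {num_objects} objects in {time.time() - start:.1f} s")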

Mentioning also #92, as the performance when looping over results sorted by hash key will depend strongly on either caching the file first or VACUUMing it first (which, however, is slow).

giovannipizzi commented 4 years ago

According to this SQLite page, the following SQL can be used to estimate how 'fragmented' the DB is and whether you need to call VACUUM:

CREATE TEMP TABLE s(rowid INTEGER PRIMARY KEY, pageno INT);
INSERT INTO s(pageno) SELECT pageno FROM dbstat ORDER BY path;
SELECT sum(s1.pageno+1==s2.pageno)*1.0/count(*)
  FROM s AS s1, s AS s2
 WHERE s1.rowid+1=s2.rowid;
DROP TABLE s;
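
For convenience, here is a minimal sketch of running the same estimate from Python against the container's pack index, reusing the internal _get_cached_session() API from above (this is only a sketch; note that the dbstat virtual table is available only if SQLite was compiled with SQLITE_ENABLE_DBSTAT_VTAB):

    from disk_objectstore import Container

    container = Container('/scratch/TEST-DISK-OBJECTSTORE/test-newrepo-sdb/')
    session = container._get_cached_session()

    # Collect the page numbers in B-tree path order, then compute the fraction
    # of consecutive page pairs (1.0 means the file is fully sequential)
    session.execute("CREATE TEMP TABLE s(rowid INTEGER PRIMARY KEY, pageno INT)")
    session.execute("INSERT INTO s(pageno) SELECT pageno FROM dbstat ORDER BY path")
    fraction = session.execute(
        "SELECT sum(s1.pageno+1==s2.pageno)*1.0/count(*) "
        "FROM s AS s1, s AS s2 WHERE s1.rowid+1=s2.rowid"
    ).scalar()
    session.execute("DROP TABLE s")
    print(f"Fraction of sequentially ordered pages: {fraction}")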

As a reference, here is the value on the big SDB DB (~6.8M entries, 1.2 GB) before VACUUMing:

giovannipizzi commented 4 years ago

To VACUUM a Container c, this can be done:

engine = c._get_cached_session().get_bind()
engine.execute("VACUUM")