fluree / db

Fluree database library
https://fluree.github.io/db/

loaded db and live db stats do not match #351

Open dpetran opened 1 year ago

dpetran commented 1 year ago

fluree.db.json-ld.reify/merge-flakes calculates the size of all the flakes, whereas fluree.db.json-ld.transact/final-db calculates the size of the asserts and subtracts the size of the retracts.

The (-> db :stats :size) should be the same on a live db and on that same db after load.

cap10morgan commented 1 year ago

Branch with failing tests for this (they're in f.d.json-ld.api-test, in the :stats equality checks of the single- and multi-cardinality load tests): https://github.com/fluree/db/tree/fix/inconsistent-db-stats

dpetran commented 1 year ago

The asserts - retracts measurement tells us the size of the flakes an [?s ?p ?o] query would return.

The asserts + retracts measurement tells us roughly how much space the whole db would take on disk. It's not exact, since it leaves out the index node overhead, but it gives you a rough idea of how much space you would need.
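To make the difference concrete, here is a minimal sketch in Python rather than Fluree's Clojure. The `Flake` class and both function names are hypothetical stand-ins, not the library's API, and it assumes a retraction flake is the same size as the assertion it cancels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flake:
    """Hypothetical stand-in for a flake; only the fields this sketch needs."""
    size: int  # approximate size in bytes
    op: bool   # True = assertion, False = retraction


def net_size(flakes):
    """asserts - retracts: size of the flakes a current-state query would return."""
    return sum(f.size if f.op else -f.size for f in flakes)


def total_size(flakes):
    """asserts + retracts: rough size of everything stored, retractions included."""
    return sum(f.size for f in flakes)


# Assert a 40-byte flake, then update it: retract the old value
# (assumed to be the same 40 bytes) and assert a new 50-byte value.
history = [Flake(40, True), Flake(40, False), Flake(50, True)]

print(net_size(history))    # 50
print(total_size(history))  # 130
```

The two numbers agree only when nothing has ever been retracted, which is why a live db and a loaded db can report different sizes if each code path uses a different measure.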

I'm not sure which measure is more useful.

cap10morgan commented 1 year ago

Or maybe they're both useful and we just need to come up with good names for the two different keys in :stats?

dpetran commented 1 year ago

@bplatz do you have a preference here? I think all we need is a decision of one or the other or both, and I think you have more customer context to be able to make that call.

bplatz commented 1 year ago

@dpetran I've always considered the total size of the DB to be important, both 't' values and size, since together they give someone a sense of how many commits there are and how large they are. Also, if someone were using size as a proxy for billing, total size would be the more important measure there as well.

dpetran commented 1 year ago

Alright, then we will just do the asserts + retracts implementation to fix this bug.