estraier / tkrzw

a set of implementations of DBM
Apache License 2.0

Should `Rebuild` be called by the user when a TreeDBM is kept open for long? #45

Closed rlofc closed 8 months ago

rlofc commented 8 months ago

I'm seeing weird behavior that I can so far only associate with a long-running TreeDBM session with >100k write/delete ops of binary blobs. My file inflates and at some point, I suspect, the library throws an exception that leaves the data in an inconsistent state.

Is it recommended to preemptively call Rebuild on the data file during a long-running session? So far, it appears to prevent this from happening, but I'm not 100% sure.

estraier commented 8 months ago

What exception do you see? If data inconsistency happens, I suspect either that the process has crashed or that there is a bug.

Fragmentation is inevitable when you repeatedly write and delete records, so calling Rebuild is a good way to reduce the file size. You can call ShouldBeRebuilt to check whether a rebuild is worthwhile.

Anyhow, inconsistency and fragmentation are different things. If inconsistency is the problem, we should examine it in detail. If fragmentation is the problem, just calling Rebuild occasionally will do the job.
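
A minimal sketch of the "check, then rebuild" pattern described above. It assumes the database object exposes `ShouldBeRebuilt(bool*)` and `Rebuild()`, each returning a status with an `IsOK()` method, as the tkrzw DBM interface does. `StubStatus`, `StubDBM`, `RebuildOutcome`, and `RebuildIfNeeded` are hypothetical names introduced here so the example compiles without the library; they are not part of tkrzw.

```cpp
// Hypothetical stand-in for a status object; it only mimics an IsOK() check.
struct StubStatus {
  bool ok = true;
  bool IsOK() const { return ok; }
};

// Hypothetical stand-in database, used only so the sketch is self-contained.
struct StubDBM {
  bool fragmented = false;
  int rebuild_calls = 0;

  // Reports through *tobe whether a rebuild looks worthwhile.
  StubStatus ShouldBeRebuilt(bool* tobe) {
    *tobe = fragmented;
    return StubStatus();
  }

  // Pretends to compact the file and clears the fragmentation flag.
  StubStatus Rebuild() {
    ++rebuild_calls;
    fragmented = false;
    return StubStatus();
  }
};

enum class RebuildOutcome { kNotNeeded, kRebuilt, kFailed };

// Asks the database whether a rebuild is advisable and rebuilds only if so.
// Works with any type that offers ShouldBeRebuilt(bool*) and Rebuild().
template <typename DBM>
RebuildOutcome RebuildIfNeeded(DBM* dbm) {
  bool tobe = false;
  if (!dbm->ShouldBeRebuilt(&tobe).IsOK()) {
    return RebuildOutcome::kFailed;
  }
  if (!tobe) {
    return RebuildOutcome::kNotNeeded;
  }
  return dbm->Rebuild().IsOK() ? RebuildOutcome::kRebuilt
                               : RebuildOutcome::kFailed;
}
```

In a long-running session, a helper like this could be called every few thousand write/delete operations or from a periodic maintenance task. Checking the outcome each time keeps a failed rebuild from going unnoticed.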

rlofc commented 8 months ago

It's a bit challenging to get the originating exception, but the result is that I get a corrupt file with the following segfault after calling Get:

0x00007ffff7727e3e in tkrzw::TreeDBMImpl::SearchTree (this=this@entry=0x5555559464d0, key=...,
    leaf_node=leaf_node@entry=0x7fffff7ff1a0) at tkrzw_dbm_tree.cc:1562
1562    Status TreeDBMImpl::SearchTree(std::string_view key, std::shared_ptr<TreeLeafNode>* leaf_node) {
(gdb) bt
#0  0x00007ffff7727e3e in tkrzw::TreeDBMImpl::SearchTree (this=this@entry=0x5555559464d0, key=...,
    leaf_node=leaf_node@entry=0x7fffff7ff1a0) at tkrzw_dbm_tree.cc:1562
#1  0x00007ffff772f867 in tkrzw::TreeDBMImpl::Process (this=0x5555559464d0, key="EME56GHC",
    proc=proc@entry=0x7fffff7ff260, writable=writable@entry=false) at tkrzw_dbm_tree.cc:575
#2  0x00007ffff772fb95 in tkrzw::TreeDBM::Process (this=this@entry=0x55555595d568, key=...,
    proc=proc@entry=0x7fffff7ff260, writable=writable@entry=false) at tkrzw_dbm_tree.cc:2557
#3  0x00005555556aba52 in tkrzw::DBM::Get (value=0x0, key=..., this=0x55555595d568) at /usr/include/tkrzw_dbm.h:1041
#4  EntityRepo::exists (this=this@entry=0x55555595d568, uuid="EME56GHC") at ../server/repo.cc:188

rafal98 commented 8 months ago

Hi,

You talk about HashDBM, but your stack trace only involves TreeDBM. Another point, though I'm not sure about it: value=0x0 in frame #3 looks quite suspect.

My 2ct


rlofc commented 8 months ago

@rafal98 here's what I get from inspect:

Inspection:
  class=HashDBM
  healthy=false
  auto_restored=false
  path=kEFvdTUb.tkh
  cyclic_magic=162
  pkg_major_version=1
  pkg_minor_version=0
  static_flags=33
  offset_width=4
  align_pow=10
  closure_flags=0
  num_buckets=131101
  num_records=26143
  eff_data_size=35163370
  file_size=56363008
  timestamp=1702725550.719930
  db_type=0
  max_file_size=4398046511104
  record_base=528384
  update_mode=in-place
  record_crc_mode=none
  record_comp_mode=zstd
Actual File Size: 56363008
Number of Records: 26143
Healthy: false
Should be Rebuilt: false

rlofc commented 8 months ago

Okay, that's even weirder: the file is instantiated as tkrzw::TreeDBM, and I have an AsyncDBM in use too, but inspect reports it as HashDBM. Other than that issue, all works as intended.

EDIT - confirming that the file is most likely a TreeDBM (although being reported by inspect as HashDBM). I just changed the code to open the file as HashDBM and reading any record fails.

rlofc commented 8 months ago

Closing this. I suspect the inconsistency issue I'm encountering is due to using AsyncDBM on top of TreeDBM without checking errors correctly (although I'm not sure what the best way to do that with AsyncDBM is, but that's for another issue). As for the crash, I'm 99.99% sure this is my bad. In any case, I think the answer by @estraier to my original question in the title is sufficient.