estraier / tkrzw

a set of implementations of DBM
Apache License 2.0

Something wrong with SkipDBM №1: lost new keys #12

Closed tieugene closed 3 years ago

tieugene commented 3 years ago

I wrote a test application to compare various key-value stores and found a strange bug in SkipDBM. Two logs are attached: tkh.log and tks.log (for HashDBM and SkipDBM respectively). Pay attention to the Ask and Try sections. Ask looks up a mix of known and unknown keys; Try looks up known keys or adds new records. Both are designed to find known keys in exactly 50% of cases (± 1 record). It seems SkipDBM cannot find some just-added keys.

tkh.log tks.log tkrzw-0.9.7..8

PS. Another SkipDBM bug report will follow.

estraier commented 3 years ago

Could you tell me the command line to reproduce the issue?

estraier commented 3 years ago

Does this seem OK?

$ ./tk_main -f casket.tkt -v 22
Playing 4194304 records:

  1. Add ... 4194304/4194304 @ 161763 ms (26 Kops)
     Sync ... 562 ms
  2. Get ... 383957/383957 @ 5 s (76 Kops)
  3. Ask ... 151574/303149 @ 5 s (60 Kops)
  4. Try ... 222879/222879 @ 5 s (44 Kops): 111439 get, 111440 add
     Sync ... 548 ms

n = 22, t = 161 s, size = 158 MB, Kops: 26 76 60 44

tieugene commented 3 years ago

> Does this seem OK?

Yes, it is OK. It is very slow (for 4M records) but works correctly. The database type depends on the filename extension. The bug reproduces with *.tks.

estraier commented 3 years ago

BTW, .tks is the suffix for SkipDB (skip list database). .tkt is for TreeDB (B+ tree database). So, this issue is about SkipDB right?
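The suffix convention discussed here can be sketched as a small lookup. This is only an illustration: `DbmClassForPath` is a hypothetical helper, not a tkrzw function, and it covers only the three suffixes mentioned in this thread (.tkh, .tkt, .tks).

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not part of tkrzw): returns the name of the DBM class
// that this thread associates with the file name's suffix.
inline std::string DbmClassForPath(std::string_view path) {
  const auto dot = path.rfind('.');
  if (dot == std::string_view::npos) return "unknown";
  const std::string_view ext = path.substr(dot);
  if (ext == ".tkh") return "HashDBM";  // hash table database
  if (ext == ".tkt") return "TreeDBM";  // B+ tree database
  if (ext == ".tks") return "SkipDBM";  // skip list database
  return "unknown";
}
```

With this mapping, casket.tkt resolves to TreeDBM and test.tks to SkipDBM, which is why the two command lines in this thread exercise different database classes.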

estraier commented 3 years ago

BTW, SkipDBM is not a mere key/value storage. It's more like the "SSTable" used by some cloud storages. Records stored in SkipDBM are NOT visible until you call the Synchronize method. Therefore, calling the Set method twice with the same key before Synchronize doesn't cause DUPLICATION_ERROR. Moreover, SkipDBM can contain multiple records with the same key. How such duplicates are resolved is determined by the Reducer given to Synchronize.

    SkipDBM dbm;
    dbm.Open(...);
    dbm.Set("a", "first");          // -> SUCCESS
    dbm.Get("a", ...);              // -> NOT_FOUND_ERROR
    dbm.Set("a", "second", false);  // -> SUCCESS
    dbm.Synchronize(...);           // -> By default, all values are kept. So, now "a" has both "first" and "second"
    dbm.Get("a", ...);              // -> SUCCESS, "first" is retrieved
    dbm.Set("a", "third", false);   // -> DUPLICATION_ERROR, because "a" with "first" and "second" are visible
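The visibility rule described above can be imitated with a toy in-memory model. This is a sketch under stated assumptions, not tkrzw code: `ToySkipStore` and `ToyStatus` are hypothetical names, no Reducer is modelled (all values are kept), and nothing here reflects how SkipDBM is actually implemented on disk.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: these names are hypothetical, not tkrzw's API.
enum class ToyStatus { kSuccess, kNotFound, kDuplication };

class ToySkipStore {
 public:
  // Stages a record. With overwrite == false, the call fails only if the key
  // is already visible, i.e. it was merged by an earlier Synchronize().
  ToyStatus Set(const std::string& key, const std::string& value,
                bool overwrite = true) {
    if (!overwrite && visible_.count(key) > 0) return ToyStatus::kDuplication;
    pending_.emplace_back(key, value);
    return ToyStatus::kSuccess;
  }

  // Looks at visible records only; staged records are ignored.
  // When a key has several values, the earliest one is returned.
  ToyStatus Get(const std::string& key, std::string* value) const {
    const auto range = visible_.equal_range(key);
    if (range.first == range.second) return ToyStatus::kNotFound;
    if (value != nullptr) *value = range.first->second;
    return ToyStatus::kSuccess;
  }

  // Merges staged records into the visible set, keeping every value.
  void Synchronize() {
    for (auto& rec : pending_) {
      visible_.emplace(std::move(rec.first), std::move(rec.second));
    }
    pending_.clear();
  }

 private:
  std::multimap<std::string, std::string> visible_;
  std::vector<std::pair<std::string, std::string>> pending_;
};
```

Replaying the sequence above against this model gives the same outcomes: the first Get misses, the second Set with overwrite disabled succeeds because "a" is not yet visible, and only after Synchronize does a further non-overwriting Set report a duplicate.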

tieugene commented 3 years ago

> BTW, .tks is the suffix for SkipDB (skip list database). .tkt is for TreeDB (B+ tree database). So, this issue is about SkipDB right?

Oops, sorry, you are right and I am not. Fixed

tieugene commented 3 years ago

> BTW, SkipDBM is not a mere key/value storage. It's more like "SSTable" for some cloud storages. Records stored in SkipDBM are NOT visible until you call the Synchronize method.

But in my case Get(), Ask() and Try() are called after db->Synchronize() (look at the Sync... line after Add), and the keys are still not visible.

tieugene commented 3 years ago

On 30 Apr 2021, at 03:09, Mikio Hirabayashi @.***> wrote:

> Could you tell me the command line to reproduce the issue?

I am using the devel branch, so the output differs from yours.

bash-5.0$ ./kvtest_tk -v -t 3 -f test.tks 20
Playing 2**20 = 1048576 records:

  1. Add ... 1048576 @ 139 ms (7543 Kops)
     Sync... 1894 ms
  2. Get ... 865035 @ 3 s (288 Kops)
  3. Ask ... 880090 @ 3 s (293 Kops): 440045 = 50% found
  4. Try ... 859303 @ 3 s (286 Kops): 353985 = 41% found
     Sync... 1646 ms

n = 20, t = 0 s, size = 32 MB, Kops: 7543 288 293 286

You are right: Get is OK and Ask finds 50%. But Try (get or add) finds only 35..50%.

If this is a feature rather than a bug, the bug report can be closed, with a «WARNING!» in the documentation.

tieugene commented 3 years ago

by design