cznic opened this issue 8 years ago
Additional info: the same workload in sqlite: 11MB, leveldb: 14MB, default settings. Thanks for the work Jan, I decided to go with another DB for the moment ;)
I'm really sorry I haven't yet had time to investigate the issue.
In any case, thanks again for your contributions. Also, I will add you as a collaborator now.
@deuter0n
I have some datapoints for you. Using https://github.com/cznic/lldb/commit/d5b271c8ef226139149a6ed37b9ea94b1fcd6cfe, on a bit old HW:
$ go test -v -run Issue12 -timeout 1h
=== RUN TestIssue12
TestIssue12: Warning, run with -timeout at least 1h
--- PASS: TestIssue12 (1197.13s)
all_test.go:106: Using compression (recommended): false
all_test.go:122: 1048576 keys in 8m58.523277614s, 1947.132173 keys/s, 513.575µs s/key
all_test.go:129: File size: 62589728, 59.690216 bytes/key
all_test.go:136:
lldb.AllocStats{
· Handles: 2486,
· TotalAtoms: 3911851,
· AllocBytes: 39917757,
· AllocAtoms: 2499458,
· Relocations: 2373,
· FreeAtoms: 1412393,
... snip
}
all_test.go:106: Using compression (recommended): true
all_test.go:122: 1048576 keys in 10m58.374390217s, 1592.674344 keys/s, 627.874µs s/key
all_test.go:129: File size: 11666448, 11.125992 bytes/key
all_test.go:136:
lldb.AllocStats{
· Handles: 2486,
· Compression: 2485,
· TotalAtoms: 729146,
· AllocBytes: 39917757,
· AllocAtoms: 451371,
· Relocations: 2233,
· FreeAtoms: 277775,
... snip
}
PASS
ok github.com/cznic/lldb 1197.151s
$
The observation w/o compression confirms your original report - the file size is ~60MB.
However, when using compression, as recommended in the documentation:
It's recommended to use compression. For example the BTrees implementation assumes compression is used. Using compression may cause a slowdown in some cases while it may as well cause a speedup.
the file size shrinks to ~11MB.
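For completeness, a minimal sketch of switching compression on. The names used here (NewSimpleFileFiler, NewAllocator with *Options, and the Allocator's Compress toggle) are recalled from the lldb godoc and are assumptions to verify, not a definitive recipe:

```go
package main

import (
	"log"
	"os"

	"github.com/cznic/lldb"
)

func main() {
	f, err := os.OpenFile("test.db", os.O_RDWR|os.O_CREATE, 0666)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Wrap the OS file in a Filer and create an Allocator on top of it.
	filer := lldb.NewSimpleFileFiler(f)
	a, err := lldb.NewAllocator(filer, &lldb.Options{})
	if err != nil {
		log.Fatal(err)
	}
	a.Compress = true // turn on content compression, the recommended setting quoted above

	// ... create/open BTrees backed by `a` as usual.
}
```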
In my case I have to disable compression since it can weaken encryption. Compression depends on the data itself and affects the length of the data. That means that it can leak information about the data through the length, bypassing the encryption. The reduction to 11MB I guess is caused by easily compressible zeroes, or in other words, bad space utilization in the btree implementation.
The reduction to 11MB I guess is caused by easily compressible zeroes, or in other words, bad space utilization in the btree implementation.
It's complicated. For BTree items of fixed size it would be easier. For variable length items one has to compromise. Concatenating all the items in the BTree page is not an option b/c the maximum allocation size is capped to about 64kB.
Too small a fixed item portion degrades performance b/c on access it may be necessary to reach for the overflowed part. Too big a fixed item portion degrades space utilization.
The value chosen is 19 bytes. I have benchmarked the value for various data types. Unfortunately, K/V items of total size 4 bytes are far below the good-on-average value. The value is also tuned to accommodate two encoded scalars, each of size 8 bytes. That's what typically occurs in a SQL index.
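A back-of-envelope illustration of why 4-byte items hurt: with a fixed in-page portion of about 19 bytes per item, most of each slot is zero padding. The real layout has further page and allocator overhead on top of this, so the numbers are only indicative:

```go
package main

import "fmt"

// Illustration only, not the real on-disk layout: if every BTree item reserves
// a fixed in-page portion of roughly 19 bytes, then items whose encoded K/V
// payload totals only 4 bytes leave most of that portion as zero padding.
func main() {
	const fixedPortion = 19.0 // per-item in-page bytes quoted above
	const payload = 4.0       // total encoded K/V size used in the test case

	padding := fixedPortion - payload
	fmt.Printf("padding per item: %.0f bytes (%.0f%% of the fixed portion)\n",
		padding, 100*padding/fixedPortion)
	// Long runs of zeros compress extremely well, which fits the file shrinking
	// from ~60 MB to ~11 MB once compression is enabled.
}
```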
FYI: The data layout is described here.
@deuter0n
One more question, if you don't mind. Are the sizes of the keys and/or the values of the BTree items that you want to use in your app known in advance, i.e. are they fixed (per tree)?
So I guess it's a matter of tradeoff.
Are the keys / values fixed ?
Some are fixed→fixed, some var→var. Any suggestion ?
I actually did the testcase above due to IndexSeek() confusion. So ...
- k = path, e.g. "a/b", "a/c", "b/b", "b/c/d", v = variable blobs
- Tried "get biggest key in a/" using IndexSeek(collate=bytes.Compare, c=negate(bytes.Compare)). Expecting "a/c", didn't work, saw the TODO, or maybe I misunderstood.
- By having immutable handles: get "a/c" from btree2, then get the actual v from btree1.
So I guess it's a matter of tradeoff.
True.
Some are fixed→fixed, some var→var. Any suggestion ?
I can imagine adding, say CreateFixedBtree(keySize, valueSize int, ...). It will actually be useful also for QL.
- k = path, e.g. "a/b", "a/c", "b/b", "b/c/d", v = variable blobs
- Tried "get biggest key in a/" using IndexSeek(collate=bytes.Compare, c=negate(bytes.Compare)) Expecting "a/c", didn't work, saw the TODO, or maybe I misunderstood.
I think that to find the biggest key prefixed with "a/" it would be necessary to Seek (not IndexSeek) to the first key which collates lexically after, in this case "b", and then do Prev on the iterator.
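A rough sketch of that Seek-then-Prev approach. It assumes Seek positions an enumerator at the first key >= its argument and that Prev then steps back one item; the signatures are recalled from the godoc and should be double-checked:

```go
package example

import (
	"bytes"
	"io"

	"github.com/cznic/lldb"
)

// biggestWithPrefix returns the largest key in bt starting with prefix, using
// the Seek-then-Prev idea above. Sketch only: the exact enumerator semantics
// belong to the lldb godoc, not to this snippet.
func biggestWithPrefix(bt *lldb.BTree, prefix []byte) (key, value []byte, err error) {
	// Smallest key that collates after every key with the prefix,
	// e.g. "a/" -> "a0" ('/'+1); for the example above, "b" also works.
	// Caveat: does not handle an empty prefix or one ending in 0xFF.
	upper := append([]byte(nil), prefix...)
	upper[len(upper)-1]++

	enum, _, err := bt.Seek(upper)
	if err != nil {
		return nil, nil, err
	}
	k, v, err := enum.Prev() // last key < upper
	if err != nil {
		return nil, nil, err // io.EOF: no key before upper at all
	}
	if !bytes.HasPrefix(k, prefix) {
		return nil, nil, io.EOF // keys exist before upper, but none under prefix
	}
	return k, v, nil
}
```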
The IndexSeek is meant to support range queries on multidimensional data (composite indexes), but the multidimensional index handling code is not yet implemented, hence no test either.
Also, wrt the concerns about encryption and information leaking: thinking about this some more, it seems to me no information can leak this way unless the attacker knows where the blocks reside. But that cannot be inferred w/o decrypting the DB in the first place. I am probably missing something.
No need to decrypt. With a block/stream cipher, an attacker can just diff the old vs. new DB to learn where the block resides. See also the CRIME attack.
IMO compression should be left to higher-level application code, not to the storage layer. Different applications demand different compression/security levels. There's also a general distaste in the crypto community toward mixing compression and crypto together. The resulting effect is just unpredictable.
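For illustration, application-level compression can be as simple as deflating each value with the standard library before handing it to whatever store is used (a sketch, not tied to lldb's API):

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"log"
)

// compressValue deflates a value before it is handed to the storage layer;
// the application stays in control of whether and how compression happens.
func compressValue(v []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.BestSpeed)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(v); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompressValue reverses compressValue after reading from the store.
func decompressValue(v []byte) ([]byte, error) {
	r := flate.NewReader(bytes.NewReader(v))
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	c, err := compressValue(bytes.Repeat([]byte{0}, 1024))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(c), "bytes after deflate") // runs of zeros shrink dramatically
}
```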
No need to decrypt. With a block/stream cipher, an attacker can just diff the old vs. new DB to learn where the block resides. See also the CRIME attack.
Interesting. I thought about the diff, but I assumed that requires access to the encrypting machine, meaning encryption cannot help anyhow. I forgot to consider that the encrypting machine can be manipulated to produce the diff-revealing data.
IMO compression should be left to higher-level application code, not to the storage layer. Different applications demand different compression/security levels. There's also a general distaste in the crypto community toward mixing compression and crypto together. The resulting effect is just unpredictable.
Track separately:
In #3 @deuter0n wrote:
Investigate if it's a storage space (leak) bug or if the implementation is just crappy. Fix/improve it iff possible w/o breaking backward compatibility with any existing DBs.