Closed CMCDragonkai closed 2 years ago
I want to change the API interface to:
get(keys: string | Buffer | Array<string | Buffer>)
So that way get('key')
or get(['level1', 'key'])
can work.
And a way to create deeply nested levels more easily: https://github.com/Level/subleveldown/issues/112
We should remove any usage and management of sublevels. Unless the sublevels themselves are just DB objects. But mostly it should be possible to just directly use the DB
object instance and use key paths.
The APIs between DB
and DBTransaction
is still different because we want to eventually move to where you can just specify a key path instead of directly maintaining particular sublevel objects. Once all methods can just take a full keypath, there's no need to manage a bunch of sublevel objects, and that should reduce the object maintenance overhead.
// root iterator at dataDb
DB.iterator(options);
// same
DB.iterator(options, []);
// down level 1
DB.iterator(options, ['level1']);
DB.iterator(options, ['level1', 'level2]);
DBTransaction.iterator(options, ['level1', 'level2']);
DB.count();
DB.count(['level1', 'level2']);
DBTransaction.count();
DBTransaction.count(['level1']);
DB.clear();
DB.clear(['level1']);
DBTransaction.clear(['level1']); // this should be a transactional clear, to do this, we must iterator and then do `db.del`
DB.get('key');
DB.get(['level1', 'key']);
// these are not allowed, or they are equivalent to asking for undefined
DB.get([]);
DB.get();
DB.put('key', 'value');
DB.put(['level1', 'key'], 'value');
DB.del('key');
DB.del(['level1', 'key']);
DBTransaction.del('key');
DBTransaction.del(['level1', 'key']);
We will need to test and benchmark sublevel creation multiple times to see how fast we can do this though, because somethings like iterator and count will require creating those sublevels.
Some benchmarking results show that naively creating sublevels in order to support the key paths API or the domain APIs is quite slow:
create 1 sublevels:
87 320 ops/s, ±0.79% | fastest
create 2 sublevels:
45 325 ops/s, ±0.55% | 48.09% slower
create 3 sublevels:
30 294 ops/s, ±0.39% | 65.31% slower
create 4 sublevels:
22 764 ops/s, ±2.34% | slowest, 73.93% slower
get via sublevel:
25 055 ops/s, ±1.36% | 71.31% slower
get via key path concatenation:
54 992 ops/s, ±2.68% | 37.02% slower
Therefore it appears that if we want to have an efficient implementation, instead of just creating intermediate sublevel objects, we instead should be doing key path/prefix concatenation.
Now this is what we do when we use get
and put
and del
, however the methods like level
and count
and dump
and iterator
are not, and are just creating sublevel objects.
For level
, count
, dump
and iterator
, we fundamentally need subleveldown to support taking the concatenated prefix, or an array type as the prefix in https://github.com/Level/subleveldown/issues/112#issuecomment-1075904007. But this is prevented by subleveldown's RangeError
that is thrown https://github.com/Level/subleveldown/issues/111#issuecomment-1070587202.
Basically it seems like
db.sublevel(['foo', bar'])
would be implemented efficiently asdb.sublevel('foo!bar')
so that the key path is just concatenated rather than creating intermediate sublevel objects.
Our in our case:
subleveldown(dbLevel, `foo${utils.prefix}bar`)
// or
subleveldown(dbLevel, ['foo', 'bar'])
I suggest that we remove subleveldown entirely, and just allow key paths for both iterators and get, put, del and clear... etc. Most custom operations like dump
, clear
, count
all rely on the iterator
. And we don't need subleveldown if we run our iterator
based on a specific key path for gt
and lt`.
I reckon it would be something like this:
iterator({
// gt means greater than this key
// since !A!B! would be the domain path, any key added would always be 1 character more
gt: '!A!B!',
// this would have to be the very next byte number above A!B, but that which doesn't produce an extra character, we could create a buffer and just increment by 1 the last byte
// if `!` is the separator which is 0x21, then plus 1 would be `"` which is 0x22
lt: '!A!B"'
})
Then we probably solve a number of problems.
This script proves we can just use gt
and lt
to get a proper iteration:
import DB from './src/DB';
async function main () {
const db = await DB.createDB({
dbPath: './tmp/db',
fresh: true
});
const l1 = await db.level('level1');
await db.put(['level1'], 'k', 'v');
await db.put(['level1'], 'a', 'v');
await db.put(['level1'], 'c', 'v');
await db.level('level2');
await db.put(['level2'], 'k', 'v');
await db.put(['level2'], 'a', 'v');
await db.put(['level2'], 'c', 'v');
await db.put([], '!level1"', 'v');
await db.put([], '!level1 ', 'v');
console.log('USING SUBLEVELDOWN');
for await (const k of l1.createKeyStream()) {
console.log(k.toString());
}
console.log('ITERATOR');
const i = db.dataDb.iterator({
gt: '!level1!',
lt: '!level1"'
});
// @ts-ignore
for await (const [k, v] of i) {
console.log(k.toString());
}
console.log('KEY STREAM');
for await (const k of db.dataDb.createKeyStream({
gt: '!level1!',
lt: '!level1"'
})) {
console.log(k.toString());
}
console.log('DUMP');
console.log(await db.dump());
await db.stop();
}
main();
I just realised that dump
doesn't preserve order, and that it assumes keys can just be strings. We can change this to an array of tuples, where the first tuple is always a buffer, or a binary string, and the value is decrypted as usual. If I want to use binary string I can use Buffer.toString('binary')
, not this is the same as if I used String.fromCharCode(...key)
, but I used that in js-id
due to Uint8Array
.
The raw === true
can be used to return buffers for both.
With this prototype, I might just go ahead with getting rid of subleveldown entirely, and then that would make this 3.0.0 as the APIs are again different. Then that would affect PK quite significantly.
Even without using subleveldown, the DB can still use abstract-level
, and we sort of rely less on what the underlying system can do.
I missed one thing, sublevels are actually indexed like this:
!A!!B!k
So between sublevels we have !!
not just a single !
.
Due to #12, the 3.0.0 DB has been released and the interface shouldn't change all that much even with upgrade to abstract-level.
https://github.com/MatrixAI/js-db/pull/13 has enabled the ability to arbitrarily use level parts without worrying about reserved bytes.
Specification
The abstract-level https://github.com/Level/abstract-level is a new foundation of LevelDB. Upgrading to this will help us resolve all other issues in DB.
3 - abstract-level supports it directly
8 - Our DB abstraction should be rebuilt as an abstract-level layer, that is a class extending
AbstractLevel
It will still need to use
classic-level
(which is just leveldown atm) in NodeJS because NodeJS doesn't support indexed DB atm. But this should then basically makeDB
just an implementation of theabstract-level
interface.classic-level
that sets up root sublevels ofdata
,transactions
, andindex
and performs semi-transparent encryption & decryption (since it doesn't have a block level yet).4 Should already be resolved, since we end up using
iterator()
for read-committed level transactions, and true snapshots aren't really needed.1 becomes possible now that we have a better idea of what kind of indexing needs we need, and we can put all the index data under the
index
root level5 still cannot be solved until IDB becomes available, then we can use
browser-level
orlevel-js
and have the transparent encryption be done at the block level. Or done at the C++ level inside leveldb, or rocksdb.Wait until level has released version 8, although this can be started now, since we may not need to use the
level
package at all. The only dependencies ofjs-db
may beabstract-level
andclassic-level
.Additional context
Tasks