Closed MatrixAI-Bot closed 2 years ago
Pipeline Attempt on 551843671 for 0a0901d07862fa255104a8d3e28530e89f478ca2
https://gitlab.com/MatrixAI/open-source/js-db/-/pipelines/551843671
Pipeline Attempt on 551878750 for de7eead5194aef861a37e7df30a52f95d80e55e5
https://gitlab.com/MatrixAI/open-source/js-db/-/pipelines/551878750
After adding a sanity check for the internal leveldb, I discovered that the empty key is not allowed by leveldb.
However because DB.ts
puts all data underneath the data
level path, any empty key is simply the key of just the level path itself without a terminal part. Thus while the internal leveldb doesn't support empty keys, this DB.ts
will end up supporting empty keys.
@emmacasolin @tegefaulkes
Ok so once levels are introduced, the sort order is a lot more complicated and cannot be easily compared with Buffer.compare
.
Here is the sort order when levels are introduced and also empty keys are introduced coming straight out of DB iterator:
[
[ <Buffer > ],
[ <Buffer >, <Buffer > ],
[ <Buffer 00>, <Buffer > ],
[ <Buffer 00>, <Buffer 00> ],
[ <Buffer 00 00>, <Buffer > ],
[ <Buffer 00 00>, <Buffer >, <Buffer > ],
[ <Buffer 00 00>, <Buffer ff>, <Buffer > ],
[ <Buffer 00 00>, <Buffer 00 00> ],
[ <Buffer 00 00>, <Buffer 01> ],
[ <Buffer 00 00 00>, <Buffer 00> ],
[ <Buffer 00 00 00>, <Buffer 01> ],
[ <Buffer 01>, <Buffer 00> ],
[ <Buffer ff>, <Buffer > ],
[ <Buffer 00> ],
[ <Buffer 00 00> ],
[ <Buffer 01> ]
]
It can be described with 3 constraints:
Note that ['a', 'b']
is a key path of degree 2.
This behaviour makes sense.
However the only weird thing is the fact that empty keys are always put ahead of key paths with degree n + 1. Take this snippet:
[ <Buffer 00 00>, <Buffer > ],
[ <Buffer 00 00>, <Buffer >, <Buffer > ],
[ <Buffer 00 00>, <Buffer ff>, <Buffer > ],
[ <Buffer 00 00>, <Buffer 00 00> ],
[ <Buffer 00 00>, <Buffer 01> ],
I would argue it would have been more obvious if it was this way instead:
[ <Buffer 00 00>, <Buffer >, <Buffer > ],
[ <Buffer 00 00>, <Buffer ff>, <Buffer > ],
[ <Buffer 00 00>, <Buffer > ],
[ <Buffer 00 00>, <Buffer 00 00> ],
[ <Buffer 00 00>, <Buffer 01> ],
We would then not need an exception for constraint 3.
I suspect the problem is because the encoded data looks like this:
0x00 <encoded level> 0x00
0x00 <encoded level> 0x00 0x00 <encoded level> 0x00
0x00 <encoded level> 0x00 0x00 <encoded level> 0x00
0x00 <encoded level> 0x00 <encoded key>
0x00 <encoded level> 0x00 <encoded key>
We know that a
is sorted before aa
.
We would only be able to change it if we were to substitute the empty part for a special symbol.
The baseN encoding algorithm does allow for other symbols to be used. For example base64
uses =
to indicate padding. We can assign a special symbol for the empty key that is outside of the alphabet, and isn't 0x00
. Most likely it would have 0x01
.
Right now our alphabet is starts at 0x01
. We disallow the presence of 0x00
because that's used for the separator. If we used 0x01
as the special empty symbol, then pushed our alphabet to start at 0x02
, then 0x01
would always be placed first relative to other keys, but would be placed after key paths with degree n + 1.
The result should look like:
0x00 <encoded level> 0x00 0x00 <encoded level> 0x00
0x00 <encoded level> 0x00 0x00 <encoded level> 0x00
0x00 <encoded level> 0x00 0x01
0x00 <encoded level> 0x00 <encoded key>
0x00 <encoded level> 0x00 <encoded key>
I believe this should also apply to level parts that are empty.
Seems like a quickfix.
It appears to work. This is now the order with multiple parts and empty parts:
[
[ <Buffer >, <Buffer > ],
[ <Buffer 00>, <Buffer > ],
[ <Buffer 00>, <Buffer 00> ],
[ <Buffer 00 00>, <Buffer >, <Buffer > ],
[ <Buffer 00 00>, <Buffer ff>, <Buffer > ],
[ <Buffer 00 00>, <Buffer > ],
[ <Buffer 00 00>, <Buffer 00 00> ],
[ <Buffer 00 00>, <Buffer 01> ],
[ <Buffer 00 00 00>, <Buffer 00> ],
[ <Buffer 00 00 00>, <Buffer 01> ],
[ <Buffer 01>, <Buffer 00> ],
[ <Buffer ff>, <Buffer > ],
[ <Buffer > ],
[ <Buffer 00> ],
[ <Buffer 00 00> ],
[ <Buffer 01> ]
]
Compare with previous:
[
[ <Buffer > ],
[ <Buffer >, <Buffer > ],
[ <Buffer 00>, <Buffer > ],
[ <Buffer 00>, <Buffer 00> ],
[ <Buffer 00 00>, <Buffer > ],
[ <Buffer 00 00>, <Buffer >, <Buffer > ],
[ <Buffer 00 00>, <Buffer ff>, <Buffer > ],
[ <Buffer 00 00>, <Buffer 00 00> ],
[ <Buffer 00 00>, <Buffer 01> ],
[ <Buffer 00 00 00>, <Buffer 00> ],
[ <Buffer 00 00 00>, <Buffer 01> ],
[ <Buffer 01>, <Buffer 00> ],
[ <Buffer ff>, <Buffer > ],
[ <Buffer 00> ],
[ <Buffer 00 00> ],
[ <Buffer 01> ]
]
There's another behaviour here which is that keypath of degree n is generally placed ahead of keypaths of degree n-1.
It seems like it would make more sense to be other way around:
[
[ <Buffer > ],
[ <Buffer 00> ],
[ <Buffer 00 00> ],
[ <Buffer 01> ],
[ <Buffer >, <Buffer > ],
[ <Buffer 00>, <Buffer > ],
[ <Buffer 00>, <Buffer 00> ],
[ <Buffer 00 00>, <Buffer > ],
[ <Buffer 00 00>, <Buffer 00 00> ],
[ <Buffer 00 00>, <Buffer 01> ],
[ <Buffer 00 00 00>, <Buffer 00> ],
[ <Buffer 00 00 00>, <Buffer 01> ],
[ <Buffer 01>, <Buffer 00> ],
[ <Buffer ff>, <Buffer > ]
[ <Buffer 00 00>, <Buffer >, <Buffer > ],
[ <Buffer 00 00>, <Buffer ff>, <Buffer > ],
]
The reason for this is because deeper degrees have the 0x00
as the separator, which is always lower than whatever encoded data.
One way to flip this around is to make the separator the highest byte rather than the lowest byte.
That is the empty part can be 0x00
to mean lowest byte. While the separator could be 0xff
the highest byte. While the alphabet can be inbetween. Let's see how this goes.
I couldn't actually use 0xff
because of iteration's need to use the options._lt = levelkKeyEnd
.
The levelKeyEnd
had to be 1 decimal point higher. So instead of 0xff
, the separator can be 0xfe
.
Actually the 0xfe
doesn't quite work either. It ends up making shorter level parts later in the order. Because 0xfe
is higher than any symbol in the encoded part.
Ok well we can stick with the state at: https://github.com/MatrixAI/js-db/pull/31#issuecomment-1141929816.
I found a different bug, if we use raw: true
and put in an empty buffer as a value, it actually fails. I found this by fuzzing the entire get and put too.
Not possible to decrypt for some reason. Possibly because decrypted
is null
if it's an empty buffer.
The empty string is fine because it gets JSON encoded.
The solution was simple, a bug in the decrypt
utility.
if (cipherTextBuf.byteLength <= 32) {
return;
}
Should have been:
if (cipherTextBuf.byteLength < 32) {
return;
}
If it's exactly 32
bytes that means it was the empty buffer being encrypted.
This fix needs to be patched in EFS and in PK as well @tegefaulkes @emmacasolin!
Pipeline Attempt on 552152714 for 5c1dbcbf6aef84ba7e63de1ca29e7ea61c3787bf
https://gitlab.com/MatrixAI/open-source/js-db/-/pipelines/552152714
Pipeline Attempt on 552165297 for ea2e8ed9941cf91f2466adaac560948c02b49440
https://gitlab.com/MatrixAI/open-source/js-db/-/pipelines/552165297
Apparently macos cannot import node:crypto
, that's pretty strange. Reverting back to directly importing crypto
.
Pipeline Attempt on 552171993 for eeb9b1b5fb87737c689a1b9b0d0ca7ed4e145f4a
https://gitlab.com/MatrixAI/open-source/js-db/-/pipelines/552171993
@emmacasolin @tegefaulkes this will create 4.0.4 automatically if successful.
Pipeline Succeeded on 552171993 for eeb9b1b5fb87737c689a1b9b0d0ca7ed4e145f4a
https://gitlab.com/MatrixAI/open-source/js-db/-/pipelines/552171993
This is an automatic PR generated by the pipeline CI/CD. This will be automatically fast-forward merged if successful.