MatrixAI / js-db

Key-Value DB for TypeScript and JavaScript Applications
https://polykey.com
Apache License 2.0
5 stars 0 forks source link

ci: merge staging to master #31

Closed MatrixAI-Bot closed 2 years ago

MatrixAI-Bot commented 2 years ago

This is an automatic PR generated by the pipeline CI/CD. This will be automatically fast-forward merged if successful.

MatrixAI-Bot commented 2 years ago

Pipeline Attempt on 551843671 for 0a0901d07862fa255104a8d3e28530e89f478ca2

https://gitlab.com/MatrixAI/open-source/js-db/-/pipelines/551843671

ghost commented 2 years ago
👇 Click on the image for a new way to code review - Make big changes easier — review code in small groups of related files - Know where to start — see the whole change at a glance - Take a code tour — explore the change with an interactive tour - Make comments and review — all fully sync’ed with github [Try it now!](https://app.codesee.io/r/reviews?pr=31&src=https%3A%2F%2Fgithub.com%2FMatrixAI%2Fjs-db)

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map Legend

MatrixAI-Bot commented 2 years ago

Pipeline Attempt on 551878750 for de7eead5194aef861a37e7df30a52f95d80e55e5

https://gitlab.com/MatrixAI/open-source/js-db/-/pipelines/551878750

CMCDragonkai commented 2 years ago

After adding a sanity check for the internal leveldb, I discovered that the empty key is not allowed by leveldb.

However because DB.ts puts all data underneath the data level path, any empty key is simply the key of just the level path itself without a terminal part. Thus while the internal leveldb doesn't support empty keys, this DB.ts will end up supporting empty keys.

CMCDragonkai commented 2 years ago

@emmacasolin @tegefaulkes

Ok so once levels are introduced, the sort order is a lot more complicated and cannot be easily compared with Buffer.compare.

Here is the sort order when levels are introduced and also empty keys are introduced coming straight out of DB iterator:

    [
      [ <Buffer > ],
      [ <Buffer >, <Buffer > ],
      [ <Buffer 00>, <Buffer > ],
      [ <Buffer 00>, <Buffer 00> ],
      [ <Buffer 00 00>, <Buffer > ],
      [ <Buffer 00 00>, <Buffer >, <Buffer > ],
      [ <Buffer 00 00>, <Buffer ff>, <Buffer > ],
      [ <Buffer 00 00>, <Buffer 00 00> ],
      [ <Buffer 00 00>, <Buffer 01> ],
      [ <Buffer 00 00 00>, <Buffer 00> ],
      [ <Buffer 00 00 00>, <Buffer 01> ],
      [ <Buffer 01>, <Buffer 00> ],
      [ <Buffer ff>, <Buffer > ],
      [ <Buffer 00> ],
      [ <Buffer 00 00> ],
      [ <Buffer 01> ]
    ]

It can be described with 3 constraints:

Note that ['a', 'b'] is a key path of degree 2.

This behaviour makes sense.

However the only weird thing is the fact that empty keys are always put ahead of key paths with degree n + 1. Take this snippet:

      [ <Buffer 00 00>, <Buffer > ],
      [ <Buffer 00 00>, <Buffer >, <Buffer > ],
      [ <Buffer 00 00>, <Buffer ff>, <Buffer > ],
      [ <Buffer 00 00>, <Buffer 00 00> ],
      [ <Buffer 00 00>, <Buffer 01> ],

I would argue it would have been more obvious if it was this way instead:

      [ <Buffer 00 00>, <Buffer >, <Buffer > ],
      [ <Buffer 00 00>, <Buffer ff>, <Buffer > ],
      [ <Buffer 00 00>, <Buffer > ],
      [ <Buffer 00 00>, <Buffer 00 00> ],
      [ <Buffer 00 00>, <Buffer 01> ],

We would then not need an exception for constraint 3.

I suspect the problem is because the encoded data looks like this:

0x00 <encoded level> 0x00
0x00 <encoded level> 0x00 0x00 <encoded level> 0x00
0x00 <encoded level> 0x00 0x00 <encoded level> 0x00
0x00 <encoded level> 0x00 <encoded key>
0x00 <encoded level> 0x00 <encoded key>

We know that a is sorted before aa.

We would only be able to change it if we were to substitute the empty part for a special symbol.

The baseN encoding algorithm does allow for other symbols to be used. For example base64 uses = to indicate padding. We can assign a special symbol for the empty key that is outside of the alphabet, and isn't 0x00. Most likely it would have 0x01.

Right now our alphabet is starts at 0x01. We disallow the presence of 0x00 because that's used for the separator. If we used 0x01 as the special empty symbol, then pushed our alphabet to start at 0x02, then 0x01 would always be placed first relative to other keys, but would be placed after key paths with degree n + 1.

The result should look like:

0x00 <encoded level> 0x00 0x00 <encoded level> 0x00
0x00 <encoded level> 0x00 0x00 <encoded level> 0x00
0x00 <encoded level> 0x00 0x01
0x00 <encoded level> 0x00 <encoded key>
0x00 <encoded level> 0x00 <encoded key>

I believe this should also apply to level parts that are empty.

Seems like a quickfix.

CMCDragonkai commented 2 years ago

It appears to work. This is now the order with multiple parts and empty parts:

    [
      [ <Buffer >, <Buffer > ],
      [ <Buffer 00>, <Buffer > ],
      [ <Buffer 00>, <Buffer 00> ],
      [ <Buffer 00 00>, <Buffer >, <Buffer > ],
      [ <Buffer 00 00>, <Buffer ff>, <Buffer > ],
      [ <Buffer 00 00>, <Buffer > ],
      [ <Buffer 00 00>, <Buffer 00 00> ],
      [ <Buffer 00 00>, <Buffer 01> ],
      [ <Buffer 00 00 00>, <Buffer 00> ],
      [ <Buffer 00 00 00>, <Buffer 01> ],
      [ <Buffer 01>, <Buffer 00> ],
      [ <Buffer ff>, <Buffer > ],
      [ <Buffer > ],
      [ <Buffer 00> ],
      [ <Buffer 00 00> ],
      [ <Buffer 01> ]
    ]

Compare with previous:

    [
      [ <Buffer > ],
      [ <Buffer >, <Buffer > ],
      [ <Buffer 00>, <Buffer > ],
      [ <Buffer 00>, <Buffer 00> ],
      [ <Buffer 00 00>, <Buffer > ],
      [ <Buffer 00 00>, <Buffer >, <Buffer > ],
      [ <Buffer 00 00>, <Buffer ff>, <Buffer > ],
      [ <Buffer 00 00>, <Buffer 00 00> ],
      [ <Buffer 00 00>, <Buffer 01> ],
      [ <Buffer 00 00 00>, <Buffer 00> ],
      [ <Buffer 00 00 00>, <Buffer 01> ],
      [ <Buffer 01>, <Buffer 00> ],
      [ <Buffer ff>, <Buffer > ],
      [ <Buffer 00> ],
      [ <Buffer 00 00> ],
      [ <Buffer 01> ]
    ]
CMCDragonkai commented 2 years ago

There's another behaviour here which is that keypath of degree n is generally placed ahead of keypaths of degree n-1.

It seems like it would make more sense to be other way around:

    [
      [ <Buffer > ],
      [ <Buffer 00> ],
      [ <Buffer 00 00> ],
      [ <Buffer 01> ],
      [ <Buffer >, <Buffer > ],
      [ <Buffer 00>, <Buffer > ],
      [ <Buffer 00>, <Buffer 00> ],
      [ <Buffer 00 00>, <Buffer > ],
      [ <Buffer 00 00>, <Buffer 00 00> ],
      [ <Buffer 00 00>, <Buffer 01> ],
      [ <Buffer 00 00 00>, <Buffer 00> ],
      [ <Buffer 00 00 00>, <Buffer 01> ],
      [ <Buffer 01>, <Buffer 00> ],
      [ <Buffer ff>, <Buffer > ]
      [ <Buffer 00 00>, <Buffer >, <Buffer > ],
      [ <Buffer 00 00>, <Buffer ff>, <Buffer > ],
    ]

The reason for this is because deeper degrees have the 0x00 as the separator, which is always lower than whatever encoded data.

One way to flip this around is to make the separator the highest byte rather than the lowest byte.

That is the empty part can be 0x00 to mean lowest byte. While the separator could be 0xff the highest byte. While the alphabet can be inbetween. Let's see how this goes.

CMCDragonkai commented 2 years ago

I couldn't actually use 0xff because of iteration's need to use the options._lt = levelkKeyEnd.

The levelKeyEnd had to be 1 decimal point higher. So instead of 0xff, the separator can be 0xfe.

CMCDragonkai commented 2 years ago

Actually the 0xfe doesn't quite work either. It ends up making shorter level parts later in the order. Because 0xfe is higher than any symbol in the encoded part.

Ok well we can stick with the state at: https://github.com/MatrixAI/js-db/pull/31#issuecomment-1141929816.

CMCDragonkai commented 2 years ago

I found a different bug, if we use raw: true and put in an empty buffer as a value, it actually fails. I found this by fuzzing the entire get and put too.

Not possible to decrypt for some reason. Possibly because decrypted is null if it's an empty buffer.

The empty string is fine because it gets JSON encoded.

CMCDragonkai commented 2 years ago

The solution was simple, a bug in the decrypt utility.

  if (cipherTextBuf.byteLength <= 32) {
    return;
  }

Should have been:

  if (cipherTextBuf.byteLength < 32) {
    return;
  }

If it's exactly 32 bytes that means it was the empty buffer being encrypted.

This fix needs to be patched in EFS and in PK as well @tegefaulkes @emmacasolin!

MatrixAI-Bot commented 2 years ago

Pipeline Attempt on 552152714 for 5c1dbcbf6aef84ba7e63de1ca29e7ea61c3787bf

https://gitlab.com/MatrixAI/open-source/js-db/-/pipelines/552152714

MatrixAI-Bot commented 2 years ago

Pipeline Attempt on 552165297 for ea2e8ed9941cf91f2466adaac560948c02b49440

https://gitlab.com/MatrixAI/open-source/js-db/-/pipelines/552165297

CMCDragonkai commented 2 years ago

Apparently macos cannot import node:crypto, that's pretty strange. Reverting back to directly importing crypto.

MatrixAI-Bot commented 2 years ago

Pipeline Attempt on 552171993 for eeb9b1b5fb87737c689a1b9b0d0ca7ed4e145f4a

https://gitlab.com/MatrixAI/open-source/js-db/-/pipelines/552171993

CMCDragonkai commented 2 years ago

@emmacasolin @tegefaulkes this will create 4.0.4 automatically if successful.

MatrixAI-Bot commented 2 years ago

Pipeline Succeeded on 552171993 for eeb9b1b5fb87737c689a1b9b0d0ca7ed4e145f4a

https://gitlab.com/MatrixAI/open-source/js-db/-/pipelines/552171993