MatrixAI / js-db

Key-Value DB for TypeScript and JavaScript Applications
https://polykey.com
Apache License 2.0
5 stars 0 forks source link

Use `bufferWrap` to support `ArrayBuffer`, `Uint8Array` and `Buffer` #3

Open CMCDragonkai opened 2 years ago

CMCDragonkai commented 2 years ago

Specification

The ArrayBuffer type is more generalised in the JS ecosystem. The DB right now focuses on taking Node Buffers.

Node buffers can be converted to ArrayBuffer easily, and ArrayBuffer can be wrapped in Node buffers.

According to this comment: https://github.com/Level/level-js/issues/34#issuecomment-397811192 it is possible to use ArrayBuffers directly in leveldb, it just not documented.

It says that the asBuffer has to be turned false.

I'm not sure how this would work with keyEncoding and valueEncoding that we have set to 'binary'.

Our raw is also specifying that we would get Buffer, would that mean we instead say that we return ArrayBuffer instead, and with keys it would be string | ArrayBuffer?

Furthermore this would impact js-encryptedfs, so it's worthwhile to explore the implications of this.

The primary reason to do this would be cross-platform compatibility for mobile platforms that may not have the Node buffer.

The alternative would be use https://github.com/feross/buffer everywhere as a dependency so that way everything just uses the buffer library. This will mean that any usage of import { Buffer } from 'buffer'; will be resolved by feross/buffer first though, so one should beware of that.

Note that all Node Buffer is Uint8Array which is ArrayBuffer.

If something supports ArrayBuffer, they would support Uint8Array and Buffer at the same time.

One benefit would be integration of js-id https://github.com/MatrixAI/js-id/issues/1 can be simplified since Id as Uint8Array can be stored directly in the DB without first wrapping them as Buffer.

Additional context

Tasks

  1. [x] - Investigate ArrayBuffer compatibility in leveldb
  2. [x] - Investigate to what extent can we enable compatibility with ArrayBuffer in DB without losing Buffer support
  3. [ ] - Replace all uses of string | Buffer with string | ArrayBuffer for keys, but retain Buffer on returned output.
  4. [ ] - Change this.crypto.key to be assumed to be ArrayBuffer
  5. [ ] - Change toArrayBuffer to an explicit slice copy of all possible instances of ArrayBuffer type
  6. [ ] - The fromArrayBuffer is still useful since we return Buffer.
  7. [ ] - Change serialize and deserialize to use ArrayBuffer and TextEncoder and TextDecoder which can do the utf-8 encoding/decoding.
  8. [ ] - Check if PK is using any of the utility functions here, some uses of deserialize and serialize might need to change.
  9. [ ] - Ensure that most signatures are still compatible
  10. [ ] - Update all tests to include Uint8Array and ArrayBuffer usage, and ensure that slice-copying is working.
  11. [ ] - Check that there's no performance regression on the benchmarks
CMCDragonkai commented 2 years ago

Note that node module resolution would load locally first before loading from anywhere else.

I'm not sure how this applies to non-node runtimes like in NativeScript. Recommend not doing anything here until we have understood the platform properly.

The usage of ArrayBuffer would be a standardisation across other JS libraries and anything claiming to support ecmascript remember.

CMCDragonkai commented 2 years ago

Note that all Node Buffer is Uint8Array which is ArrayBuffer.

If something supports ArrayBuffer, they would support Uint8Array and Buffer at the same time.

We are also using alot of Uint8Array where Buffer was normally used. But ArrayBuffer is always useful from a "storage" POV. And returning Uint8Array means you always returning something that can be ArrayBuffer too either by using it directly or with bytes.buffer.

Therefore we can be satisfied with supporting Uint8Array as well. But maximum compatibility is going to be the ArrayBuffer.

CMCDragonkai commented 2 years ago

The type we use in js-db is LevelDB<string | Buffer, Buffer>. This means the values returned satisfy the interface of Buffer, Uint8Array and ArrayBuffer all together.

As for string | Buffer in src/DB.ts, if it works just by changing Buffer to ArrayBuffer, then this would solve this issue, and one could pass Buffer, ArrayBuffer or Uint8Array as the key.

CMCDragonkai commented 2 years ago

The only issue would be the domainPath that uses prefixer. Does that work with just ArrayBuffer, or would that need Uint8Array?

CMCDragonkai commented 2 years ago

No it doesn't because it uses Buffer.isBuffer to check. It throws this error Error: key should be a string or a buffer if it is not Buffer.

However one could wrap it in our domainPath function. We should be able to zero-copy it into a Node Buffer.

CMCDragonkai commented 2 years ago

The domainPath can be resolved by this change:

function domainPath(levels: DBDomain, key: string | ArrayBuffer): string | Buffer {
  // prefixer only takes string or Node Buffer
  // convert key into key_ which wraps ArrayBuffer as Buffer
  let key_: string | Buffer;
  if (typeof key === 'string') {
    key_ = key;
  } else if (ArrayBuffer.isView(key)) {
    key_ = Buffer.from(key.buffer, key.byteOffset, key.byteLength);
  } else {
    key_ = Buffer.from(key);
  }
  if (!levels.length) {
    return key_;
  }
  let prefix = key_;
  for (let i = levels.length - 1; i >= 0; i--) {
    prefix = prefixer(levels[i], prefix);
  }
  return prefix;
}

However this doesn't fix everything. We also use Buffer for this.crypto.key and there are complications with when using worker manager.

When using worker manager, we want to copy the ArrayBuffer, but if the actual instance is Node Buffer, the slice call creates a view, not a copy. So we have to distinguish by checking with Buffer.is and ArrayBuffer.isView and doing a set or copy or slice call. Ensuring that we are in fact copying the ArrayBuffer.

But yes this issue looks like it is totally possible.

CMCDragonkai commented 2 years ago

Note that this would only present an ArrayBuffer interface to the outside world. Internally Node Buffer would still be used by the domainPath. And of course LevelDB still returns Buffer. So primarily this change would be about increasing the compatibility of DB as we progressively move towards more usage of Uint8Array and ArrayBuffer such as js-id and webcrypto.

CMCDragonkai commented 2 years ago

The TextDecoder works on ArrayBuffer and Uint8Array.

[nix-shell:~/Projects/js-db]$ node
Welcome to Node.js v14.17.3.
Type ".help" for more information.
> u = new Uint8Array([0x61, 0x62, 0x63])
Uint8Array(3) [ 97, 98, 99 ]
> u.buffer
ArrayBuffer { [Uint8Contents]: <61 62 63>, byteLength: 3 }
> new TextDecoder().decode(u.buffer)
'abc'
> new TextDecoder().decode(u)
'abc'
CMCDragonkai commented 2 years ago

Need to ensure that this doesn't result in a performance regression with respect to benchmarks.

CMCDragonkai commented 2 years ago

https://github.com/Level/subleveldown/issues/111 - seems like we should be able to use Buffer, further checking required.

CMCDragonkai commented 2 years ago

This is somewhat more complicated now that we have alot of functions relying on the Buffer interface.

Applications that use this library right now has to use the feross buffer package if their runtime environment doesn't have the Buffer class.

Having an ArrayBuffer compatible system should make this alot more compatible, however unless this package is made generic over the underlying DB, that is browser indexeddb or node leveldb, then doing this doesn't really change the portability alot. However it does enable the usage of our system to be bit easier, so that when downstream libraries like PK uses ArrayBuffer, they don't need to convert it to a Buffer just so that it can be work inside the DB.

Although the new abstract-level has support for Uint8Array, so that may be a better way to support it https://github.com/Level/browser-level/blob/98c13d4356fdb6ca5668ddd35859e6c9b326b5cb/UPGRADING.md#uint8array-support rather than directly supporting ArrayBuffer.

CMCDragonkai commented 2 years ago

We are no longer attached to abstract-level anymore. We are directly using rocksdb. As it forces the NAPI API, we might as well stick with buffer. Any need to use ArrayBuffer should just use the feross buffer.

The js-db will likely have native mobile bindings later, in which case even there I'd say we should continue using feross buffer. The ArrayBufferis just quite low level.

One way to quickly solve this issue is to just take ArrayBuffer type, and if it is an array buffer, wrap it up as a Buffer automatically. That way it avoids users having to wrap it themselves. And opens us up to directly using ArrayBuffer in the future.

CMCDragonkai commented 1 year ago

If we receive an ArrayBuffer. All we need to do is Buffer.from(arrayBuffer). That's it. That will be zero-copy and just wrap the buffer.

If we use bufferWrap as in Polykey, then we can also support Uint8Array.

So let's just do that. Here's the bufferWrap function.

/**
 * Zero-copy wraps ArrayBuffer-like objects into Buffer
 * This supports ArrayBuffer, TypedArrays and the NodeJS Buffer
 */
function bufferWrap(
  array: BufferSource,
  offset?: number,
  length?: number,
): Buffer {
  if (Buffer.isBuffer(array)) {
    return array;
  } else if (ArrayBuffer.isView(array)) {
    return Buffer.from(
      array.buffer,
      offset ?? array.byteOffset,
      length ?? array.byteLength,
    );
  } else {
    return Buffer.from(array, offset, length);
  }
}
CMCDragonkai commented 1 year ago

This means when we are passing Ids into the DB, we no longer have to do const idBuffer = id.toBuffer();.

CMCDragonkai commented 11 months ago

@amydevs Since in js-db we are planning to continue using Buffer, you can use it in js-ws as well, and just wrap your ArrayBuffer as a regular buffer.

Do this: https://github.com/MatrixAI/js-db/issues/3#issuecomment-1280289006 when you check instanceof ArrayBuffer.

CMCDragonkai commented 11 months ago

Use the bufferWrap:

function bufferWrap(
  array: BufferSource,
  offset?: number,
  length?: number,
): Buffer {
  if (Buffer.isBuffer(array)) {
    return array;
  } else if (ArrayBuffer.isView(array)) {
    return Buffer.from(
      array.buffer,
      offset ?? array.byteOffset,
      length ?? array.byteLength,
    );
  } else {
    return Buffer.from(array, offset, length);
  }
}

Now the issue is that Buffer is not a regular type in browsers. But we deal with this by introducing feross/buffer as the polyfill.

However this should only be done in the application context. So either the application remembers to bundle up with Buffer when using such libraries. OR... the package.json could specify feross/buffer as some sort of polyfill-dependency? Not sure, could use peerDependencies.

CMCDragonkai commented 11 months ago

Actually for js-ws it makes more sense to always convert to Uint8Array and not deal with node buffers.

So that means if it is an ArrayBuffer, you can do new Uint8Array(ab). If it is node buffer... it is already a Uint8Array.