MatrixAI / js-id

ID generation for JavaScript & TypeScript Applications
https://polykey.com
Apache License 2.0
10 stars 1 forks source link

Changing to Id derivative of Uint8Array in order to provide operator overloading #6

Closed CMCDragonkai closed 2 years ago

CMCDragonkai commented 2 years ago

Description

@tegefaulkes

Changes the id type from ArrayBuffer to Id. The Id is an extension of Uint8Array, but with additional TS hack making TS think it's also like a string.

This enables things like:

import IdSortable from './src/IdSortable';
import IdDeterministic from './src/IdDeterministic';

const idSort = new IdSortable();

const id1 = idSort.get();
const id2 = idSort.get();

console.log(id1 < id2); // true
console.log(id1 > id2); // false 

// POJO key access type casts it as a string
const record = {};
record[id1] = 1;
record[id2] = 2;
console.log(record[id1]);

// explicit toString conversion, this is not the same as Uint8Array.toString
console.log(id1.toString()); // this is a binary encoded string
console.log(id1.toString().length); // 16

console.log(id1 == id2); // false (but because it is an object comparison)
console.log(id1.toString() == id2); // this will cast to string and still they are not the same

const idDet = new IdDeterministic('bar');

const idfoo1 = idDet.get('foo');
const idfoo2 = idDet.get('foo');

console.log(idfoo1 == idfoo2); // false (but because it is an object comparison)
console.log(idfoo1.toString() == idfoo2); // true
console.log(idfoo1 == idfoo2.toString()); // true

Because Id is a Uint8Array and all Uint8Array are ArrayBuffer, wherever ArrayBuffer is needed, the Id can be substituted. This makes Id a more "flexible" type.

Beware that TS thinks id is also a string, but it's not really. So limit automatic typecasting to the above situations.

Also these utility functions changed:

function toUUID(id: Id): string;
function fromUUID(uuid: string): Id | undefined;
function toMultibase(id: Id, format: MultibaseFormats): string;
function fromMultibase(idString: string): Id | undefined;

Right now the primitive string from is a "binary string" as described by https://developer.mozilla.org/en-US/docs/Web/API/DOMString/Binary. This is probably the least expensive encoding compared to uuid or multibase. It's very raw.

I will need to do something like:

toString(id: Id): string;
fromString(idString: string): Id | undefined;

In the utils so that way it will use those conversions too.

Issues Fixed

Tasks

  1. [x] - New IdInternal class and Id type
  2. [x] - Porting over IdSortable, IdDeterministic, IdRandom
  3. [x] - Porting over tests (note that utility functions are now using Id)

Final checklist

CMCDragonkai commented 2 years ago

Currently the implementation for binary string encoding is:

function toString(id: Id): string {
  const b = Buffer.from(id.buffer, id.byteOffset, id.byteLength);
  return b.toString('binary');
}

function fromString(idString: string): Id | undefined {
  const b = Buffer.from(idString, 'binary');
  return IdInternal.create(b.buffer, b.byteOffset, b.byteLength);
}

We need to try to get this working without the help of Node buffers.

Need to search how to convert Uint8Array to binary string and back.

CMCDragonkai commented 2 years ago

I might want to disambiguate binary from bit. So all my bin2X functions should be bit2X instead. So that bit strings are not the same as binary strings.

CMCDragonkai commented 2 years ago

Seems like we could do something like:

  return String.fromCharCode(...id);
CMCDragonkai commented 2 years ago

For toString this works with just:

function toString(id: Id): string {
  return String.fromCharCode(...id);
}

However for fromString:

function fromString(idString: string): Id | undefined {
  if (idString.length !== 16) return;
  const id = IdInternal.create(16);
  for (let i = 0; i < 16; i++) {
    id[i] = idString.charCodeAt(i);
  }
  return id;
}

The above should work. Except now we have another TS type error originating from the hack of using IdInternal & string.

The problem is that string primitive type has it's own index signature and this conflicts with the index signature of Uint8Array.

When they conflict, the resulting inference is that id[i] is type never, not number. You have to do id[i] as number.

I've tried a few different techniques like with (https://stackoverflow.com/questions/51465182/how-to-remove-index-signature-using-mapped-types):

type RemoveIndex<T> = {
  [ K in keyof T as string extends K ? never : number extends K ? never : K ] : T[K]
};

type RemoveIndex2<T> = {
  [K in keyof T as {} extends Record<K, 1> ? never : K]: T[K]
}

type X = IdInternal & RemoveIndex<string>;

But those don't work on the primitive string. They do work on the String interface, but not on string primitive type. If I use String instead, this prevents the usage on obj[id] as only primitives can be used as index types.

The only primitive type that does work is number. So type Id = IdInternal & number, however this is just wrong cause it's not actually a number that's being presented by the toPrimitive, it's a string. If we could have a 128 bit number, yea it could work, but it's not the case atm.

CMCDragonkai commented 2 years ago

Ok it might work if I use number. It's technically wrong, but it's works.

The number type interface is Number, and it only has a few methods that doesn't conflict:

interface Number {
    /**
     * Returns a string representation of an object.
     * @param radix Specifies a radix for converting numeric values to strings. This value is only used for numbers.
     */
    toString(radix?: number): string;

    /**
     * Returns a string representing a number in fixed-point notation.
     * @param fractionDigits Number of digits after the decimal point. Must be in the range 0 - 20, inclusive.
     */
    toFixed(fractionDigits?: number): string;

    /**
     * Returns a string containing a number represented in exponential notation.
     * @param fractionDigits Number of digits after the decimal point. Must be in the range 0 - 20, inclusive.
     */
    toExponential(fractionDigits?: number): string;

    /**
     * Returns a string containing a number represented either in exponential or fixed-point notation with a specified number of digits.
     * @param precision Number of significant digits. Must be in the range 1 - 21, inclusive.
     */
    toPrecision(precision?: number): string;

    /** Returns the primitive value of the specified object. */
    valueOf(): number;
}

So tricking TS into thinking the Id is a number works as well.

Note that Omit<number, 'toPrecision'> doesn't work. It loses the idea that the type is actually a primitive.

CMCDragonkai commented 2 years ago

I just realised that bigints are not 64bit, they are arbitrary sized. In that sense perhaps we can say that it's a & bigint instead and instead of returning a string, we do really return a 128bit number.

Unfortunately the bigint cannot be used as an index type atm. An explicit cast is expected.

CMCDragonkai commented 2 years ago

I wanted to compare our binary encoding to what's used in multibase with their identity encoding. Internally they are using TextEncoder and TextDecoder, however I don't think it's accurate due to: https://github.com/multiformats/js-multiformats/issues/122

CMCDragonkai commented 2 years ago

New tests were added for string equality, comparison and also record index usage.