Closed: CMCDragonkai closed this 3 years ago
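```typescript
class Left {
  public [Symbol.toPrimitive](hint: 'string' | 'number' | 'default') {
    return 'a';
  }
}
class Right {
  public [Symbol.toPrimitive](hint: 'string' | 'number' | 'default') {
    return 'b';
  }
}
const left = new Left();
const right = new Right();
// Each operand is converted with hint 'number', but both return strings,
// so these become plain string comparisons of 'a' and 'b'.
console.log(left < right);  // true
console.log(left <= right); // true
console.log(left > right);  // false
console.log(left >= right); // false
```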
The above shows that `hint` will be `number` on these comparisons, but `toPrimitive` can return a string instead. The hint is just a hint; you don't have to abide by it. The result is that the operands are "cast" to `'a' < 'b'`, which in the case of string comparison is correct.
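To see which hint each operation actually passes, here is a small probe. `Probe` is a hypothetical helper, not part of the ID code; the sketch assumes standard ES2015+ semantics, where relational operators pass `'number'`, template literals and `String()` pass `'string'`, and binary `+` passes `'default'`.

```typescript
// Records every hint passed to Symbol.toPrimitive, in call order.
const hints: string[] = [];

class Probe {
  private readonly value: string;

  constructor(value: string) {
    this.value = value;
  }

  // Like Left/Right, this ignores the hint and always returns a string.
  public [Symbol.toPrimitive](hint: 'string' | 'number' | 'default'): string {
    hints.push(hint);
    return this.value;
  }
}

const a = new Probe('a');
const b = new Probe('b');

const less = a < b;         // relational operator: hint 'number' for each operand
const template = `${a}`;    // template literal: hint 'string'
const explicit = String(b); // String(): hint 'string'
const concat = a + '';      // binary +: hint 'default'

console.log(less, template, explicit, concat); // true a b a
console.log(hints.join(', ')); // number, number, string, string, default
```

Since the hint is only advisory, returning the same string for every hint is what makes the comparison operators behave like string comparisons.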
The MDN documentation for `Array.prototype.sort` says:

> If `compareFunction` is not supplied, all non-undefined array elements are sorted by converting them to strings and comparing strings in UTF-16 code units order. For example, "banana" comes before "cherry".
>
> Note: In UTF-16, Unicode characters above `\uFFFF` are encoded as two surrogate code units, of the range `\uD800`-`\uDFFF`. The value of each code unit is taken separately into account for the comparison. Thus the character formed by the surrogate pair `\uD855\uDE51` will be sorted before the character `\uFF3A`.
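A quick check of this behaviour with the surrogate pair `\uD855\uDE51` (U+25651) and `\uFF3A`. The default `sort()` orders by code units, while a comparator on `codePointAt` orders by code points; the variable names are just for illustration.

```typescript
// U+25651 lies outside the BMP, so it is stored as the surrogate pair D855 DE51.
const astral = '\uD855\uDE51';
// U+FF3A FULLWIDTH LATIN CAPITAL LETTER Z is a single code unit.
const fullwidthZ = '\uFF3A';

// Default sort compares UTF-16 code units: 0xD855 < 0xFF3A, so the astral character comes first...
const byCodeUnit = [fullwidthZ, astral].sort();

// ...even though its code point is larger than U+FF3A.
console.log(astral.codePointAt(0)!.toString(16)); // 25651
console.log(byCodeUnit[0] === astral); // true

// Sorting by code point instead puts U+FF3A first.
const byCodePoint = [astral, fullwidthZ].sort((x, y) => x.codePointAt(0)! - y.codePointAt(0)!);
console.log(byCodePoint[0] === fullwidthZ); // true
```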
So the comparison is on the value of each "code unit", and each UTF-16 code unit is 16 bits (2 bytes). So what happens if we convert our IDs into binary strings?
However, `Buffer.from(...).toString('binary')` uses the `'binary'` encoding, which is an alias for the `'latin1'` encoding. The Node docs say:

> `'latin1'`: Latin-1 stands for ISO-8859-1. This character encoding only supports the Unicode characters from U+0000 to U+00FF. Each character is encoded using a single byte. Characters that do not fit into that range are truncated and will be mapped to characters in that range.
This is basically an 8-bit extension of ASCII, more precisely https://en.wikipedia.org/wiki/ISO/IEC_8859-1.
In terms of encoding the buffer, the buffer already consists of single bytes. I'm not sure what it means to encode it into a `latin1` string and then compare that string during a sort, when the sort is said to use UTF-16 code units.
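A small sketch of what `'binary'`/`'latin1'` decoding does to raw bytes, assuming Node's `Buffer` (imported from `node:buffer`). The example bytes are arbitrary; the point is that every byte becomes exactly one UTF-16 code unit with the same value, so nothing is lost for byte values 0x00 to 0xFF.

```typescript
import { Buffer } from 'node:buffer';

// Example bytes, including values above 0x7F (outside ASCII).
const bytes = new Uint8Array([0x00, 0x41, 0x7f, 0x80, 0xe9, 0xff]);

// 'binary' is an alias of 'latin1': each byte becomes one character U+0000..U+00FF.
const binary = Buffer.from(bytes).toString('binary');

// One UTF-16 code unit per byte, equal to the byte value.
const codeUnits = Array.from(binary, (c) => c.charCodeAt(0));
console.log(binary.length); // 6
console.log(codeUnits.join(',')); // 0,65,127,128,233,255

// Encoding the string back with the same encoding gives the original bytes.
const roundTrip = Buffer.from(binary, 'binary');
console.log(roundTrip.equals(bytes)); // true
```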
Reading https://kevin.burke.dev/kevin/node-js-string-encoding/, it seems that JS strings are always encoded as UTF-16. However, the runtime appears to do a lot of automatic conversions: most inputs into a JS program are expected to be UTF-8, while internally I believe strings are UTF-16. When you do `Buffer.from(s, 'utf8')` or `Buffer.from(s, 'utf16le')`, both work because JS knows that the string is UTF-16 encoded and translates it to UTF-8 or UTF-16LE on the fly.
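A small Node example of this on-the-fly transcoding: the same JS string yields different byte lengths depending on the target encoding. The string `'h\u00e9llo'` is an arbitrary choice.

```typescript
import { Buffer } from 'node:buffer';

const s = 'h\u00e9llo'; // "héllo": 5 UTF-16 code units, where U+00E9 is one code unit

const utf8 = Buffer.from(s, 'utf8');     // U+00E9 needs 2 bytes in UTF-8
const utf16 = Buffer.from(s, 'utf16le'); // every code unit takes 2 bytes

console.log(s.length);     // 5
console.log(utf8.length);  // 6
console.log(utf16.length); // 10

// Both byte sequences decode back to the same JS string.
console.log(utf8.toString('utf8') === s && utf16.toString('utf16le') === s); // true
```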
How does this impact us? When we return a binary string form of an ID, whatever encoding we choose, we should check that the string length ends up being 16, meaning 16 bytes. I think this will work because the `latin1` (or `binary`) encoding is 8-bit extended ASCII, which covers the full byte range. I wonder though whether that means the string will be translated to UTF-16 and stored as UTF-16.

During a sort, if the string is compared by UTF-16 code units, then my idea that it would compare on the individual byte values isn't how it works. The concern is whether this could produce a code unit that is out of order relative to the bit numbering scheme of the ID.
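One way to probe this concern is to sort the same IDs both ways and compare the results. This sketch assumes Node's `Buffer` and uses hypothetical 16-byte IDs (`idWith` is an illustrative helper). Because latin1 decoding maps each byte to one code unit of equal value, comparing the strings code unit by code unit should give the same order as `Buffer.compare`.

```typescript
import { Buffer } from 'node:buffer';

// Latin1 ("binary") decoding: one code unit per byte, code unit value = byte value.
function toBinaryString(id: Uint8Array): string {
  return Buffer.from(id).toString('binary');
}

// Hypothetical 16-byte (128-bit) IDs that differ in their first or last byte.
function idWith(first: number, last: number, fill = 0x00): Uint8Array {
  const id = new Uint8Array(16).fill(fill);
  id[0] = first;
  id[15] = last;
  return id;
}

const ids = [idWith(0xff, 0x00), idWith(0x00, 0x01), idWith(0x80, 0x7f, 0x7f), idWith(0x00, 0x00)];

// Reference order: byte-wise comparison.
const byBytes = [...ids].sort((a, b) => Buffer.compare(a, b)).map(toBinaryString);
// Default Array#sort on the strings: UTF-16 code unit comparison.
const byStrings = ids.map(toBinaryString).sort();

console.log(byStrings.every((s, i) => s === byBytes[i])); // true
console.log(byStrings.every((s) => s.length === 16)); // true
```

This only covers a handful of IDs; the general argument is that equal-length strings whose code units equal the bytes compare exactly like the bytes.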
Regarding operator overloading, TS has some problems: it doesn't take `Symbol.toPrimitive` into account, so we get type errors when we try to use these objects as indexes:
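```typescript
class Left {
  public [Symbol.toPrimitive](hint: 'string' | 'number' | 'default') {
    return 'a';
  }
}
const left = new Left();
const obj: Record<string, number> = {};
// Without the directive below, tsc rejects this line because it does not accept
// a Left as an index type, even though at runtime the key converts to 'a'.
// @ts-ignore
obj[left] = 1;
```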
Funnily enough, the comparison operators work.
It seems the only way is with explicit typecasts like `left as unknown as string`.
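A sketch of the double-cast workaround with a plain object. `Left` mirrors the class above; the `Record<string, number>` annotation is an assumption for the example.

```typescript
class Left {
  public [Symbol.toPrimitive](_hint: 'string' | 'number' | 'default'): string {
    return 'a';
  }
}

const left = new Left();
const obj: Record<string, number> = {};

// The double cast only changes the static type; at runtime the property key is
// still computed via left[Symbol.toPrimitive]('string'), which returns 'a'.
obj[left as unknown as string] = 1;

console.log(Object.keys(obj).join(',')); // a
console.log(obj['a']); // 1
```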
One way to work around this is by making an intersection type:
```typescript
// IdInternal is the underlying class that implements Symbol.toPrimitive.
type Id = IdInternal & string;
```
Then the idea is that we force it with a smart constructor:
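```typescript
// Forwards whatever arguments IdInternal's constructor accepts, then asserts the Id type.
function makeId(...args: ConstructorParameters<typeof IdInternal>): Id {
  return new IdInternal(...args) as Id;
}
```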
This means that from the outside, the ID will appear to users as a string.
However, type inference will think it also has all the string methods, which it won't; it would only have them if it is converted to a string appropriately.
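To make this drawback concrete, here is a minimal sketch in which `IdInternal` is a hypothetical stand-in (a `Uint8Array` with a latin1 `Symbol.toPrimitive`) and `makeId` takes raw bytes. A string method type-checks but fails at runtime, whereas an explicit `String(id)` goes through `Symbol.toPrimitive`.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical minimal IdInternal: a Uint8Array whose primitive form is its latin1 string.
class IdInternal extends Uint8Array {
  public [Symbol.toPrimitive](_hint: 'string' | 'number' | 'default'): string {
    return Buffer.from(this).toString('binary');
  }
}

type Id = IdInternal & string;

// Smart constructor: the cast is the only place where the "string" type is asserted.
function makeId(bytes: number[]): Id {
  return new IdInternal(bytes) as Id;
}

const id = makeId([0x61, 0x62]);

// An explicit conversion goes through Symbol.toPrimitive and yields a real string.
const asString: string = String(id);

// This type-checks because Id claims to be a string, but it fails at runtime:
// a Uint8Array has no toUpperCase method.
let threw = false;
try {
  id.toUpperCase();
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(asString, threw); // ab true
```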
Sticking with `Uint8Array` for now, since we only need to figure out how to encode a `Uint8Array` to a binary string: https://developer.mozilla.org/en-US/docs/Web/API/DOMString/Binary.
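If we don't want to depend on Node's `Buffer`, a binary string can be built with `String.fromCharCode` and decoded with `charCodeAt`. This is a minimal sketch; the function names are hypothetical, and the decoder rejects code units above 0xFF rather than silently truncating them.

```typescript
// Maps each byte to the character with that code unit (U+0000..U+00FF).
function toBinaryString(bytes: Uint8Array): string {
  let s = '';
  for (let i = 0; i < bytes.length; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

// Inverse mapping; throws if the string is not a valid binary string.
function fromBinaryString(s: string): Uint8Array {
  const bytes = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code > 0xff) {
      throw new RangeError(`not a binary string: code unit ${code} at index ${i}`);
    }
    bytes[i] = code;
  }
  return bytes;
}

const id = new Uint8Array([0x00, 0x7f, 0x80, 0xff]);
const str = toBinaryString(id);
console.log(str.length); // 4
console.log(str === '\x00\x7f\x80\xff'); // true
console.log(fromBinaryString(str).every((b, i) => b === id[i])); // true
```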
Specification
Sometimes the ID is used as a string when used in POJO objects or ES6 maps. In these cases, an `ArrayBuffer` is not easily turned into a string. What we can do is make use of the ideas from https://javascript.info/object-toprimitive, which will give us the ability to convert it to primitives.
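A sketch of how the two cases differ, using a hypothetical `IdLike` class. Object keys go through `Symbol.toPrimitive`, but `Map` compares object keys by identity (SameValueZero), so an explicit `String(...)` conversion is needed there to get value semantics.

```typescript
// Hypothetical ID-like class whose primitive form is a string.
class IdLike {
  private readonly key: string;

  constructor(key: string) {
    this.key = key;
  }

  public [Symbol.toPrimitive](_hint: 'string' | 'number' | 'default'): string {
    return this.key;
  }
}

const id1 = new IdLike('x');
const id2 = new IdLike('x'); // a different object with the same value

// POJO keys are converted with Symbol.toPrimitive (hint 'string').
const pojo: Record<string, number> = {};
pojo[id1 as unknown as string] = 1;
console.log(pojo[id2 as unknown as string]); // 1

// Map keys are compared by identity, so no conversion happens.
const byObject = new Map<IdLike, number>([[id1, 1]]);
console.log(byObject.get(id2)); // undefined

// Converting explicitly gives value semantics in a Map as well.
const byString = new Map<string, number>([[String(id1), 1]]);
console.log(byString.get(String(id2))); // 1
```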
There are 2 main primitives: strings and numbers. I don't believe there is a proper numeric representation of IDs, because IDs are 128 bits and won't fit into a JS number (a 64-bit float). `BigInt` is arbitrary precision, so it could hold all 128 bits (only `BigInt64Array` and `BigUint64Array` are limited to 64 bits), but a bigint is not a `number`. A numeric form would only be possible by truncating the 128 bits into a number, or by using `new Float64Array(2)` and putting all 128 bits into that. But again it wouldn't really mean much, except perhaps by interpreting the first 64 bits as a floating point number (whose sign bit may make the number negative). So for now, we can instead represent numbers as `NaN`. This is also the case with `ArrayBuffer`, where `+ab` is `NaN`.

More useful is the string representation. The hints that lead to a string primitive are the `string` hint and the `default` hint, and there is also the `toString()` call. If the binary string version of the 128-bit identifier sorts in the same way that `Buffer.compare` does, then this could be done. It would be ideal if we could do `id1 > id2` too, but this uses the `number` hint.

So basically we can try:
- `Buffer.compare`
- `id1 > id2` could be achieved
- `class Id extends ArrayBuffer` could be used, but some of these operations will require direct access, which means `Uint8Array` would be preferred
- If the `buffer` package is a `devDependency` when bundling, then it's also possible to use `import { Buffer } from 'buffer';`, and this can simplify our comparisons and integration into the rest of PK

Additional context
`Buffer` is a `Uint8Array` subclass, and a `Uint8Array` is a view over an `ArrayBuffer`, so `Buffer` is the most flexible. However, there is the issue of detached array buffers. Node buffers aren't able to be detached: https://github.com/MatrixAI/js-polykey/issues/2202. If we use `Buffer`, that impacts the goal of making this ES compliant, but those are separate concerns...

Tasks
- `Symbol.toPrimitive`, `toString` and `valueOf`
- `ArrayBuffer`, or `Uint8Array`, or `Buffer` if it makes it easier...
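A minimal sketch of these tasks, assuming the ID extends `Uint8Array` (one of the listed options). `Id` and `idWith` are hypothetical names, not an existing API. Following the trick from the `Left`/`Right` example, `Symbol.toPrimitive` returns the binary string for every hint, so `id1 > id2` compares strings byte by byte; the trade-off is that `+id` becomes `Number(binaryString)` instead of a guaranteed `NaN`.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical Id class: a Uint8Array (16 bytes for a 128-bit ID)
// whose primitive form is its binary (latin1-style) string.
class Id extends Uint8Array {
  // Same string for every hint, including 'number', so relational operators
  // compare binary strings code unit by code unit, i.e. byte by byte.
  public [Symbol.toPrimitive](_hint: 'string' | 'number' | 'default'): string {
    return this.toString();
  }

  // One character per byte, with code unit equal to the byte value (U+0000..U+00FF).
  public toString(): string {
    let s = '';
    for (let i = 0; i < this.length; i++) {
      s += String.fromCharCode(this[i]);
    }
    return s;
  }

  // valueOf is deliberately not overridden: ToPrimitive consults Symbol.toPrimitive first,
  // and Node's Buffer.from() uses an object's valueOf() result when it differs from the object.
}

// Hypothetical helper: a zero-filled 128-bit ID with the given first and last bytes.
function idWith(first: number, last: number): Id {
  const id = new Id(16);
  id[0] = first;
  id[15] = last;
  return id;
}

const x = idWith(0x00, 0xff);
const y = idWith(0x80, 0x00);

console.log(x < y); // true, since the first bytes compare 0x00 < 0x80
console.log(Buffer.compare(x, y)); // -1, the same order
console.log(`${x}`.length); // 16
console.log(String(x) === Buffer.from(x).toString('binary')); // true
```

Keeping `toString` free of `Buffer` leaves the class usable outside Node; `Buffer` is only used here to check that the ordering and encoding agree.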