Light Mode | Dark Mode |
---|---|
Rando is a new serialization format optimized for fast random access of unstructured data.
JS | JSON | Rando | Comment |
---|---|---|---|
0 |
0 |
+ |
Integers |
-1 |
-1 |
1+ |
|
1 |
1 |
2+ |
|
-25 |
-25 |
N+ |
|
2000 |
2000 |
-w+ |
|
-125000 |
-125000 |
Z2f+ |
|
8654321 |
8654321 |
121Ly+ |
|
1/3 |
0.3333333333333333 |
2\|3/ |
Rational |
-13/7 |
-1.8571428571428572 |
p\|7/ |
|
1/0 |
N/A | 2\|/ |
Infinity |
-1/0 |
N/A | 1\|/ |
-Infinity |
0/0 |
N/A | \|/ |
NaN |
20.24 |
20.24 |
_g\|3. |
Decimal |
1e100 |
1e100 |
2\|38. |
|
-1e-200 |
-1e-200 |
1\|6f. |
|
Math.PI |
3.141592653589793 |
mkEokiJF2\|t. |
|
Math.sqrt(3) |
1.7320508075688772 |
1X4t8mn8q8\|v. |
|
true |
true |
! |
True |
false |
false |
~ |
False |
null |
null |
? |
Null |
'' |
"" |
$ |
Empty String |
'Banana' |
"Banana" |
Banana@ |
B64 String |
'Hi, World' |
"Hi, World" |
9$Hi, World |
String |
'🍌' |
"🍌" |
4$🍌 |
UTF-8 String |
[ 1, 2, 3] |
[ 1, 2, 3] |
6;2+4+6+ |
Lists |
[ 100, 100, 100 ] |
[ 100, 100, 100 ] |
6;1**38+ |
Lists with Pointers |
[ 1, 2, 3 ] |
[ 1, 2, 3 ] |
6\|3;2+4+6+ |
Counted Lists |
{ a: 1, b: 2, c: 3 } |
{"a":1,"b":2,"c":3} |
c:a@2+b@4+c@6+ |
Maps |
{ a: 1, b: 2, c: 3 } |
{"a":1,"b":2,"c":3} |
c\|3:a@b@c@2+4+6+ |
Counted Maps |
[ { name: 'Alice' }, { name: 'Bob' } ] |
[{"name":"Alice"},{"name":"Bob"}] |
l\|2;8:8*Alice@9:name@Bob@ |
Repeated Keys |
new Map([[1,2],[3,4]]) |
N/A | 8\|2:2+6+4+8+ |
Non-string Keys |
new Uint8Array([213,231,187]) |
N/A | 4=1ee7 |
Bytes |
Use Rando anywhere you might use JSON if the following are true:
First add rando as a dependency to your project:
npm i -S @creationix/rando
Then import it as any module. The npm package also includes a .d.ts
package for conveinence.
import { stringify, parse } from "@creationix/rando"
const sampleDoc = {
person: {
name: 'John Doe',
age: 30,
id: 12345,
'ai-generated': true,
},
list: [1, 2, 3, 4, 5],
nested: {
key: 'value',
nested: {
key: 'value',
},
},
}
const encoded = stringify(sampleDoc)
console.log(encoded)
// 1B|3:person@list@11*H|4:name@age@id@c$ai-generated8$John DoeY+61O+!a;2+4+6+8+a+n|2:b*nested@6*a:key@value@
const decoded = parse(encoded)
console.log(decoded)
// {
// person: [Getter],
// list: [Getter],
// nested: [Getter],
// }
The string output is generally smaller than the JSON equivalent, especially if you have a lot of repeated values. Some documents in the wild end up being 30% their original size!
The decoded value in JavaScript is lazy parsed so it should be very fast even for large documents as only the top-level keys in an object are eagerly decoded. All object values and array values are lazilly decoded on first access and then cached for future access.
Rando can encode anything that JSON can encode:
It can also encode:
Infinity
, -Infinity
and NaN
)These three primitives have a dedicated symbol each.
true: `!`
false: `~`
null: `?`
Numbers are encoded using three different types to optimize for common values.
integer: zigzag(number) `+`
rational: zigzag(numerator) `|` denominator `/`
decimal: zigzag(base) `|` zigzag(exponent) `.`
Integer uses zigzag encoding so small negative values are short, but we can encode arbitrary precision using any number of digits.
Rational works great for values like 0.33333333333333333
that encode as 1/3
. Also rational allows encoding Infinity
, -Infinity
, and NaN
as 1/0
, -1/0
and 0/0
.
Decimal powers of 10 as typically written in scientific notation. 10000000000
is 1e10
, not powers of 2 like double
or float
.
Rando can encode unicode strings using UTF-8 as well as arbitrary binary data using base64.
Since strings are so common in most data sets a few special types exist to shave the bytes.
b64-string: b64String `@`
utf8-string: len `$` utf-8-data
bytes: len `=` base64-data
chain: len `,` ( b64-string | string | bytes | pointer | reference | chain )*
Lists are zero or more arbitrary values that can be encoded in three levels of verboseness.
list: len `;` value*
counted-list: len `|` count `;` value*
indexed-list: len `|` count `|` width `;` array-index value*
The first has O(n)
costs to find an item and O(n)
costs to count the items.
By adding a count we reduce the cost of counting to O(1)
By also adding the array index we reduce the cost of indexing an item to O(1)
Maps are zero or more pairs of arbitrary values that can also be encoded in three levels of verboseness.
Note that keys are not limited to strings.
map: len `:` (key value)*
counted-map: len `|` count `:` key* value*
indexed-map: len `|` count `|` width `:` trie-index key* value*
The first has O(n)
costs to find a key, and if the value is huge, this may require fetching multiple blocks.
By adding a count, we're able to move the keys up-front where they are more likely to be contained in the same block.
Then by adding an index, we reduce lookup costs to O(log n)
time.
One method rando can use to speed up decoding and reduce serialized size is to de-duplicate seen or known values.
pointer: offset `*`
reference: index `&`
A pointer
points to a value seen
later in the document. Since Rando encodes back to front, these are values that have already been seen
while encoding.
To decode a pointer, add offset
bytes after the *
and decode again.
A reference
is a zero based index into an external array of known
values provided by the caller of both the encoder and decoder.
It's up to the user of the library to ensure that the same list is used for both sides.
Rando is designed on the principle that it might be used for cases where the document is so large that the client doesn't want to download the entire thing before starting to read it.
This is why the various optimizations take into account the blockSize
encoding option.
listCountThreshold
encoding option.mapCountThreshold
encoding option or the encoded keys alone are larger than one block.With this in mind, rando is designed to be used with a system that has a block-level caching proxy.
One common method for hosting these files is a normal HTTP server that supports range requests.