crichards17 / Distributed_Id_Allocator

A Rust rewrite of the ID Compressor implementation for the Microsoft Fluid Framework
0 stars 0 forks source link
compression distributed-systems rust uuid

Distributed ID Allocator

A library which generates small number representations of arbitrary non-colliding Version 4 UUIDs (stable IDs) across multiple sessions in a network. This scheme enables a distributed application to utilize the global uniqueness guarantees of UUIDs while maintaining the performance advantages of small integers.

Overview

The distributed ID allocation scheme allows clients to use, store, and reference small numbers that map to UUIDs. The primary benefits are improved memory efficiency and reduced storage size, as well as improved runtime performance when used as keys in collections or other data structures in place of UUIDs.

The scheme requires a total order for ID allocation, so the allocator is designed to work within a service that provides centralized total order broadcast.

Sessions

A session can be thought of as a unique identifier for a single allocator. Even if an allocator is taken offline, serialized to disk, and rehydrated back, it's considered the same session if it has the same session ID.

Document

The set of IDs generated by all Sessions contributing to the same distributed collection (e.g. an individual collaborative document). Uniqueness of allocated IDs is limited to the document context, and IDs are not portable between documents.

Generation API

New IDs are generated via synchronous API on the allocator. The API returns a small number representation of the UUID with the following guarantees:

ID Spaces

The allocation scheme separates the small-number IDs into two "spaces":

  1. Session space: IDs are unique within the session scope, but are not guaranteed to be document unique without the accompanying context of their session ID.
  2. Op space: IDs are in their most final form, as unique as possible within the entire system. IDs that have been ordered by the central service ("finalized") are represented in their final form, while any other IDs that have not yet been finalized are left in their session space form.

Each of these spaces is represented by a discrete type. This allows for the direct encoding of an ID's uniqueness guarantees without contextual ambiguity. An ID can be "normalized" from one space to the other with sychronous call.

Usage Guidelines

Allocated IDs can be delivered to application authors for use in place of V4 UUIDs, with the following considerations:

Efficiency Properties

UUID Generation

The allocator generates UUIDs in non-random ways to reduce entropy, optimizing storage of data. A session begins with a random UUID, and subsequent IDs are allocated sequentially. UUIDs generated with this strategy are less likely to collide than fully random UUIDs, and this fact can be leveraged to compact the on-disk and in-memory representations of the allocator.

UUID generation is O(1).

Clustering

The sequential allocation approach allows the system to implement a clustering scheme.

As allocators across the network create IDs, they reserve blocks of positive ID space (clusters) for that allocator. The positive integer space is sharded across different clients, and session space IDs are mapped into these clusters.

Clusters are efficient to store, requiring only two integers: the base positive integer in the cluster and the count of reserved IDs in that cluster.

Normalization

Normalization is O(logn) in the number of IDs created by the originating session, as it requires a simple binary search on the clusters owned by that session.

Architecture

This library consists of three components:

  1. A Rust crate (distributed-id-allocator) containing the core allocator logic and data structures, as well as the full API surface
  2. A Rust crate (wasm-id-allocator) which adapts the core allocator API into a form optimized for interop between JavaScript and WebAssembly (via WASM Bindgen)
  3. A TypeScript package (typescript-id-allocator) which adapts the WASM API into an idiomatic and type-safe TypeScript API, with key interop performance optimizations

This version of the allocator is a rewrite of a prior version in the Microsoft Fluid Framework, which was implemented fully in TypeScript. This effort aimed to move the core allocator from TypeScript to Rust for increased low-level control and leveraging of zero cost abstraction.

Performance

Performance comparison of this allocator and the preexisting TypeScript allocator shows appreciable improvement across most operation types (including all hotpath operations). This is achieved even with the overhead of the TypeScript <-> WASM interop boundary.

Rust codebase integration

The decision to separate the core allocator as a standalone crate, separate from the WASM interop logic, was made to enable idiomatic interface with the allocator API from both JavaScript and Rust.

As a secondary benefit this also enables the core crate to be published separately for use in Rust-only codebases without the need for the wasm and typescript packages as overhead.

Building the repo

Rust

To build the Rust workspace, run cargo build from the rust-wasm-id-allocator folder.

Coverage

We use Tarpaulin to measure code coverage. The following are the coverage results as of the time of this writing:

91.52% coverage, 788/861 lines covered

TypeScript

In order to build, you must have wasm-snip installed (cargo install wasm-snip).