bluesky-social / jetstream

A simplified JSON event stream for AT Proto
MIT License
217 stars 17 forks source link

Partitioned event subscriptions #26

Open ngerakines opened 2 days ago

ngerakines commented 2 days ago

As a consumer of jetstream, I'd like to request columns of events using a deterministic partition function.

Rationale

The number of events consume by jetstream is large and growing. Allowing consumers to specify partitions and partition values through subscriber options allows consumers to have more control on the rate and flow of information at any given time.

It also has the added benefit of providing sampling functionality that does not require consuming the entire stream. Setting partitions=100 and partition=0 effectively returns 1% of events.

Compatibility

Because this is an opt-in feature, this addition would be backwards compatible to existing clients and consumers without disruption.

Configuration

This is an optional feature that can be added the following ways:

The partitions and partition query string parameter set to a number with some validation.

The partitions and partition attributes on the SubscriberOptionsUpdatePayload payload.

{
    "partitions": 4,
    "partition": 0
}

Runtime

At runtime, when the collective partitions information is given, event emission will be based on the following conditional:

if sub.partitions {
    hasher := fnv.New64a()
    hasher.Write(evtBytes)
    hash := hasher.Sum64()

    if (hash % sub.partitions) != sub.partition {
        return nil;
    }
}

I propose the fnv64a hash because it is fast and "good enough", although city and murmur hash algorithms are great alternatives. This would leverage the standard library https://pkg.go.dev/hash/fnv#New64a.

Other

I wrote https://go.dev/play/p/QeMWmC9VQdH as a small demo of fnv hashing.

DavidBuchanan314 commented 2 days ago

All operations have an associated CID, which is a hash of the relevant record. You could perhaps just slice off the last 64 bits of the CID, for example, and use it in place of FNV.