HackerExperience / Helix

GNU Affero General Public License v3.0
53 stars 10 forks source link

Optimize HELL.PK for spatial and temporal data locality #377

Closed renatomassaro closed 6 years ago

renatomassaro commented 6 years ago

The problem

IDs are hard. Sequential IDs are prone to race condition. Completely random GUIDs are too sparse, losing both spatial and temporal data locality. Furthermore, GUIDs do not have human-readable information to figure out which object is represented by the ID.

Extended description of the problem.

The solution

After several months of background processing while showering / walking / sleeping, I've come up with the following format for our IPv6-based (128-bit) PK:

which maps to something like:

gggg:ggpp:pppp:pptt:tttt:tooo:oooo:oooh

This gives us:

In fact, since the timestamp hash increments every 3 seconds, a collision would have to happen within this 3 second time frame, which is quite unlikely. In ~25.5 years we'll reach fffffff. It's probably a good idea to start worrying about this once we are near ff00000 :).

Some objects have no parent, e.g. Entities. These will have all but the last digit (h) available for random generation. Some objects have no grandparents, e.g. Servers. These will have all 56 bits of inheritance information reserved for the parent.

Practical example

Suppose we have Entities. These Entities have Servers. These Servers have Processes.

Process is the object (o). Server is the parent (p) and Entity is the grandparent (g). The proposed PK format allows us to reach for:

This is a great locality + cache optimization for:

Caveats / cons

So one could say Helix will expire sometime after 2043. Hopefully HE2 will have been released by then :smile:

renatomassaro commented 6 years ago

Note to self: when implementing this, use pg_repack to constantly CLUSTER the index (in a "CONCURRENTLY" fashion) so we get around the MVCC limitations.

Also optimize pg's fillfactor accordingly. Somewhere in between 80 and 90% seems OK for starters.

renatomassaro commented 6 years ago

Implemented at #408 with minor changes to the specs proposed at the issue description. See lib/id/id.ex for a definitive, up-to-date documentation.