canvasxyz / canvas

Programmable runtime for peer-to-peer applications
https://canvas.xyz
78 stars 5 forks source link

GossipLog Id Generation Issue #307

Closed Chandelier-02 closed 3 months ago

Chandelier-02 commented 3 months ago

I have been using the Gossiplog for a while now. However, I noticed that the sync kept failing whenever I had above 16384 entries in the log. It correctly would sync and insert entries for clock values 0-255. But, when it should insert the entry with clock value 256, it would instead find and try to insert the entry with clock value 16384, causing an error to be thrown and the sync to fail.

After debugging for a long time, I discovered it may be an issue with the clock value encoding. I used the id generation code provided, and saw that this is the case.

Clock value 255 gets encoded to '817f' + encoded payload Clock value 256 gets encoded to '8200' + encoded payload Clock value 16383 gets encoded to 'ff17' + encoded payload Clock value 16384 gets encoded to '8180' + encoded payload

Because the keys are stored in sorted order, the entry with clock value 16384 got stored between entries with clock values 255 and 256. This causes the sync to fail every time. This is the problem causing what I was seeing in this issue, https://github.com/canvasxyz/canvas/issues/295, but figured it was worth its own issue now that I understand the problem and know it isn't a sync-specific issue.

Code to reproduce:

// IDs are made by concatenating a **reverse** unsigned varint clock with the hash and then
// truncating to 20 bytes to be base32-friendly, e.g "05vj050kb09l7okead3vvi6so7c7tunn"

import { sha256 } from '@noble/hashes/sha256';
import { base32hex } from 'multiformats/bases/base32';
import * as cbor from '@ipld/dag-cbor';

export const KEY_LENGTH = 20;
export const ID_LENGTH = 32;
export const MIN_MESSAGE_ID = '00000000000000000000000000000000';
export const MAX_MESSAGE_ID = 'vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv';
export const messageIdPattern = /^[0123456789abcdefghijklmnopqrstuv]{32}$/;

const N1 = 1 << 7;
const N2 = 1 << 14;
const N3 = 1 << 21;
const N4 = 1 << 28;
const N5 = 1 << 35;
const N6 = 1 << 42;
const N7 = 1 << 49;
const N8 = 1 << 56;
const N9 = 1 << 63;

export function encodingLength(value: number) {
  if (value < N1) return 1;
  if (value < N2) return 2;
  if (value < N3) return 3;
  if (value < N4) return 4;
  if (value < N5) return 5;
  if (value < N6) return 6;
  if (value < N7) return 7;
  if (value < N8) return 8;
  if (value < N9) return 9;
  return 10;
}

export function encodeClock(key: Uint8Array, clock: number): number {
  if (clock === 0) {
    key[0] = 0;
    return 1;
  }

  const sets: number[] = [];
  while (clock > 0) {
    sets.push(clock & 0x7f);
    clock >>= 7;
  }

  for (let i = 0; i < sets.length; i++) {
    let byte = sets[i];
    if (i > 0) {
      byte |= 0x80;
    }

    key[sets.length - 1 - i] = byte;
  }

  return sets.length;
}

export function decodeClock(key: Uint8Array): [clock: number, byteLength: number] {
  const sets: number[] = [];
  for (const byte of key) {
    sets.push(byte & 0x7f);
    if (byte & 0x80) {
      continue;
    } else {
      break;
    }
  }

  const num = sets.reduce((num, set, i) => {
    const pow = 7 * (sets.length - 1 - i);
    const val = set << pow;
    return num + val;
  }, 0);

  return [num, sets.length];
}

export function getKey(clock: number, value: Uint8Array): Uint8Array {
  const hash = sha256(value);
  const key = new Uint8Array(KEY_LENGTH);
  const encodingLength = encodeClock(key, clock);
  key.set(hash.subarray(0, KEY_LENGTH - encodingLength), encodingLength);
  return key;
}

export const encodeId = (id: string) => base32hex.baseDecode(id);
export const decodeId = (key: Uint8Array) => base32hex.baseEncode(key);

const object = { id: crypto.randomUUID() };
const encodedObject: Uint8Array = cbor.encode(object);

const hex = (uint8Array: Uint8Array) => Buffer.from(uint8Array).toString('hex');

console.log(hex(getKey(255, encodedObject)));
console.log(hex(getKey(256, encodedObject)));
console.log(hex(getKey(16383, encodedObject)));
console.log(hex(getKey(16384, encodedObject)));

Results from ^:
817fc3545f43584a09b7c15a89b82734962e9b7c
8200c3545f43584a09b7c15a89b82734962e9b7c
ff7fc3545f43584a09b7c15a89b82734962e9b7c
818000c3545f43584a09b7c15a89b82734962e9b
joeltg commented 3 months ago

Holy shit, thanks so much for digging into this. You're right, there was a fundamental flaw with our message ID format. We switched it out for one that actually does what we want, using a unary prefix format instead of the MSB-padded format.