In the R3 planning docs, we are currently using unix-like timestamps (1 integer = 1 second) for events. To further optimize storage and reduce the average bytes/event, we should investigate delta compression techniques for these timestamps.
TODO
Implement delta compression for event timestamps:
Since events have incrementing timestamps, store the positive offset from the last timestamp instead of the full timestamp value.
Consider encoding the deltas using a variable-length prefix:
2-byte deltas with a 1-bit prefix to determine if delta (allows for 18.2 hours between events).
Explore other prefix sizes and schemes, such as a 3-state Huffman-style prefix with 0, 10, and 11, to optimize for common case scenarios.
Evaluate the compression ratio achieved by different encoding schemes.
Monitor the frequency of user actions over time and identify common n-grams (actions performed together).
Use the collected data to determine the optimal prefix size and encoding scheme for timestamp deltas based on typical user behavior.
For example, if most events occur within a 4-minute window, a 2-bit prefix might be more efficient than a 1-bit prefix.
Explore existing libraries that handle delta compression for timestamps and evaluate their suitability for our use case.
The logging experiment will provide valuable insights into user behaviour, allowing me to fine-tune the compression scheme based on real-world usage patterns. This optimization will contribute to achieving our goal of an average of 10 bytes/event post-compression.
In the R3 planning docs, we are currently using unix-like timestamps (1 integer = 1 second) for events. To further optimize storage and reduce the average bytes/event, we should investigate delta compression techniques for these timestamps.
TODO
Implement delta compression for event timestamps:
Collect a short-lived logging experiment to analyze user action patterns:
Explore existing libraries that handle delta compression for timestamps and evaluate their suitability for our use case.
The logging experiment will provide valuable insights into user behaviour, allowing me to fine-tune the compression scheme based on real-world usage patterns. This optimization will contribute to achieving our goal of an average of 10 bytes/event post-compression.