Reloaded-Project / Reloaded-III

[WIP] Formal Specification for the Next Iteration of Reloaded
https://reloaded-project.github.io/Reloaded-III/
9 stars 1 forks source link

[Research] Investigate Delta Compression for Event Timestamps #28

Open Sewer56 opened 3 months ago

Sewer56 commented 3 months ago

In the R3 planning docs, we are currently using unix-like timestamps (1 integer = 1 second) for events. To further optimize storage and reduce the average bytes/event, we should investigate delta compression techniques for these timestamps.

TODO

  1. Implement delta compression for event timestamps:

    • Since events have incrementing timestamps, store the positive offset from the last timestamp instead of the full timestamp value.
    • Consider encoding the deltas using a variable-length prefix:
      • 2-byte deltas with a 1-bit prefix to determine if delta (allows for 18.2 hours between events).
      • Explore other prefix sizes and schemes, such as a 3-state Huffman-style prefix with 0, 10, and 11, to optimize for common case scenarios.
    • Evaluate the compression ratio achieved by different encoding schemes.
  2. Collect a short-lived logging experiment to analyze user action patterns:

    • Monitor the frequency of user actions over time and identify common n-grams (actions performed together).
    • Use the collected data to determine the optimal prefix size and encoding scheme for timestamp deltas based on typical user behavior.
    • For example, if most events occur within a 4-minute window, a 2-bit prefix might be more efficient than a 1-bit prefix.
  3. Explore existing libraries that handle delta compression for timestamps and evaluate their suitability for our use case.

The logging experiment will provide valuable insights into user behaviour, allowing me to fine-tune the compression scheme based on real-world usage patterns. This optimization will contribute to achieving our goal of an average of 10 bytes/event post-compression.