facebook / zstd

Zstandard - Fast real-time compression algorithm
http://www.zstd.net
Other
23.82k stars 2.11k forks source link

[Question] The Future of the Block API #4138

Open Sewer56 opened 2 months ago

Sewer56 commented 2 months ago

The current ZStandard Manual states the following:

Block level API (deprecated) If the normal API doesn't provide the functionality you need, please open a GitHub issue.

So here I am.

Context

In low latency real time applications such as games, it is necessary to send messages at a high frequency, to a high amount of clients.

Suppose the following scenario:

In this sort of setup, you have the 1 host send packets to 63 clients at an approximate rate of 50/60/100Hz.

A single 'message' sent every tick may look something like this:

A message like this will need to be sent for each 63 players, resulting in 6300 messages per second at a 100Hz tickrate.

Problem Statement

[!IMPORTANT]
Working at this scale while targeting regular user machines, every bit of effort must be taken to ensure efficiency.

The payload size in particular is especially sensitive. The MTU of a packet is around the 1440-1460 range in practice for most ipv4 endpoints I interact with.

What this means that is the complete packet (IP + UDP + Lib + Payload) exceeds ~1440, it must be split into 2 packets.

When that happens, the host will have to incur:

This overhead can add up quickly. In this scenario, just this 32 byte overhead alone increases the bandwidth requirement by 1.5Mbit/s.

This alone is the difference between (for example):

[!NOTE]

Big companies can afford to run dedicated servers and swallow inefficiencies however...

Open source library/code/mod authors like me, just doing free stuff for the community don't have these sorts of resources. All infrastructure is hosted out of pocket at a loss.

Goals

[!TIP] The goal is to reduce the need to fragment at all costs.

To reduce the need to fragment, the packet size must be minimized as much as possible. That includes manually taking control of block compression/decompression.

The difference between a 1440 and 1445 byte payload (min frame header + block header) can mean the difference between sending 1 and 2 packets. It is crucial I save every byte possible.

The following ideas come to mind (focusing on zstd integration here).

Using Custom Block Ranges and Dictionaries

[!NOTE]
This requires some context, so apologies in advance.

Payload size can be reduced by taking manual control of block creation process. By this I mean controlling which slice of data goes in which block.

Different sections of data can have different distributions of bytes; which in turn affects the efficiency of huffman coding (and friends).


For this example, suppose we have an array of the following structure:

f32 x;
f32 y;
f32 z;

It is possible to maximize compression ratio by first reordering the data into structure of arrays:

x0, x1, y0, y1, z0, z1

And then reordering the bytes to group the each corresponding byte together:

(x0[0], x1[0]), (x0[1], x1[1]), (x0[2], x1[2]), (x0[3], x1[3]), (y0[0], y1[0]), (y0[1], y1[1]) ...

This increases the likelihood of repeated data, as many floats will have the same matching first and/or second byte. And lastly, doing a bit of delta encoding (subtract current byte from previous byte):

i.e.

(44 44) (A5 A5) (F9 0D) (18 D4)

After delta encoding:

(00 00) (00 00) ..

This increases number of repeated 0 byte sequences and distribution of 0 bytes for huffman coding.


I would like to be able to isolate the float data inside a single block, because the byte distribution between this, and for example manually bit packed data will differ.

Likewise, I would also want to apply a custom dictionary for each manually created block. Since that will have context tuned for the very specific data I am working with.

The dictionary would also have greater relevance, where as bit packed data in previous sliding window might not.

Custom Block Header

Because the MTU is ~1450 and max message length is not anywhere near the max block length, it's possible to shave bytes from the block headers:

typedef struct {
  int Last_Block : 1;
  int Block_Type : 2;
  int Block_Size : 13;
} Block_Header_New;

Here I trimmed the block header by 1 byte, which gives a max block size of 8192 bytes. Greater than MTU and still easily larger than any packet will ever be.

Splitting Block Headers

[!TIP] Even better, it's possible to rearrange the data such that additional bytes can be saved.

Suppose I have a fixed (2) number of blocks. I would to split the block header into 2 locations.

I can store 2 bits in the Network Library Header to indicate if the block is compressed or not. At the start of the payload I can then store 2 of the following:

typedef struct {
  int Block_Size : 12;
} Block_Header_Len;

This saves yet another byte.

[!NOTE]
For efficient packet handling, the compressed payload must be byte aligned.

The concern here is moreso that a lack of a raw/block compression API forces us to use the standard zstd block format.

All in All

I'm not sure how, without a block compression API it would be possible to get the maximum out of ZStandard; especially when the difference in a use case like this between a few bytes here (MTU) can be the barrier between a bad and a very good experience.

[!NOTE]
These are estimate savings.

Given the earlier example with the 2016 byte payload, the ability to compress blocks provides the following potential and likely savings:

Somewhere around 150 bytes per message. Or 7.5Mbps of upload speed, if translated onto earlier example.

Describe the solution you'd like

N/A

zstd manual requested to open an issue if the required functionality is not present.

In this case the 'requirements' are:

The existing APIs make these optimisations possible. On the other hand, I'm not sure if the alternative suggestion of using the 'normal' API covers this.

Main concern lies in fragmenting vs not fragmenting. Doubling possibility of transmission failure is a very unwanted effect.

GregSlazinski commented 2 months ago

+1 https://github.com/facebook/zstd/issues/4141