StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0

Support custom zstd compression levels #46839

Closed: derekperkins closed this issue 2 months ago

derekperkins commented 3 months ago

Feature request

Is your feature request related to a problem? Please describe.

I have large data that is very infrequently accessed with no latency requirements, so I want to compress it as much as possible. ZSTD is already an option, but there doesn't appear to be a way to set the compression level.

https://docs.starrocks.io/docs/table_design/data_compression/

Describe the solution you'd like

Similar to ClickHouse, AWS Athena, and other tools, support setting a compression level in parentheses after ZSTD, e.g. ZSTD(22). The supported levels are 1-22. It would also be useful to document which compression level is used by default, as that isn't clear. ClickHouse defaults to 1 and Athena defaults to 3. What does StarRocks use?

CREATE TABLE `data_compression` (
  `id`      INT(11)     NOT NULL     COMMENT "",
  `name`    CHAR(200)   NULL         COMMENT ""
)
ENGINE=OLAP 
UNIQUE KEY(`id`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`id`)
PROPERTIES (
"compression" = "ZSTD(1-22)"
);
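
To make the requested syntax concrete, here is a minimal sketch of how a property value such as ZSTD(22) could be parsed and range-checked. It is not StarRocks code: `parse_compression`, the regular expression, and the constants are hypothetical names for illustration, the 1-22 range comes from the request above, and the fallback level of 3 is assumed from the maintainer's reply in this thread.

```python
import re

# Hypothetical constants for this sketch (not taken from the StarRocks source).
ZSTD_MIN_LEVEL = 1
ZSTD_MAX_LEVEL = 22
ASSUMED_DEFAULT_LEVEL = 3  # assumption based on the maintainer's reply in this thread

# Codec name, optionally followed by an integer level in parentheses.
_PATTERN = re.compile(r"^\s*([A-Za-z0-9_]+)\s*(?:\(\s*(\d+)\s*\))?\s*$")


def parse_compression(value):
    """Return (codec, level) for strings like "ZSTD", "zstd(22)", or "LZ4"."""
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"malformed compression property: {value!r}")

    codec = match.group(1).upper()
    level_text = match.group(2)

    if codec != "ZSTD":
        # This sketch only accepts a level for ZSTD.
        if level_text is not None:
            raise ValueError(f"{codec} does not accept a level in this sketch")
        return codec, None

    if level_text is None:
        # No explicit level: fall back to the assumed default.
        return codec, ASSUMED_DEFAULT_LEVEL

    level = int(level_text)
    if not ZSTD_MIN_LEVEL <= level <= ZSTD_MAX_LEVEL:
        raise ValueError(
            f"ZSTD level must be between {ZSTD_MIN_LEVEL} and {ZSTD_MAX_LEVEL}, got {level}"
        )
    return codec, level
```

With this sketch, `parse_compression("ZSTD(22)")` yields the codec name and level 22, while the literal placeholder `ZSTD(1-22)` from the example table is rejected as malformed, since it denotes the allowed range and not an actual setting.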

srlch commented 3 months ago

https://github.com/StarRocks/starrocks/pull/46976

srlch commented 3 months ago

In the previous implementation, the default level was 3.