databendlabs / databend

๐——๐—ฎ๐˜๐—ฎ, ๐—”๐—ป๐—ฎ๐—น๐˜†๐˜๐—ถ๐—ฐ๐˜€ & ๐—”๐—œ. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com

feat(query): Support use parquet format when spilling #16612

Closed: forsaken628 closed this 4 weeks ago

forsaken628 commented 1 month ago

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Support using the Parquet format when spilling; you can switch back to Arrow IPC via set spilling_file_format = 'arrow'.



forsaken628 commented 1 month ago

Benchmark:

dataset: tpch sf100

settings:

set max_memory_usage = 16*1024*1024*1024;
set window_partition_spilling_memory_ratio = 30;
set window_partition_spilling_to_disk_bytes_limit = 30*1024*1024*1024;

SQL:

EXPLAIN ANALYZE SELECT
    l_orderkey,
    l_partkey,
    l_quantity,
    l_extendedprice,
    l_shipinstruct,
    l_shipmode,
    ROW_NUMBER() OVER (PARTITION BY l_orderkey ORDER BY l_extendedprice DESC) AS row_num,
    RANK() OVER (PARTITION BY l_orderkey ORDER BY l_extendedprice DESC) AS rank_num
FROM
    lineitem ignore_result;
set spilling_use_parquet = 0; 

        โ”œโ”€โ”€ estimated rows: 600037902.00
        โ”œโ”€โ”€ cpu time: 651.285131424s
        โ”œโ”€โ”€ wait time: 168.630024827s
        โ”œโ”€โ”€ output rows: 600.04 million
        โ”œโ”€โ”€ output bytes: 44.87 GiB

        โ”œโ”€โ”€ numbers local spilled by write: 208
        โ”œโ”€โ”€ bytes local spilled by write: 15.06 GiB
        โ”œโ”€โ”€ local spilled time by write: 136.856s

        โ”œโ”€โ”€ numbers local spilled by read: 3072
        โ”œโ”€โ”€ bytes local spilled by read: 15.06 GiB
        โ”œโ”€โ”€ local spilled time by read: 31.933s
set spilling_use_parquet = 1; 

        โ”œโ”€โ”€ estimated rows: 600037902.00
        โ”œโ”€โ”€ cpu time: 848.406496078s
        โ”œโ”€โ”€ wait time: 73.858260885s
        โ”œโ”€โ”€ output rows: 600.04 million
        โ”œโ”€โ”€ output bytes: 44.87 GiB

        โ”œโ”€โ”€ numbers local spilled by write: 208
        โ”œโ”€โ”€ bytes local spilled by write: 9.56 GiB
        โ”œโ”€โ”€ local spilled time by write: 55.665s

        โ”œโ”€โ”€ numbers local spilled by read: 3072
        โ”œโ”€โ”€ bytes local spilled by read: 9.56 GiB
        โ”œโ”€โ”€ local spilled time by read: 17.512s

Compared with Arrow IPC, Parquet's smaller spill files mainly come from dictionary encoding, at the cost of noticeably higher CPU usage. For highly discrete (high-cardinality) data there is no significant size advantage.
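The trade-off described above can be sketched with a toy size model (this is illustrative Python, not Databend or Parquet code; the byte-accounting functions are assumptions for the example): dictionary encoding stores each distinct value once plus a small index per row, so it wins on low-cardinality columns like l_shipmode and loses when nearly every value is distinct.

```python
# Toy model of plain vs dictionary encoding sizes (illustrative only;
# real Parquet adds further compression, page headers, etc.).

def plain_size(values):
    # Plain encoding: each value stored as a 4-byte length prefix + UTF-8 bytes.
    return sum(4 + len(v.encode()) for v in values)

def dict_size(values):
    # Dictionary encoding: each distinct value stored once, plus a
    # 4-byte dictionary index per row.
    uniques = set(values)
    return sum(4 + len(v.encode()) for v in uniques) + 4 * len(values)

# Low-cardinality column (like l_shipmode): few distinct values, many rows.
low_card = ["TRUCK", "MAIL", "SHIP", "AIR"] * 250_000
# Highly discrete column: every value distinct, so the dictionary
# duplicates the data and only adds index overhead.
high_card = [f"order-{i}" for i in range(1_000_000)]

print(plain_size(low_card), dict_size(low_card))    # dictionary much smaller
print(plain_size(high_card), dict_size(high_card))  # dictionary larger
```

This matches the benchmark: the window-partition spill columns repeat heavily, so Parquet writes 9.56 GiB where Arrow IPC writes 15.06 GiB, while the extra encode/decode work shows up as higher CPU time.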

github-actions[bot] commented 1 month ago

Docker Image for PR

Note: this image tag is only available for internal use; please check the internal doc for more details.

sundy-li commented 4 weeks ago

LGTM, needs a rebase.