Closed: forsaken628 closed this pull request 1 month ago.
Is there some benchmark numbers for this PR?
@BohuTANG This PR just replaces the storage; where can I find a suitable environment to run the benchmark? If the test environment is not representative, the benchmark is meaningless.
I think @Dousir9 has some cases for benchmarking, because #16448 also needs it.
@forsaken628 Could you provide a benchmark to compare the query execution time with and without local spill? You can set window_partition_spilling_memory_ratio to 0, 30, and 60 for testing.
Benchmark:
Settings:
set max_memory_usage = 16*1024*1024*1024;
set window_partition_spilling_memory_ratio = 30;
SQL:
EXPLAIN ANALYZE SELECT
l_orderkey,
l_partkey,
l_quantity,
l_extendedprice,
ROW_NUMBER() OVER (PARTITION BY l_orderkey ORDER BY l_extendedprice DESC) AS row_num,
RANK() OVER (PARTITION BY l_orderkey ORDER BY l_extendedprice DESC) AS rank_num
FROM
lineitem ignore_result;
Result:
set window_partition_spilling_to_disk_bytes_limit = 0;
├── hash keys: [l_orderkey]
├── estimated rows: 600037902.00
├── cpu time: 474.358152575s
├── wait time: 197.201014451s
├── output rows: 600.04 million
├── output bytes: 26.82 GiB
├── numbers remote spilled by write: 112
├── bytes remote spilled by write: 26.69 GiB
├── remote spilled time by write: 181.29s
├── numbers remote spilled by read: 1535
├── bytes remote spilled by read: 26.69 GiB
└── remote spilled time by read: 102.371s
46 rows explain in 49.291 sec. Processed 0 rows, 0 B (0 row/s, 0 B/s)
set window_partition_spilling_to_disk_bytes_limit = 10*1024*1024*1024;
├── hash keys: [l_orderkey]
├── estimated rows: 600037902.00
├── cpu time: 972.806314767s
├── wait time: 277.454963277s
├── output rows: 600.04 million
├── output bytes: 26.82 GiB
├── numbers remote spilled by write: 84
├── bytes remote spilled by write: 16.72 GiB
├── remote spilled time by write: 681.639s
├── numbers remote spilled by read: 1328
├── bytes remote spilled by read: 16.72 GiB
├── remote spilled time by read: 53.13s
├── numbers local spilled by write: 282
├── bytes local spilled by write: 10.00 GiB
├── local spilled time by write: 78.319s
├── numbers local spilled by read: 4160
├── bytes local spilled by read: 10.00 GiB
└── local spilled time by read: 32.624s
52 rows explain in 102.438 sec. Processed 0 rows, 0 B (0 row/s, 0 B/s)
├── hash keys: [l_orderkey]
├── estimated rows: 600037902.00
├── cpu time: 896.504639969s
├── wait time: 279.228803569s
├── output rows: 600.04 million
├── output bytes: 26.82 GiB
├── numbers remote spilled by write: 95
├── bytes remote spilled by write: 16.75 GiB
├── remote spilled time by write: 535.451s
├── numbers remote spilled by read: 1497
├── bytes remote spilled by read: 16.75 GiB
├── remote spilled time by read: 44.231s
├── numbers local spilled by write: 430
├── bytes local spilled by write: 10.00 GiB
├── local spilled time by write: 64.566s
├── numbers local spilled by read: 6315
├── bytes local spilled by read: 10.00 GiB
└── local spilled time by read: 21.033s
52 rows explain in 91.103 sec. Processed 0 rows, 0 B (0 row/s, 0 B/s)
set window_partition_spilling_to_disk_bytes_limit = 30*1024*1024*1024;
├── hash keys: [l_orderkey]
├── estimated rows: 600037902.00
├── cpu time: 421.353882878s
├── wait time: 200.590871919s
├── output rows: 600.04 million
├── output bytes: 26.82 GiB
├── numbers local spilled by write: 1137
├── bytes local spilled by write: 26.74 GiB
├── local spilled time by write: 142.206s
├── numbers local spilled by read: 17646
├── bytes local spilled by read: 26.74 GiB
└── local spilled time by read: 56.713s
46 rows explain in 48.832 sec. Processed 0 rows, 0 B (0 row/s, 0 B/s)
The results are rather strange: when using only remote spill or only local spill, the time consumption is relatively normal, but mixing them causes a big jump in CPU time rather than wait time. The local test environment has only one hard disk, so the extra time may be caused by I/O contention.
@forsaken628 Thanks for the benchmark result. We can try reading and writing on different I/O devices for local spill and remote spill to determine whether it is I/O contention.
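For reference, the remote-only, local-only, and mixed cases can be isolated with the setting already used in the runs above, which should help tell I/O contention apart from the mixed-spill code path. This is just a sketch; the values are the ones from this thread, and the interpretation assumes the limit caps local spill per query as described in the PR summary:

-- remote spill only: local disk disabled
set window_partition_spilling_to_disk_bytes_limit = 0;
-- local spill only: limit larger than the ~27 GiB this workload spills
set window_partition_spilling_to_disk_bytes_limit = 30 * 1024 * 1024 * 1024;
-- mixed: 10 GiB local, remainder remote (the case with the anomalous CPU time)
set window_partition_spilling_to_disk_bytes_limit = 10 * 1024 * 1024 * 1024;

Running the same EXPLAIN ANALYZE once after each SET should separate the three cases cleanly.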
The big jump in CPU time when mixing local and remote spill has disappeared for some reason. For now it can only be explained as an environmental issue, so we will keep watching.
├── hash keys: [l_orderkey]
├── estimated rows: 600037902.00
├── cpu time: 509.153733888s
├── wait time: 173.043153894s
├── output rows: 600.04 million
├── output bytes: 26.82 GiB
├── numbers remote spilled by write: 216
├── bytes remote spilled by write: 16.75 GiB
├── remote spilled time by write: 153.346s
├── numbers remote spilled by read: 3440
├── bytes remote spilled by read: 16.75 GiB
├── remote spilled time by read: 49.122s
├── numbers local spilled by write: 622
├── bytes local spilled by write: 10.00 GiB
├── local spilled time by write: 55.315s
├── numbers local spilled by read: 9712
├── bytes local spilled by read: 10.00 GiB
└── local spilled time by read: 26.595s
After testing, it was found that local spill, both on the local machine and in the cloud, had a problem with memory not being released in time, which led to performance degradation.
If so, this PR should be reverted from the main branch.
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
part of #15328
Changes:
How to enable:
set window_partition_spilling_to_disk_bytes_limit = 10 * 1024 * 1024 * 1024
(10 GiB). This is the maximum local disk space allowed for each query; when the spill conditions are met, the local disk is used in preference to remote storage (see the combined example at the end of this description).
Tests
Type of change
This change is…
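For reference, a minimal end-to-end sketch that combines the settings used in this thread (all setting names and values are taken from the benchmark above; the query is just an illustrative window workload, not the only supported shape):

-- memory budget and spill trigger used in the benchmark
set max_memory_usage = 16 * 1024 * 1024 * 1024;
set window_partition_spilling_memory_ratio = 30;
-- allow up to 10 GiB of local disk spill per query; local disk is preferred over remote storage
set window_partition_spilling_to_disk_bytes_limit = 10 * 1024 * 1024 * 1024;

SELECT
    l_orderkey,
    l_extendedprice,
    ROW_NUMBER() OVER (PARTITION BY l_orderkey ORDER BY l_extendedprice DESC) AS row_num
FROM lineitem;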