databendlabs / databend

๐——๐—ฎ๐˜๐—ฎ, ๐—”๐—ป๐—ฎ๐—น๐˜†๐˜๐—ถ๐—ฐ๐˜€ & ๐—”๐—œ. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.88k stars 751 forks source link

feat(query): speed up fetching redis data from dictionaries via mget. #16766

Closed Dragonliu2018 closed 3 hours ago

Dragonliu2018 commented 2 weeks ago

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Previous Impl: For the Column type, loop through the Column and get the values from Redis one by one via OpenDAL.

Current Impl: OpenDAL only supports fetching one value at a time, so we use redis client to fetch the data in batches (via mget).

We also try to add a cache for performance optimization. The LRU cache is used first, and the performance is shown in the table below. However, the performance is not stable. Performance is good if the required data is cached, but poor if it is not or if the cache is full. In addition, we need to handle stale data in the cache to avoid wasting memory. So the TTL cache is tried, but the performance is worse than the LRU cache. So we temporarily give up on using cache for optimization. Related code: https://github.com/databendlabs/databend/pull/16882

strategy \ key count \ elapsed time(seconds) 1w 10w 100w 1kw
Previous Impl (one by one) 7.338 73.229 594.109 386.483
Current Impl (in batches) 0.051 0.145 0.66 5.194
in batches + LRU cache (data not cached) 0.074 0.197 1.049 6.841
in batches + LRU cache (data cached) 0.034 0.079 0.505 3.228

Reproduce:

# Create Redis DICTIONARY
CREATE OR REPLACE DICTIONARY d1 ( key VARCHAR NULL, value VARCHAR NOT NULL) PRIMARY KEY key
SOURCE(redis(host='100.73.238.81' port='6379'));

# Create test table and insert 1kw keys
CREATE OR REPLACE TABLE red_test_10000000 AS
SELECT
    number % 100000000 AS id,
    concat('key', id::string) as key
FROM numbers(10000000);

# test SQL
select key, dict_get(d1, 'value', key) from red_test_10000000;

part of #16550

Tests

Type of change


This change isโ€‚Reviewable

Xuanwo commented 2 days ago

This PR reminds me the possibility of adding mget support in opendal.

b41sh commented 1 day ago

This PR reminds me the possibility of adding mget support in opendal.

@Xuanwo Can OpenDAL consider adding mget? With OpenDAL, we can simplify the code.

Xuanwo commented 1 day ago

@Xuanwo Can OpenDAL consider adding mget? With OpenDAL, we can simplify the code.

Not in the near future. We can revisit this part later :heart: