databendlabs / databend

๐——๐—ฎ๐˜๐—ฎ, ๐—”๐—ป๐—ฎ๐—น๐˜†๐˜๐—ถ๐—ฐ๐˜€ & ๐—”๐—œ. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com

feat(query): make external server parallel by batch #16390

Closed sundy-li closed 3 weeks ago

sundy-li commented 3 weeks ago

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

## Summary

Make requests to the external UDF server run in parallel by splitting each block into batches of `external_server_request_batch_rows` rows.

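1. Let's create a benchmark UDF, `wait`, that sleeps for 0.1 seconds per row.

    import time
    from databend_udf import udf  # assumed source of @udf; the PR does not show imports

    @udf(input_types=["INT"], result_type="INT")
    def wait(x):
        time.sleep(0.1)  # every row costs 0.1 s
        return x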
2. Let's select from this UDF:

    🐳 :) select wait(number) from numbers(30) ignore_result;
    0 row read in 3.059 sec. Processed 30 row, 240B (9.81 rows/s, 78B/s)

    🐳 :) explain pipeline select wait(number) from numbers(30) ignore_result;
    -[ EXPLAIN ]-----------------------------------
    EmptySink × 1
      UdfTransform × 1
        CompoundBlockOperator(Map) × 1
          NumbersSourceTransform × 1


The source produces only one block, so the UDF processes its 30 rows one by one in a single thread, which takes about 3 seconds (30 rows × 0.1 s).

3. With this PR, we split the block into batches of `external_server_request_batch_rows` rows and send the batches in parallel.

    🐳 :) set external_server_request_batch_rows = 5;
    0 row read in 0.135 sec. Processed 0 row, 0B (0 row/s, 0B/s)

    🐳 :) select wait(number) from numbers(30) ignore_result;
    0 row read in 0.541 sec. Processed 30 row, 240B (55.47 rows/s, 443B/s)



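Now the query finishes in about 0.5 seconds: the 30 rows become 6 batches of 5 rows, each batch takes 5 × 0.1 s, and the batches run concurrently.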
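The timing difference can be reproduced without Databend. The sketch below is only a model of the idea, not the code in this PR: `call_server` is a hypothetical stand-in for one request to the external server, `run_batched` is an illustrative name, and it assumes the server can handle several requests concurrently while processing the rows inside one request sequentially.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait(x, delay=0.1):
    """Stand-in for the benchmark UDF: sleep `delay` seconds per row."""
    time.sleep(delay)
    return x


def call_server(batch, delay):
    """Hypothetical stand-in for one request to the external server.

    Rows inside a single request are processed one after another.
    """
    return [wait(x, delay) for x in batch]


def run_batched(rows, batch_rows, delay=0.1):
    """Split `rows` into batches of `batch_rows` and send them concurrently."""
    batches = [rows[i:i + batch_rows] for i in range(0, len(rows), batch_rows)]
    # One worker per batch (an assumption of this sketch); at least one
    # worker so an empty input does not raise.
    with ThreadPoolExecutor(max_workers=max(1, len(batches))) as pool:
        # map() yields results in batch order, so output order matches input.
        results = pool.map(call_server, batches, [delay] * len(batches))
        return [value for batch in results for value in batch]


def timed(fn, *args, **kwargs):
    """Return (result, elapsed seconds) for a call."""
    start = time.perf_counter()
    out = fn(*args, **kwargs)
    return out, time.perf_counter() - start


if __name__ == "__main__":
    rows = list(range(30))
    # One request carrying the whole block, as before the change.
    _, one_request = timed(run_batched, rows, batch_rows=len(rows))
    # Six concurrent requests of five rows each.
    _, six_requests = timed(run_batched, rows, batch_rows=5)
    print(f"single request: {one_request:.2f}s, batches of 5: {six_requests:.2f}s")
```

With the default 0.1 s delay, the single request should take a little over 3 s and the batched run a little over 0.5 s, in line with the timings reported above. A smaller batch size gives more parallelism but also more requests, so the best value depends on how many concurrent requests the server can actually sustain.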

## Tests

- [ ] Unit Test
- [x] Logic Test
- [ ] Benchmark Test
- [ ] No Test - _Explain why_

## Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Breaking Change (fix or feature that could cause existing functionality not to work as expected)
- [ ] Documentation Update
- [ ] Refactoring
- [x] Performance Improvement
- [ ] Other (please describe):
