apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[Enhancement] The statistics result in high I/O load #29324

Open xingyingone opened 10 months ago

xingyingone commented 10 months ago

Description

Version: doris-2.0.3-rc04-87c4d1c

  1. In the production environment, periodic statistics updates cause I/O spikes that affect data import via Stream Load.
  2. After I set enable_auto_analyze = false, I/O performance became smooth and data import returned to normal.

Solution

Maybe we can limit the resources used by analyze, so that I/O performance stays smoother.

wangbo commented 10 months ago

From the perspective of resource management, we will support limiting scan I/O for Doris internal queries.

xingyingone commented 10 months ago

I would like to contribute the PR.

wangbo commented 10 months ago

I plan to use a very simple approach that just controls the number of scan threads to limit scan I/O. There are more questions that need to be answered, such as whether Doris needs additional mechanisms (cgroup or others) to limit scan I/O, and why many mature databases do not mention how to limit scan I/O. This needs more research. If you are interested in Doris's workload management, you can find Doris PRs with the label workload-group, or you can start from Config.enable_workload_group or WorkloadGroupMgr.java.
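To make the "control the number of scan threads" idea concrete, here is a rough sketch of the general technique: run scan tasks on a fixed-size pool so that at most N of them read from disk at once. This is not Doris code. The class ScanThreadLimiter and its methods are hypothetical names invented for illustration, and the real scan scheduling is not implemented in the Java FE code mentioned above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; names are invented and this is not Doris's implementation.
class ScanThreadLimiter {
    // A fixed-size pool caps how many scan tasks can run (and issue I/O) concurrently.
    private final ExecutorService pool;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    ScanThreadLimiter(int maxScanThreads) {
        this.pool = Executors.newFixedThreadPool(maxScanThreads);
    }

    // Queue a scan task; it waits in the pool's queue until a scan thread is free.
    void submitScan(Runnable scanTask) {
        pool.submit(() -> {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // record the highest concurrency seen
            try {
                scanTask.run();
            } finally {
                running.decrementAndGet();
            }
        });
    }

    // Stop accepting tasks, wait for queued scans to finish, and report peak concurrency.
    int awaitAndGetPeak() {
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }
}
```

With a limit of 2, submitting eight tasks never runs more than two at a time; the rest wait in the queue. Note that this caps concurrency, not bytes per second, which is one reason the open question about cgroup or other mechanisms still matters.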