Support configurable segment balancer strategy

wirybeaver commented 1 year ago

Currently Pinot doesn't consider the disk usage ratio when distributing segments, which leads to highly skewed disk usage.

For reference, Druid supports a configurable segment balancer strategy:

https://druid.apache.org/docs/latest/configuration/index.html

druid.coordinator.balancer.strategy

Jackie-Jiang commented 1 year ago

Good feature request! We should be able to make both InstanceConstraintApplier (choosing server candidates for a given table) and SegmentAssignmentStrategy (how to assign segments to servers) pluggable. @wirybeaver Do you want to help contribute this?

wirybeaver commented 1 year ago

Yep, I am willing to contribute and learn Helix along the way. I will take a look this week and write a proposal. Basically, we need to propagate the local disk usage to the instance metadata.
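A minimal sketch of what a disk-usage-aware strategy could look like once the usage ratio is available per server. The interface `SegmentBalancerStrategy`, the class `DiskUsageBalancerStrategy`, and the `diskUsageRatioByServer` map are hypothetical names for illustration, not existing Pinot APIs; how the ratio reaches the controller is left open.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in point; this is NOT an existing Pinot interface.
interface SegmentBalancerStrategy {
  // Returns the servers that should host the replicas of one new segment.
  List<String> pickServers(Map<String, Double> diskUsageRatioByServer, int replication);
}

// Picks the servers with the lowest reported disk usage ratio (0.0 to 1.0).
final class DiskUsageBalancerStrategy implements SegmentBalancerStrategy {
  @Override
  public List<String> pickServers(Map<String, Double> diskUsageRatioByServer, int replication) {
    if (replication < 1 || replication > diskUsageRatioByServer.size()) {
      throw new IllegalArgumentException("replication must be between 1 and the number of servers");
    }
    List<Map.Entry<String, Double>> candidates =
        new ArrayList<>(diskUsageRatioByServer.entrySet());
    // Sort by usage ratio; break ties by server name so the result is deterministic.
    candidates.sort(
        Comparator.comparingDouble((Map.Entry<String, Double> e) -> e.getValue())
            .thenComparing(e -> e.getKey()));
    List<String> picked = new ArrayList<>();
    for (Map.Entry<String, Double> e : candidates.subList(0, replication)) {
      picked.add(e.getKey());
    }
    return picked;
  }
}
```

With usage ratios of 0.90, 0.40, and 0.15 for three servers and a replication of 2, this sketch picks the two least-full servers. A real strategy would likely also need to account for segment size and replica-group constraints.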

Jackie-Jiang commented 1 year ago

cc @chenboat who is also interested in this

wirybeaver commented 1 year ago

I noticed there's a commit to support tenant rebalancing: https://github.com/apache/pinot/commit/d6b1b4feba1f9a2168f5119e1b673d7cfcf8d146

More generally, Pinot doesn't fully support adding new server nodes to cope with organic data growth. When a new node joins a tenant, whether it was added manually or by horizontal scaling triggered by a low-disk-space signal, data rebalancing should also be triggered automatically once the pinot-controller observes that the new node has been alive for a while (e.g. 30 mins). With this feature, disk shortage can be mitigated regardless of the underlying infra (EC2, k8s) that hosts Pinot. Otherwise, each company's DevOps team needs to implement a post-processing workflow to trigger tenant rebalancing after spinning up a new node.

In addition, the pool-based segment assignment strategy requires an API call in order to assign new segments to new nodes, which is even worse than the balanced assignment strategy.

The broker auto rebalancing needs to be considered too. https://github.com/apache/pinot/issues/10181

I wanted to collect all related issues to raise awareness of making Pinot cloud native, i.e. smoothly supporting capacity changes.

Jackie-Jiang commented 1 year ago

> To be generic, Pinot doesn't perfectly support adding new server nodes to cater to organic data growth. When a new node joins the tenants, which might be triggered manually or by horizontal scaling based on the signal of running out of disk space, the data rebalancing should also be auto triggered when the pinot-controller observes the new node is alive for a while (e.g. 30 mins). With this feature, the disk shortage issue can be mitigated regardless of the underlying infra (ec2, k8s) that holds Pinot. Otherwise, each company's DevOps team needs to implement a post processing workflow to trigger the tenant rebalancing after spinning up a new node.

We can add a controller periodic task to automatically rebalance all tables (rebalance is idempotent, so it is a no-op if no instance has changed), but it needs to be disabled by default. Automatic rebalancing works for non-performance-critical use cases, but for performance-sensitive use cases (e.g. user-facing, high-throughput use cases), rebalancing reduces the cluster capacity, so it should be performed within a maintenance window and triggered manually.
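A rough sketch of the periodic task described above, assuming a hypothetical `AutoRebalanceTask` class with an opt-in flag. The names are illustrative and not part of Pinot. The sketch models idempotence by remembering the instance set seen at the last rebalance, whereas the real rebalance would simply compute the same assignment and do nothing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical periodic task; class, flag, and method names are illustrative only.
final class AutoRebalanceTask {
  private final boolean enabled;
  // Instance set each table was last rebalanced against.
  private final Map<String, Set<String>> lastRebalancedInstances = new HashMap<>();

  // Disabled by default, so clusters must opt in explicitly.
  AutoRebalanceTask() {
    this(false);
  }

  AutoRebalanceTask(boolean enabled) {
    this.enabled = enabled;
  }

  // Runs one pass over all tables and returns how many were rebalanced.
  int runOnce(Map<String, Set<String>> currentInstancesByTable) {
    if (!enabled) {
      return 0;
    }
    int rebalanced = 0;
    for (Map.Entry<String, Set<String>> entry : currentInstancesByTable.entrySet()) {
      Set<String> current = new TreeSet<>(entry.getValue());
      if (!current.equals(lastRebalancedInstances.get(entry.getKey()))) {
        // A real task would invoke the table rebalance here.
        lastRebalancedInstances.put(entry.getKey(), current);
        rebalanced++;
      }
    }
    return rebalanced;
  }
}
```

In this sketch the first enabled pass treats every table as changed, and later passes are no-ops until a table's instance set differs. A scheduler such as `ScheduledExecutorService` could drive `runOnce` at a fixed interval.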

> In addition, pool based segment assignment strategy requires triggering an API in order to assign new segments onto new nodes, which is even worse than the balanced assignment strategy.

This is intentional, and maybe we should do the same even for the balanced strategy. For use cases that require all segments of a partition to be colocated, assigning new segments to new nodes will break that. To solve this, we should use the approach above and rebalance automatically.
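To illustrate why picking up new nodes immediately breaks colocation, here is a deliberately simplified sketch. It assumes a partition is mapped to a server by `partitionId % numInstances`; this is an assumption for illustration only and not how Pinot actually computes assignments. `PartitionColocationDemo` is a hypothetical name.

```java
import java.util.List;

// Simplified model: NOT Pinot's real assignment logic.
final class PartitionColocationDemo {
  // Maps a partition to one server from the instance list used at assignment time.
  static String serverFor(int partitionId, List<String> instances) {
    return instances.get(Math.floorMod(partitionId, instances.size()));
  }
}
```

Under this model, partition 2 maps to `s1` with instances `[s1, s2]` but to `s3` with `[s1, s2, s3]`, so a new segment assigned after the node joins would no longer sit with the older segments of the same partition until a rebalance moves them.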

> The broker auto rebalancing needs to be considered too. #10181

Can we close #10181 and open a new ticket for it? The title is very confusing because an instance should not be removed from the IS when disabled.

wirybeaver commented 1 year ago

Thanks Jackie, I retitled #10181.

apache / pinot

Support configurable segment balancer strategy #11278