
Changing resource partition count via Helix REST does not work reliably #2793

Open · wmorgan6796 opened this issue 3 months ago

wmorgan6796 commented 3 months ago

Describe the bug

When updating the resource configuration and ideal state via the REST API, specifically the number of partitions for a resource, I find that Helix does not reliably create and assign the new partitions; a full re-creation of the resource (under the same name as the outgoing resource) is required. In addition, I've found that when scaling down the number of partitions by recreating the resource this way, the new resource correctly shows the scaled-down partition count, but Helix still attempts to assign the original, larger number of partitions even though the resource was completely recreated.

The Helix configuration for the cluster and resource is attached.

To Reproduce

  1. Create a Helix cluster with at least 3 instances and create a resource within it. Let the cluster assign everything out.
  2. Delete the resource.
  3. Recreate the same resource with the same name, configuration, etc., except with fewer partitions (in our case we went from 2520 to 2048).
  4. Observe that the participants are still trying to move the now non-existent partitions to the Master/Slave states.
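
To make the repro concrete, here is a minimal sketch of the same steps using the Java HelixAdmin API instead of REST. The ZK address, cluster and resource names, and the replica count are illustrative assumptions; only the partition counts and the MasterSlave state model come from the report.

```java
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;

public class ReproShrinkPartitions {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181");

    // Step 1 (cluster and instances assumed already created): the
    // resource starts with 2520 partitions, MasterSlave state model.
    // Step 2: delete the resource.
    admin.dropResource("MyCluster", "MyResource");

    // Step 3: recreate it immediately with the same name but fewer
    // partitions (2520 -> 2048).
    admin.addResource("MyCluster", "MyResource", 2048, "MasterSlave");
    admin.rebalance("MyCluster", "MyResource", 3); // 3 = replica count

    // Step 4: participants may still receive state transitions for the
    // dropped partitions 2048..2519.
  }
}
```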

Expected behavior

When I edit the resource configuration, Helix should automatically handle removing the dropped partitions from the participants and remove them entirely from the cluster. Also, if I recreate a resource that's exactly the same as one I just deleted, just with a smaller number of partitions, the cluster should assign the correct new number of partitions, not the older, incorrect number.

Additional context

Configuration: Helix-Config.txt

junkaixue commented 3 months ago

@wmorgan6796 did you use the create/delete resource API?

For partition placement, changing the ResourceConfig does not work; the partition count lives in the IdealState. You should update the IdealState with the right partition number.
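
For illustration, a minimal sketch of that suggestion using the Java HelixAdmin API (the ZK address, cluster/resource names, and replica count are assumptions):

```java
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;

public class UpdateIdealStatePartitions {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181");

    // Read the current IdealState, change the partition count, and
    // write it back.
    IdealState is = admin.getResourceIdealState("MyCluster", "MyResource");
    is.setNumPartitions(2048);
    admin.setResourceIdealState("MyCluster", "MyResource", is);

    // For SEMI_AUTO resources, recompute the preference lists; a
    // FULL_AUTO rebalancer picks up the change on its own.
    admin.rebalance("MyCluster", "MyResource", 3); // 3 = replica count
  }
}
```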

wmorgan6796 commented 3 months ago

I changed both the ideal state and the resource config

junkaixue commented 2 months ago

@wmorgan6796 Is this cluster in a normal state?

Meaning: 1) the cluster has a live controller; 2) the resource is not disabled; 3) the cluster is not in maintenance mode; 4) for a WAGED-rebalanced resource, the cluster has enough capacity; and so on.
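
For anyone following along, a rough sketch of checking the first three conditions programmatically; the ZK address and names are assumptions, and isInMaintenanceMode assumes a reasonably recent Helix release:

```java
import org.apache.helix.HelixAdmin;
import org.apache.helix.PropertyKey;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.manager.zk.ZKHelixDataAccessor;
import org.apache.helix.manager.zk.ZkBaseDataAccessor;
import org.apache.helix.model.IdealState;
import org.apache.helix.model.LiveInstance;

public class ClusterStateCheck {
  public static void main(String[] args) {
    String zk = "localhost:2181";
    String cluster = "MyCluster";
    HelixAdmin admin = new ZKHelixAdmin(zk);

    // 1) Live controller: the controller leader node exists in ZK.
    ZKHelixDataAccessor accessor =
        new ZKHelixDataAccessor(cluster, new ZkBaseDataAccessor<>(zk));
    PropertyKey.Builder kb = accessor.keyBuilder();
    LiveInstance leader = accessor.getProperty(kb.controllerLeader());
    System.out.println("live controller: " + (leader != null));

    // 2) Resource not disabled.
    IdealState is = admin.getResourceIdealState(cluster, "MyResource");
    System.out.println("resource enabled: " + (is != null && is.isEnabled()));

    // 3) Cluster not in maintenance mode.
    System.out.println("maintenance mode: " + admin.isInMaintenanceMode(cluster));

    // 4) WAGED capacity has no single-call check; inspect the instance
    // capacity configs and the controller log instead.
  }
}
```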

There are multiple cases that can lead to the partition count not changing. Narrowing it down requires some understanding of, and debugging with, your controller log.

junkaixue commented 2 months ago

Any update on this? @wmorgan6796

wmorgan6796 commented 2 months ago

Sorry I’ve been on leave for a bit and haven’t had time to come back to this.

But to answer the questions:

  1. The cluster was working.
  2. The cluster was not disabled, though we had disabled it before and after making the change.
  3. Maintenance mode was not on in the cluster.
  4. There was plenty of capacity in the cluster.

junkaixue commented 1 month ago

Interesting. I cannot reproduce it locally. One possible scenario is that you delete and re-add the same resource very close together, almost at the same time, while the controller is busy handling other notifications.

Helix uses selective updates for the metadata:

  1. The delete triggers a child-change event in ZK.
  2. The add triggers another child change.
  3. But if these two operations happen close together, then by the time the controller handles the child-change event and reads the data from ZK, the IdealState has already been added back, so the controller thinks there is no change.

Even though your number of partitions has changed, from ZK's point of view it is not a data change and will not trigger a refresh of the data.

In this case, you can either create the resource with a different name (for example, add a version suffix to differentiate the resources), or add logic to make sure the participants have finished dropping the old partitions before you create the new resource with the same name; a sketch of the latter follows.
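
A sketch of the second option, reusing the hypothetical names from the earlier sketches: wait until the controller has fully dropped the old resource (its ExternalView is gone) before re-adding it under the same name, so the controller sees two distinct child changes.

```java
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;

public class SafeRecreate {
  public static void main(String[] args) throws InterruptedException {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181");
    String cluster = "MyCluster";
    String resource = "MyResource";

    admin.dropResource(cluster, resource);

    // Wait until participants have finished dropping the old
    // partitions and the ExternalView node is removed.
    while (admin.getResourceExternalView(cluster, resource) != null) {
      Thread.sleep(1000L);
    }

    // Re-adding now produces a separate child-change event, so the
    // controller picks up the new (smaller) partition count.
    admin.addResource(cluster, resource, 2048, "MasterSlave");
    admin.rebalance(cluster, resource, 3); // 3 = replica count
  }
}
```

The first option avoids the race entirely, since a resource with a versioned name (say, a hypothetical MyResource_v2) is a different ZK child.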

@wmorgan6796