What is changed and how does it work?
Command
meta.lb.ignored_nodes_list <get|set|clear> [node_addr1,node_addr2,...]
Supports the get, set, and clear subcommands.
The number of blacklisted nodes must not exceed the number of alive nodes minus 2. Otherwise balancing is impossible, because at least two nodes are needed to move replicas between.
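The command handling and the size limit above can be sketched as follows. This is a minimal illustration assuming a hypothetical handler; parse_node_list and handle_ignored_nodes_cmd are not Pegasus functions, and deduplicating the requested addresses is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split "addr1,addr2,..." into addresses, skipping empty items.
std::vector<std::string> parse_node_list(const std::string &arg)
{
    std::vector<std::string> nodes;
    std::stringstream ss(arg);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) {
            nodes.push_back(item);
        }
    }
    return nodes;
}

// Hypothetical handler for "<get|set|clear> [node_addr1,node_addr2,...]".
// Returns false if the command is rejected; `ignored` is left unchanged then.
bool handle_ignored_nodes_cmd(std::set<std::string> &ignored,
                              const std::vector<std::string> &args,
                              std::size_t alive_node_count)
{
    if (args.empty()) {
        return false;
    }
    if (args[0] == "get") {
        return true;  // the caller would print `ignored`
    }
    if (args[0] == "clear") {
        ignored.clear();
        return true;
    }
    if (args[0] == "set" && args.size() == 2) {
        const std::vector<std::string> nodes = parse_node_list(args[1]);
        std::set<std::string> requested(nodes.begin(), nodes.end());  // dedupe
        // Reject lists larger than (alive nodes - 2): two nodes must stay
        // available as a source and a target for balancing.
        if (alive_node_count < 2 || requested.size() > alive_node_count - 2) {
            return false;
        }
        ignored = std::move(requested);
        return true;
    }
    return false;
}
```

With 6 alive nodes, setting two addresses is accepted; with 3 alive nodes the same request is rejected, since only one slot (3 - 2) is allowed.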
app_balance_policy
Move primary
Moving a primary does not increase or decrease the number of replicas on any node, so no restriction is applied.
copy primary and copy secondary
Both phases use the same balancing strategy, so the same restriction applies to both.
Sort the nodes by their number of primary replicas in ascending order to get pri_queue.
On pri_queue, id_min always points to the head node (fewest primaries) and id_max always points to the tail node (most primaries), as shown below:
+------+------+------+------+------+------+------+------+
   |                                                 |
   V                                                 V
 id_min                                           id_max
For all replicas on the current id_max node, find the disks they reside on and obtain those disks' loads. Select the disk with the highest load and its corresponding primary replica, and relocate that replica to id_min.
Add 1 to the primary count of the node pointed to by id_min and subtract 1 from that of id_max, re-sort the queue, and repeat the steps above until the number of primaries on the id_min node is >= N/M (N being the number of primaries of the table and M the number of nodes). At that point, balance is reached.
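A minimal C++ sketch of the loop above, assuming a simplified in-memory model: node_load and plan_copy_primary are hypothetical names, the disk-load selection step is omitted, and N/M uses integer division.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified model: one entry per node with its primary count for one table.
struct node_load {
    std::string address;
    int primary_count;
};

// Hypothetical sketch of the copy-primary loop. Disk-load selection is omitted,
// so each step moves one primary from id_max (tail) to id_min (head).
// Returns the planned moves as (from, to) address pairs.
std::vector<std::pair<std::string, std::string>> plan_copy_primary(std::vector<node_load> nodes)
{
    std::vector<std::pair<std::string, std::string>> moves;
    if (nodes.empty()) {
        return moves;
    }
    int total = 0;  // N
    for (const auto &n : nodes) {
        total += n.primary_count;
    }
    const int target = total / static_cast<int>(nodes.size());  // N/M, rounded down
    const auto by_count = [](const node_load &a, const node_load &b) {
        return a.primary_count < b.primary_count;
    };
    while (true) {
        // pri_queue in ascending order: head is id_min, tail is id_max.
        std::stable_sort(nodes.begin(), nodes.end(), by_count);
        node_load &id_min = nodes.front();
        node_load &id_max = nodes.back();
        if (id_min.primary_count >= target) {
            break;  // balance reached
        }
        moves.emplace_back(id_max.address, id_min.address);
        id_min.primary_count += 1;  // +1 on id_min
        id_max.primary_count -= 1;  // -1 on id_max
    }
    return moves;
}
```

For primary counts 5, 1, 0 on nodes a, b, c (N/M = 2), the sketch plans three moves: a to c, a to c, then a to b, leaving every node with 2 primaries.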
The only difference between copy primary and copy secondary is the sort key of the queue: for copy primary, nodes are sorted by the number of primary replicas of the table on each node; for copy secondary, they are sorted by the total number of replicas (primary and secondary) of the table on each node.
Therefore, it is sufficient to exclude the blacklisted nodes when choosing id_min/id_max.
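The exclusion can be sketched as a filter applied before building the queue. This is a hypothetical helper (node_replicas and pick_min_max are illustrative names, not Pegasus code); the copy_primary flag switches between the two sort keys described above.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct node_replicas {
    std::string address;
    int primary_count;  // primaries of the table on this node
    int total_count;    // all replicas (primary + secondary) of the table on this node
};

// Hypothetical helper: returns (id_min, id_max) addresses, skipping blacklisted
// nodes. copy_primary selects the sort key: primaries for copy primary,
// all replicas for copy secondary.
std::optional<std::pair<std::string, std::string>>
pick_min_max(const std::vector<node_replicas> &nodes,
             const std::set<std::string> &ignored,
             bool copy_primary)
{
    std::vector<node_replicas> queue;
    for (const auto &n : nodes) {
        if (ignored.count(n.address) == 0) {
            queue.push_back(n);
        }
    }
    if (queue.size() < 2) {
        return std::nullopt;  // no pair of nodes to balance between
    }
    const auto key = [copy_primary](const node_replicas &n) {
        return copy_primary ? n.primary_count : n.total_count;
    };
    std::stable_sort(queue.begin(), queue.end(),
                     [&key](const node_replicas &a, const node_replicas &b) {
                         return key(a) < key(b);
                     });
    return std::make_pair(queue.front().address, queue.back().address);
}
```

Because blacklisted nodes never enter the queue, they can be neither a source nor a target, so their replica counts stay unchanged.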
cluster_balance_policy
Move primary
Moving a primary does not increase or decrease the number of replicas on any node, so no restriction is applied.
copy primary and copy secondary
Both phases use the same balancing strategy, so the same restriction applies to both.
The only difference between copy primary and copy secondary is what is counted: copy primary counts primary replicas, while copy secondary counts secondary replicas only (primary replicas excluded).
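A minimal sketch of the two counting modes, assuming a simplified partition model; partition_config and count_replicas are hypothetical names, not the Pegasus data structures. Nodes without any replica are seeded with 0 so they can still be chosen as the min node.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified partition configuration: one primary plus its secondaries.
struct partition_config {
    std::string primary;
    std::vector<std::string> secondaries;
};

// Hypothetical helper: per-node counts used by the two cluster-balance phases.
// copy primary counts primaries; copy secondary counts secondaries only.
std::map<std::string, int> count_replicas(const std::vector<std::string> &nodes,
                                          const std::vector<partition_config> &partitions,
                                          bool copy_primary)
{
    std::map<std::string, int> counts;
    for (const auto &node : nodes) {
        counts[node] = 0;  // nodes holding no replica still appear, with 0
    }
    for (const auto &p : partitions) {
        if (copy_primary) {
            ++counts[p.primary];
        } else {
            for (const auto &s : p.secondaries) {
                ++counts[s];  // the primary itself is not counted here
            }
        }
    }
    return counts;
}
```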
The node blacklist is implemented as follows:
1. Ignore blacklisted nodes when calculating skew. When calculating both app_skew and server_skew, nodes in the blacklist are not counted, so only the remaining nodes need to reach balance.
2. Ignore blacklisted nodes when calculating cluster_min_count_nodes, cluster_max_count_nodes, app_min_count_nodes, and app_max_count_nodes.
3. As in app_balance_policy, ignore blacklisted nodes when selecting the max and min nodes.
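The rules above can be sketched with one helper that computes the skew and the min/max node sets over per-node counts while skipping blacklisted nodes. skew_result and compute_skew are hypothetical names; the same helper is assumed to apply to both per-app counts (app_skew) and per-server totals (server_skew).

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>

struct skew_result {
    int skew = 0;                     // max count - min count over non-ignored nodes
    std::set<std::string> min_nodes;  // nodes holding the min count
    std::set<std::string> max_nodes;  // nodes holding the max count
};

// Hypothetical helper (not the Pegasus code): blacklisted nodes are excluded
// from the skew and from the min/max node sets.
skew_result compute_skew(const std::map<std::string, int> &counts,
                         const std::set<std::string> &ignored)
{
    skew_result r;
    bool first = true;
    int min_count = 0;
    int max_count = 0;
    for (const auto &[node, c] : counts) {
        if (ignored.count(node) != 0) {
            continue;
        }
        if (first) {
            min_count = max_count = c;
            first = false;
        }
        min_count = std::min(min_count, c);
        max_count = std::max(max_count, c);
    }
    if (first) {
        return r;  // every node is ignored
    }
    for (const auto &[node, c] : counts) {
        if (ignored.count(node) != 0) {
            continue;
        }
        if (c == min_count) {
            r.min_nodes.insert(node);
        }
        if (c == max_count) {
            r.max_nodes.insert(node);
        }
    }
    r.skew = max_count - min_count;
    return r;
}
```

With counts a=10, b=2, c=5, d=2, e=5 and a blacklisted, the skew is 3 (5 - 2) instead of 8, so the remaining nodes can be treated as balanced without touching a.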
Checklist
Tests
- Unit test
// Test app balance
export GTEST_FILTER=meta.app_balancer_nodes_blacklist_test
./run.sh test -m dsn.meta.test
// Test cluster balance
export GTEST_FILTER=meta.cluster_balancer_nodes_blacklist_test
./run.sh test -m dsn.meta.test
- Manual test
1. Build an unbalanced cluster with onebox, using node restarts and the command:
remote_command -t meta-server meta.lb.assign_secondary_black_list $address_list
2. Set the node blacklist with meta.lb.ignored_nodes_list set.
3. Run set_meta_level lively to trigger load balancing.
- app_balance_policy
The initial state of the cluster is:
![image](https://github.com/apache/incubator-pegasus/assets/46274877/4d98354e-6549-4ff6-aa8c-b74127b8edd7)
Set 172.17.0.2:34801 and 172.17.0.2:34806 as blacklisted, then run load balancing. The final state is:
![image (1)](https://github.com/apache/incubator-pegasus/assets/46274877/139712c5-9302-4903-a776-92594b1e4536)
The replica counts of the two blacklisted nodes, 172.17.0.2:34801 and 172.17.0.2:34806, did not change, while the other four nodes reached a balanced state. After clearing the ignored nodes list and balancing again, the result is:
![image](https://github.com/apache/incubator-pegasus/assets/46274877/e7fee2f7-a68d-441a-90e7-67d44081559e)
- cluster_balance_policy
The initial state of the cluster is:
![image (1)](https://github.com/apache/incubator-pegasus/assets/46274877/6587456e-19b5-4002-a165-db71e1cb8812)
Set 172.17.0.2:34801 and 172.17.0.2:34806 as blacklisted, then run load balancing. The final state is:
![image](https://github.com/apache/incubator-pegasus/assets/46274877/46382308-fd90-4f44-838d-de21b02e29bf)
The replica counts of the two blacklisted nodes, 172.17.0.2:34801 and 172.17.0.2:34806, did not change, while the other four nodes reached a balanced state. After clearing the ignored nodes list and balancing again, the result is:
![image (1)](https://github.com/apache/incubator-pegasus/assets/46274877/a905426e-a87f-416c-bd93-4d72f81daccb)
What problem does this PR solve?
#1976
In addition, this PR fixes two issues in cluster_balance_policy.
https://github.com/apache/incubator-pegasus/blob/cd1682d5e2e4668f4263073d5ddb04b8bd7574c4/src/meta/cluster_balance_policy.cpp#L201-L202
std::move(info) is called before the variable info is used, which makes the computed skew wrong.
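The linked code is not reproduced here; the following is a hypothetical illustration of the ordering problem and its fix. app_info, skew_of, and add_app are illustrative names, not the Pegasus definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative per-app info; not the Pegasus struct.
struct app_info {
    std::map<std::string, int> replicas_count;  // node address -> replica count
};

// Skew of one app: max minus min replica count over its nodes.
int skew_of(const app_info &info)
{
    if (info.replicas_count.empty()) {
        return 0;
    }
    const auto cmp = [](const auto &a, const auto &b) { return a.second < b.second; };
    const auto [min_it, max_it] =
        std::minmax_element(info.replicas_count.begin(), info.replicas_count.end(), cmp);
    return max_it->second - min_it->second;
}

// Buggy order: `info` is read after being moved from, and a moved-from map is
// left in a valid but unspecified (typically empty) state:
//     apps.push_back(std::move(info));
//     int skew = skew_of(info);
//
// Fixed order: finish every use of `info`, then move it.
int add_app(std::vector<app_info> &apps, app_info info)
{
    const int skew = skew_of(info);
    apps.push_back(std::move(info));
    return skew;
}
```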
https://github.com/apache/incubator-pegasus/blob/cd1682d5e2e4668f4263073d5ddb04b8bd7574c4/src/meta/load_balance_policy.h#L113 https://github.com/apache/incubator-pegasus/blob/cd1682d5e2e4668f4263073d5ddb04b8bd7574c4/src/meta/greedy_load_balancer.h#L77-L78
The member _balancer_ignored_apps is not static, so _app_balance_policy and _cluster_balance_policy each hold a separate _balancer_ignored_apps. As a result, setting _balancer_ignored_apps only takes effect on _app_balance_policy.
Both issues are fixed in this PR.
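A simplified model of why making the member static fixes the second issue: with a static member of a shared base class, every policy object reads the same set. policy_base, app_policy, and cluster_policy are hypothetical stand-ins, not the actual Pegasus classes; only the member name _balancer_ignored_apps comes from the text above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Simplified model (not the actual Pegasus classes): both policies derive from
// one base, and the ignored-apps set is a static member of that base.
class policy_base {
public:
    void set_ignored_apps(std::set<std::string> apps) { _balancer_ignored_apps = std::move(apps); }
    bool is_ignored(const std::string &app) const { return _balancer_ignored_apps.count(app) != 0; }

private:
    // static: a single set shared by every instance. Without `static`, each
    // policy object would keep its own copy, and a "set" made through one
    // object would not be visible through the other.
    inline static std::set<std::string> _balancer_ignored_apps;  // C++17
};

class app_policy : public policy_base {};
class cluster_policy : public policy_base {};
```

Setting the ignored apps through the app policy object is then visible through the cluster policy object as well. Note that this sketch does not address concurrent access to the shared set.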