XiaoMi / rdsn

Has been migrated to https://github.com/apache/incubator-pegasus/tree/master/rdsn

feat(bulk_load): support disk_level ingesting restriction part1 - add ingestion_context class #1035

Closed hycdong closed 2 years ago

hycdong commented 2 years ago

As https://github.com/apache/incubator-pegasus/issues/886 shows, we plan to provide a disk-level restriction on the number of concurrently ingesting partitions. This pull request adds a new class called ingestion_context, which records the partitions that are ingesting and the ingesting count of every node and every disk.
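
To make the bookkeeping described above concrete, here is a minimal sketch of such a tracker. It is not the ingestion_context class from this pull request: the class name `ingestion_tracker`, its methods, the string node and disk identifiers, and the simplification to one replica per partition are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the bookkeeping described above. Names and layout
// are assumptions, not the rdsn implementation. For simplicity each partition
// is tracked on a single node and disk.
class ingestion_tracker
{
public:
    // Record that `partition` started ingesting on `disk` of `node`.
    // Returns false if the partition is already recorded.
    bool add_partition(int32_t partition, const std::string &node, const std::string &disk)
    {
        if (!_ingesting.emplace(partition, std::make_pair(node, disk)).second) {
            return false;
        }
        ++_disk_counts[node][disk];
        return true;
    }

    // Forget a partition once its ingestion finishes or fails.
    void remove_partition(int32_t partition)
    {
        auto it = _ingesting.find(partition);
        if (it == _ingesting.end()) {
            return;
        }
        --_disk_counts[it->second.first][it->second.second];
        _ingesting.erase(it);
    }

    // Number of partitions currently ingesting on one disk of a node.
    uint32_t disk_count(const std::string &node, const std::string &disk) const
    {
        auto n = _disk_counts.find(node);
        if (n == _disk_counts.end()) {
            return 0;
        }
        auto d = n->second.find(disk);
        return d == n->second.end() ? 0 : d->second;
    }

    // Number of partitions currently ingesting on a node (sum over its disks).
    uint32_t node_count(const std::string &node) const
    {
        auto n = _disk_counts.find(node);
        if (n == _disk_counts.end()) {
            return 0;
        }
        uint32_t total = 0;
        for (const auto &kv : n->second) {
            total += kv.second;
        }
        return total;
    }

private:
    // partition id -> (node, disk) where it is ingesting
    std::map<int32_t, std::pair<std::string, std::string>> _ingesting;
    // node -> disk -> number of ingesting partitions
    std::map<std::string, std::map<std::string, uint32_t>> _disk_counts;
};
```

Keeping the partition map alongside the counters lets a partition be removed without the caller having to remember which node and disk it was assigned to.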

This pull request also adds two configurations to implement the restriction:

For example:

Configuration changes

[meta_server]
+ bulk_load_node_max_ingesting_count = 4
+ bulk_load_node_min_disk_count = 1

foreverneverer commented 2 years ago

  • bulk_load_node_max_ingesting_count - restrict node max ingesting partition count

Actually, if you restrict the per-disk count, bulk_load_node_max_ingesting_count seems to be useless; it is enough to control the disk load.

I think considering two counts is more complex. Would it be simpler to use just one count?

Smityz commented 2 years ago

+1. I think max_ingesting_count_per_disk can meet the demand; two options for restricting bulk load will increase the burden of understanding.

hycdong commented 2 years ago

@Shuo-Jia @Smityz Thanks for your suggestions~ I originally meant to define only one configuration called max_disk_ingesting_count, but ended up using node_count + disk_count for two reasons:

  1. In different production environments, nodes may have different disk counts.
    • For example, a node may have 3 disks, 7 disks, 12 disks, or only one disk. With a per-disk limit alone, it is not clear how many partitions are ingesting on one node at the same time.
    • With node_count + disk_count, for example node_count = 4 and disk_count = 3, each node has at most 4 partitions ingesting at the same time, and each disk may have 1 or 2 partitions ingesting. This makes it clear to an admin how to adjust the ingesting restriction.
  2. With two configurations, the logic can be implemented more gracefully.
    • In the current meta server design, the node address of a partition is stored in zk, but its disk information is reported by the node and stored only in a meta server in-memory structure called config_context. The meta server does not even store the disk count of a replica server; it can only traverse each partition's config_context structure and then calculate how many disks a replica server has.
    • The disk count is only one example; there are similar calculations that would make the code complex. I think adding a configuration (disk_count) is an easier and more graceful way.
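
The node_count = 4, disk_count = 3 example above can be sketched as a small calculation. This assumes the per-disk limit is the node limit divided by the disk count, rounded up, which matches the "1 or 2 partitions per disk" figure; the function names `max_disk_ingesting_count` and `can_ingest` are hypothetical and not taken from the pull request.

```cpp
#include <cstdint>

// Assumption: per-disk limit = ceil(node_max_ingesting_count / disk_count).
// With node_max = 4 and disk_count = 3 this gives 2.
inline uint32_t max_disk_ingesting_count(uint32_t node_max_ingesting_count, uint32_t disk_count)
{
    if (disk_count == 0) {
        // Avoid division by zero; here a node with no known disks gets no quota.
        return 0;
    }
    return (node_max_ingesting_count + disk_count - 1) / disk_count;
}

// Hypothetical admission check: a new partition may start ingesting only if
// both the node-level and the derived disk-level limits still have room.
inline bool can_ingest(uint32_t node_ingesting,
                       uint32_t disk_ingesting,
                       uint32_t node_max_ingesting_count,
                       uint32_t disk_count)
{
    return node_ingesting < node_max_ingesting_count &&
           disk_ingesting < max_disk_ingesting_count(node_max_ingesting_count, disk_count);
}
```

Under this assumption, a node with 12 disks and a node limit of 4 allows at most 1 ingesting partition per disk, while a single-disk node allows all 4 on that disk, so the admin only tunes the node-level number.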