We've been discussing increasing `batch_aggregation_shard_count`. I think it would be good to split this into two configuration parameters in order to make deploying such changes safer: one count for writes to random shards, and another for broadcast writes to all shards, which advance the overall batch's state machine. (Note that reads scan all available shards, so `batch_aggregation_shard_count` does not directly impact the read path.) So long as all running processes and all existing batches in the database use a count for broadcast writes that is greater than or equal to the count for randomly sharded writes, our justification for the batch state machine remains sound.
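As a rough illustration of the proposed split, here is a minimal Rust sketch. The field and method names (`random_write_shard_count`, `broadcast_write_shard_count`, etc.) are hypothetical, not an existing Janus API; this just shows the two knobs and the invariant that a broadcast must cover every shard a random write could have landed on.

```rust
use rand::Rng;

/// Hypothetical split of `batch_aggregation_shard_count` into two parameters.
struct ShardConfig {
    /// Shard count used when writing an aggregation update to one random shard.
    random_write_shard_count: u64,
    /// Shard count used when broadcasting a batch state-machine update to all shards.
    broadcast_write_shard_count: u64,
}

impl ShardConfig {
    /// The safety rule described above: broadcasts must cover at least the
    /// shards that randomly sharded writes could have touched.
    fn validate(&self) -> Result<(), String> {
        if self.broadcast_write_shard_count >= self.random_write_shard_count {
            Ok(())
        } else {
            Err("broadcast shard count must be >= random-write shard count".into())
        }
    }

    /// Pick the shard for a randomly sharded write.
    fn random_shard(&self, rng: &mut impl Rng) -> u64 {
        rng.gen_range(0..self.random_write_shard_count)
    }

    /// The shards touched by a broadcast write.
    fn broadcast_shards(&self) -> impl Iterator<Item = u64> {
        0..self.broadcast_write_shard_count
    }
}

fn main() {
    let config = ShardConfig {
        random_write_shard_count: 16,
        broadcast_write_shard_count: 32,
    };
    config.validate().unwrap();

    let mut rng = rand::thread_rng();
    println!("random write goes to shard {}", config.random_shard(&mut rng));
    println!(
        "broadcast write touches {} shards",
        config.broadcast_shards().count()
    );
}
```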
Firstly, this would provide a safe avenue for decreasing the shard count: decrease the count for random writes, wait for all the data to turn over, and then decrease the count for broadcast writes (see the sketch below). Secondly, this would make changes to this parameter safer when running as the helper aggregator. Note that, for the leader, it would be possible to sequence an increase in `batch_aggregation_shard_count` across the different components to obey the above rule, i.e. the collection job driver first. For the helper role, however, we do both randomly sharded writes and broadcast writes from the same process, so a rolling update of the configuration parameter as it stands today may introduce a small window for strange race conditions.
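To make the decrease procedure concrete, here is a hedged sketch of the sequence of counts a rollout might step through; the specific numbers are made up for illustration, and the invariant is checked at every step.

```rust
fn main() {
    // Hypothetical rollout decreasing the shard count from 32 to 16, expressed
    // as the (random-write count, broadcast-write count) pairs in effect at
    // each deployment step. The rule broadcast >= random holds throughout.
    let steps = [
        (32u64, 32u64), // starting point: both counts at the old value
        (16, 32),       // step 1: lower only the random-write count
        // ...wait for all batches written with the old random count to turn over...
        (16, 16),       // step 2: lower the broadcast count to match
    ];
    for (random, broadcast) in steps {
        assert!(broadcast >= random, "broadcast count must cover random writes");
    }
}
```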