In a fast restore experiment, we (Evan and I) observed that a two-member storage server (SS) team sustained roughly 200 MB/s of input (write) bytes but only ~30 MB/s of output (read) bytes. The cluster ran the Redwood storage engine with double replication.
Data distribution could not finish relocating shards to rebalance the load, because the destination SSs could not keep up in reading/moving data away from the hammered hot SSs.
We should run a write-heavy workload against Redwood to confirm this behavior is reproducible. If it is, we should then check whether the tag-throttling feature can mitigate the situation.
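If the problem reproduces, one thing to try is a manual tag throttle from fdbcli. A minimal sketch, assuming the restore traffic carries a transaction tag (the tag name `restore` below is hypothetical; the rate is in transactions per second):

```shell
# In fdbcli, connected to the affected cluster.
# Throttle transactions tagged "restore" (hypothetical tag name)
# to 1000 tx/s for 1 hour at default priority:
fdb> throttle on tag restore 1000 1h default

# Inspect the currently active throttles:
fdb> throttle list

# Remove the throttle once data distribution catches up:
fdb> throttle off tag restore
```

This only helps if the restore client actually tags its transactions; otherwise the write traffic is invisible to the tag-throttling machinery.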