We believe that total RTO will be impacted by the amount of time it takes to run RevertRange on every node in the tenant. To start addressing this, we want to know how long it takes to revert a tenant to a known timestamp and how that time varies based on:
[ ] The number of ranges that the data is split across
[ ] The amount of data being reverted
We think a combination of two approaches is appropriate: Go-level benchmarks, which let us produce profiles that are easy for other teams to consume, and larger roachtests, which let us test multi-gigabyte reverts.
Jira issue: CRDB-20161
Epic CRDB-18751