cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.84k stars 3.77k forks source link

backupccl: revision history backups can record incorrect metadata, which may cause corrupt restores #101963

Closed msbutler closed 1 year ago

msbutler commented 1 year ago

A backup with revision history can record incorrect metadata on user data stored in a backup SST if the bulkio.backup.split_keys_on_timestamps is set to true (default on 22.1 and 22.2). If a restore needs to grab data from this SST, it may not restore the correct data. Update heavy workloads, i.e those which generate more MVCC history for a given key are more vulnerable to this bug.

Technical Details Some background: A backup consists of data stored in SSTables and a backup manifest with metadata. To generate the backup SSTs, dist sql backup processors send ExportRequests to KV, asking kv to return an in memory SST for a given key span over a given time range. Backup processors then buffer these responses, sort them, combine them and flush the combined SSTs to external storage. The backup manifest tracks which key spans are in which backup SSTs on the export response level, not on the physical sst level in cloud storage. So given the export response, [a,b), the manifest will contain an entry which indicates this key span is stored in a given sst (perhaps along with a bunch of other key spans).

When bulkio.backup.split_keys_on_timestamps is set to false, a kv server can only return an in memory SST to the backup dist sql processor if calling endKey.Next() will surface a new key. In other words, a response cannot be split within a given key's MVCC history. When this setting is set to true, the default in 22.1 and beyond, we allow this. Call this in-memory SST "split at mid key". Example: a response has the inclusive key span of [a,b_midkey], so the next response will be [b_midkey-1,...]. Now, while the in-memory sst contains data on key b, the backup manifest will record an entry using the exclusive span [a,b). In other words, according the backup manifest, this response has no data on b, even though it does.

This metadata error can only lead to restore corruption if the physical SST is also split mid key -- i.e. the sst's end key is b_midkey. Recall that the backup processor, buffers, sorts and combines these in memory SSTs in an attempt to write SSTs that approach our target size of 128 MB. We attempt prevent flushing an sst split mid key, but during https://github.com/cockroachdb/cockroach/issues/101118, we discovered certain cases where the split can occur (example).

On the restore side, we select which SSTs we need to read from to restore a given key span by checking if a given backup metadata entry intersects with the required span. To continue our example, if we want to restore the key span [b,z), restore will ignore the entry [a,b), which actually contains data on the inclusive span [a,b_midkey]. However, if this entry was written to the same sst that contains the manifest entry [b_midkey-1,...), then the bug is swallowed. Restore cleanly will read this sst including the key span [a,b_midkey). However, in the rare case where b_midkey is at the end of an sst, restore will not read from that sst, leading to corruption iff the restore AOST matches the mvcc timestamp of a key in this entry.

We've only reproduced this bug using backups with 23.1 kvservers which are much more likely to split at mid key due to elastic cpu limiting. We're still trying to understand how often this can happen on 22.2

Jira issue: CRDB-27193

blathers-crl[bot] commented 1 year ago

Hi @msbutler, please add branch-* labels to identify which branch(es) this release-blocker affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] commented 1 year ago

cc @cockroachdb/disaster-recovery

msbutler commented 1 year ago

huh, even if i set the per row bank payload to be 32 MB, half the threshold to trigger export pagination, I can't repro on 22.2.

dt commented 1 year ago

I've removed 23.1 labels from this since https://github.com/cockroachdb/cockroach/pull/102342 and https://github.com/cockroachdb/cockroach/pull/102343 have merged.

adityamaru commented 1 year ago

Closing as all backports have been merged.