Open nvanbenschoten opened 2 months ago
@jbowens mentioned that the new sstable format he is working on will not store the synthetic bit. This hasn't been decided but it is possible that we will turn it on by default in 24.3 and perhaps require all sstables to be in that format before upgrading to ~24.1~ 25.1.
On top of that, we can add a NormalizeSuffix
function in the comparer that we apply on suffixes in all range keys that we write out during flushes and compactions (and perhaps on all keys in the WAL?). This would guarantee that when all sstables are in the 24.3 format, all suffixes would be normalized.
Linking this to #96819 as we'll likely want to think through the implications of these migrations on version upgrades.
Because of the previous work in this area, such a migration can be local to a single replica — it doesn't need to be replicated across all replicas in a range. This simplifies things meaningfully.
Recording for posterity, that this is no longer true for range keys (MVCC range tombstones). Due to a bug in the Comparer, MVCC range tombstones with and without the synthetic bit are not logically equivalent. We attempted to correct the Comparer, but that predictably caused a replica divergence: https://github.com/cockroachdb/cockroach/issues/130533 A migration to remove the synthetic bit from MVCC range tombstones will need to be performed above Raft. We can continue with the plan to migrate point keys with the synthetic bit below Raft.
Related internal doc: https://docs.google.com/document/d/1nKgIG-3_7xv7YsgbSmtAfI6-1riokBn05hY1J1x_ikw/edit?usp=sharing
Related incident (replica divergence): https://cockroachlabs.atlassian.net/wiki/spaces/IM/pages/3876585524/2024-09-10+crl-prod-j6q+replica+inconsistency
In https://github.com/cockroachdb/cockroach/pull/116830, we removed synthetic MVCC timestamps. To make this possible, we first landed https://github.com/cockroachdb/cockroach/pull/105523, which stopped encoding and decoding the synthetic timestamp bit in/from MVCC keys. This limited the reach of previously stored synthetic timestamps in long-lived clusters to just the MVCC decoding layer. We paired this with https://github.com/cockroachdb/cockroach/pull/117304, which avoided mixed-version cluster concerns during replica consistency checks (along with a number of other PRs listed in https://github.com/cockroachdb/cockroach/issues/101938).
However, we never ran a migration to strip the synthetic bit from persistent MVCC keys that have the bit set. We should, as keeping around these keys presents some risk. This will allow us to get rid of the key decoding logic which deals with the synthetic MVCC key suffix encoding. It will also allow us to get rid of the MVCC key suffix normalization logic in the key comparator and equality functions, which are needed to ensure that read+decode+encode+delete logic works correctly with synthetic timestamps.
Because of the previous work in this area, such a migration can be local to a single replica — it doesn't need to be replicated across all replicas in a range. This simplifies things meaningfully.
Jira issue: CRDB-41621