cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.03k stars 3.8k forks source link

upgrade: provide a mechanism to run code in a patch version #84457

Closed ajwerner closed 8 months ago

ajwerner commented 2 years ago

Is your feature request related to a problem? Please describe.

Sometimes we find that there are descriptor corruption bugs in old code which can lead to bugs in new code. When we detect these, we can sometimes fix the new code to be robust to the old bug. Sometimes making the new code robust to the old bug is difficult -- or a represents a tradeoff which does not seem worth it (leniency to some robust condition at the risk of allowing new bugs to exist).

Often we can detect and rectify the problem programatically. Ideally we'd be able to ship the detection and repair in a new patch release. Today we lack a mechanism to deploy such logic.

Describe the solution you'd like Some way to automate repairs during patch release upgrades.

Jira issue: CRDB-17668

postamar commented 2 years ago

I had thought about this somewhat, looking forward, but my thoughts hadn't crystallized until now. What we want to do is to be able to backport both corruption diagnostic and repair logic somehow, but we're prevented (correctly!) by the backport policy from running it during normal operation (descriptor reads and writes).

Detection

I'm thinking we could gate backported validation logic somehow to only have it run when querying crdb_internal.invalid_objects. Right now we have three tiers: ValidateSelf, ValidateCrossReferences, and ValidateTxnCommit. We could add a fourth, ValidateBackported, which is only accessed by that virtual table and by doctor.

Repair

Correspondingly, we currently have a couple of tiers of descriptor repair, RunPostDeserializationChanges and RunRestoreChanges. RunRestoreChanges doesn't get called during normal operation, only when restoring backups and during cluster upgrades. I think it'd be fine to hook all backported repair logic into that method somehow.

Let's add another virtual table, crdb_internal.repaired_objects, which acts as a view on system.descriptor except the descriptor column values are piped through RunRestoreChanges. Repair can then be done by the user via a query of the form SELECT crdb_internal.unsafe_upsert_descriptor(id, descriptor, true) FROM crdb_internal.repaired_objects WHERE id IN (SELECT id FROM crdb_internal.invalid_objects);.

Upgrades

This doesn't violate backport policy. It could also make cluster upgrades smoother. Consider this invalid_objects query we recently added as an upgrade precondition. In this scheme it's more likely to fail. But what if we ran the above repair query automatically? This way we're sure to repair all known descriptor corruptions every time the cluster is upgraded. This makes it less tedious to add new validation checks, and means that descriptor migrations as an upgrade step are only necessary in cases where we make backwards-incompatible changes to the descriptor protobuf definitions.

ajwerner commented 2 years ago

So a straw man here is to be able to run a job -- much like the upgrade jobs -- but ensure that it is only executed on a node with the proper patch release to run that job and that it is mutually excluded from any major version upgrade migrations.

Consider wanting to backport some sql query which repairs descriptors. We want that query to run exactly once and it will have to be on some node that knows how to run it.


This issue is not asking for more validation to be backported. It's instead to allow us to backport automated repair logic such that later, when a major version upgrade occurs, no issues are encountered.

dt commented 8 months ago

I think we can do this today by backporting a piece of code inside a conditional on binary version to any one of the existing singleton jobs, which can then decide if it wants to e.g. inspect the versions of other nodes to wait or whatever it wants to do. I don't think there's anything we intend to add to the upgrades or jobs framework for this specifically at this time so I'm closing this as unplanned.