cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

Root password may be wiped when adding a new 24.1 node to a 23.2 cluster #133519

Open smcvey opened 3 hours ago

smcvey commented 3 hours ago

Describe the problem

Typically on modern clusters, there are entries in the system.migrations table for the major version 0:

  major | minor | patch | internal |         completed_at
--------+-------+-------+----------+--------------------------------
      0 |     0 |     0 |        2 | 2024-10-25 17:20:51.099814+00
      0 |     0 |     0 |        4 | 2024-10-25 17:20:51.169182+00

However, a certain upgrade path avoids adding these entries. If a user then attempts an upgrade from 23.2 to 24.1 by adding new nodes rather than replacing existing ones, the new node first checks whether these entries exist and, because they don't, runs through the initialisation code at https://github.com/cockroachdb/cockroach/blob/v24.1.4/pkg/upgrade/upgrades/permanent_upgrades.go/

upgrade/upgrademanager/manager.go:239 ⋮ [T1,Vsystem,n10] 123 the last permanent upgrade (v0.0-upgrading-step-004) does not appear to have completed; attempting to run all upgrades
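
The check described above can be sketched as follows. This is a minimal Python model of the behavior as reported here, not the Go code in upgrademanager. The function name `permanent_upgrades_needed` and the `(major, minor, patch, internal)` tuple layout are assumptions made for illustration, and the key `(0, 0, 0, 4)` is inferred from the `v0.0-upgrading-step-004` name in the log line.

```python
# Hypothetical model of the startup check described in this report.
# Each row mirrors the (major, minor, patch, internal) columns of system.migrations.

# Assumed key of the last permanent upgrade, inferred from "v0.0-upgrading-step-004".
LAST_PERMANENT_UPGRADE = (0, 0, 0, 4)


def permanent_upgrades_needed(completed_rows):
    """Return True when the last permanent upgrade has no completion record."""
    return LAST_PERMANENT_UPGRADE not in set(completed_rows)


# A modern cluster has the major-version-0 entries.
modern_cluster = [(0, 0, 0, 2), (0, 0, 0, 4)]
# An affected cluster has only versioned entries (a few rows from this report).
affected_cluster = [(21, 1, 0, 1104), (22, 2, 0, 102), (23, 1, 0, 30)]

print(permanent_upgrades_needed(modern_cluster))    # False
print(permanent_upgrades_needed(affected_cluster))  # True
```

In this model the affected cluster is treated the same as a brand-new one, which is why the initialisation code runs again.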

One of the things this code does is to reset the root password:

UPSERT INTO system.users (username, "hashedPassword", "isRole", "user_id") VALUES ($1, '', false, 1)

This effectively locks the root user out of password-based login, so the password has to be reset manually by logging in with a root certificate. Until that is done, root cannot use the DBConsole.

Once this new node has joined and blanked the root password, it also creates the appropriate entries in the system.migrations table, so any further nodes that join in the same manner will not blank the root password again. In other words, it only happens once per cluster - but it does happen. Nodes added after the first one produce the expected log entry, indicating that they will not perform the initialisation:

I241025 18:32:18.026174 34 upgrade/upgrademanager/manager.go:233 ⋮ [T1,Vsystem,n7] 186  detected the last permanent upgrade (v0.0-upgrading-step-004) to have already completed; no permanent upgrades will run
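
The once-per-cluster behavior can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: the `cluster` dictionary, the `join_new_node` function and the `"hashed-secret"` value are hypothetical stand-ins, and the migration keys are inferred from the table and log lines in this report rather than taken from the CockroachDB source.

```python
# Hypothetical in-memory model of the once-per-cluster behavior described above.

# Assumed key of the last permanent upgrade, inferred from the log message.
LAST_PERMANENT_UPGRADE = (0, 0, 0, 4)


def join_new_node(cluster):
    """Simulate a 24.1 node joining; return True if it ran the initialisation."""
    if LAST_PERMANENT_UPGRADE in cluster["migrations"]:
        # Later nodes: the entries exist, so nothing runs and root is untouched.
        return False
    # First node: the initialisation upserts root with an empty password hash...
    cluster["users"]["root"] = ""
    # ...and records the major-version-0 entries so the check passes next time.
    cluster["migrations"].update({(0, 0, 0, 2), (0, 0, 0, 4)})
    return True


cluster = {"migrations": {(23, 1, 0, 30)}, "users": {"root": "hashed-secret"}}

first = join_new_node(cluster)               # runs the initialisation
blanked = cluster["users"]["root"] == ""     # root password is now empty
cluster["users"]["root"] = "hashed-secret"   # administrator resets it manually
second = join_new_node(cluster)              # skipped, password is kept
```

After the first call the model contains the version-0 entries, so the second call leaves the reset password in place, matching the two log messages quoted in this report.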

I'll describe the upgrade path that leads to this condition below, but here is the system.migrations table when my cluster is at 23.2, but before adding any 24.1 nodes:

  major | minor | patch | internal |         completed_at
--------+-------+-------+----------+--------------------------------
     21 |     1 |     0 |     1104 | 2024-10-26 09:30:37.47478+00
     21 |     1 |     0 |     1112 | 2024-10-26 09:30:39.415801+00
     21 |     1 |     0 |     1114 | 2024-10-26 09:30:53.679928+00
     21 |     1 |     0 |     1118 | 2024-10-26 09:30:55.388482+00
     21 |     1 |     0 |     1120 | 2024-10-26 09:30:57.295119+00
     21 |     1 |     0 |     1122 | 2024-10-26 09:30:58.608944+00
     21 |     1 |     0 |     1126 | 2024-10-26 09:31:05.72692+00
     21 |     1 |     0 |     1128 | 2024-10-26 09:31:08.840617+00
     21 |     1 |     0 |     1130 | 2024-10-26 09:31:11.481264+00
     21 |     1 |     0 |     1132 | 2024-10-26 09:31:19.263738+00
     21 |     1 |     0 |     1134 | 2024-10-26 09:31:20.790038+00
     21 |     1 |     0 |     1140 | 2024-10-26 09:31:22.254666+00
     21 |     1 |     0 |     1144 | 2024-10-26 09:31:23.850158+00
     21 |     1 |     0 |     1154 | 2024-10-26 09:31:26.431475+00
     21 |     1 |     0 |     1168 | 2024-10-26 09:31:29.238131+00
     21 |     1 |     0 |     1170 | 2024-10-26 09:31:30.345684+00
     21 |     2 |     0 |        8 | 2024-10-26 09:35:28.177271+00
     21 |     2 |     0 |       12 | 2024-10-26 09:35:30.808243+00
     21 |     2 |     0 |       14 | 2024-10-26 09:35:36.926841+00
     21 |     2 |     0 |       18 | 2024-10-26 09:35:38.360409+00
     21 |     2 |     0 |       22 | 2024-10-26 09:35:48.015332+00
     21 |     2 |     0 |       32 | 2024-10-26 09:35:50.388111+00
     21 |     2 |     0 |       34 | 2024-10-26 09:35:52.214946+00
     21 |     2 |     0 |       36 | 2024-10-26 09:35:52.999877+00
     21 |     2 |     0 |       38 | 2024-10-26 09:35:54.104424+00
     21 |     2 |     0 |       48 | 2024-10-26 09:35:58.209252+00
     21 |     2 |     0 |       52 | 2024-10-26 09:35:59.279859+00
     21 |     2 |     0 |       54 | 2024-10-26 09:36:00.307057+00
     21 |     2 |     0 |       56 | 2024-10-26 09:36:02.763608+00
     21 |     2 |     0 |       58 | 2024-10-26 09:36:03.693706+00
     21 |     2 |     0 |       62 | 2024-10-26 09:36:04.924214+00
     21 |     2 |     0 |       94 | 2024-10-26 09:36:08.984405+00
     21 |     2 |     0 |      108 | 2024-10-26 09:36:11.212783+00
     21 |     2 |     0 |      112 | 2024-10-26 09:36:12.473979+00
     22 |     1 |     0 |        2 | 2024-10-26 09:41:04.994465+00
     22 |     1 |     0 |       14 | 2024-10-26 09:41:15.85167+00
     22 |     1 |     0 |       18 | 2024-10-26 09:41:43.47607+00
     22 |     1 |     0 |       20 | 2024-10-26 09:41:58.502321+00
     22 |     1 |     0 |       24 | 2024-10-26 09:41:59.638883+00
     22 |     1 |     0 |       28 | 2024-10-26 09:42:03.210138+00
     22 |     1 |     0 |       30 | 2024-10-26 09:42:04.365931+00
     22 |     1 |     0 |       32 | 2024-10-26 09:42:07.685789+00
     22 |     1 |     0 |       34 | 2024-10-26 09:42:09.099256+00
     22 |     1 |     0 |       36 | 2024-10-26 09:42:16.656559+00
     22 |     1 |     0 |       38 | 2024-10-26 09:42:18.922945+00
     22 |     1 |     0 |       40 | 2024-10-26 09:42:21.85357+00
     22 |     1 |     0 |       42 | 2024-10-26 09:42:22.815692+00
     22 |     1 |     0 |       50 | 2024-10-26 09:42:31.541007+00
     22 |     1 |     0 |       52 | 2024-10-26 09:42:32.414041+00
     22 |     1 |     0 |       54 | 2024-10-26 09:42:36.45354+00
     22 |     1 |     0 |       58 | 2024-10-26 09:42:37.602578+00
     22 |     1 |     0 |       62 | 2024-10-26 09:42:38.826475+00
     22 |     1 |     0 |       66 | 2024-10-26 09:42:40.323295+00
     22 |     1 |     0 |       76 | 2024-10-26 09:42:42.56536+00
     22 |     2 |     0 |        4 | 2024-10-26 09:49:22.526596+00
     22 |     2 |     0 |        6 | 2024-10-26 09:49:25.626527+00
     22 |     2 |     0 |        8 | 2024-10-26 09:49:31.646343+00
     22 |     2 |     0 |       10 | 2024-10-26 09:49:34.222747+00
     22 |     2 |     0 |       12 | 2024-10-26 09:50:22.791095+00
     22 |     2 |     0 |       14 | 2024-10-26 09:50:34.747934+00
     22 |     2 |     0 |       18 | 2024-10-26 09:50:52.925178+00
     22 |     2 |     0 |       20 | 2024-10-26 09:50:55.566186+00
     22 |     2 |     0 |       22 | 2024-10-26 09:51:12.529223+00
     22 |     2 |     0 |       28 | 2024-10-26 09:51:20.282633+00
     22 |     2 |     0 |       32 | 2024-10-26 09:51:24.478771+00
     22 |     2 |     0 |       36 | 2024-10-26 09:51:32.016424+00
     22 |     2 |     0 |       38 | 2024-10-26 09:51:34.205437+00
     22 |     2 |     0 |       42 | 2024-10-26 09:51:52.101033+00
     22 |     2 |     0 |       44 | 2024-10-26 09:52:01.4381+00
     22 |     2 |     0 |       46 | 2024-10-26 09:52:10.030203+00
     22 |     2 |     0 |       48 | 2024-10-26 09:52:18.808471+00
     22 |     2 |     0 |       50 | 2024-10-26 09:52:21.086315+00
     22 |     2 |     0 |       52 | 2024-10-26 09:52:31.910944+00
     22 |     2 |     0 |       54 | 2024-10-26 09:52:47.619057+00
     22 |     2 |     0 |       56 | 2024-10-26 09:52:59.945725+00
     22 |     2 |     0 |       68 | 2024-10-26 09:53:06.953093+00
     22 |     2 |     0 |       72 | 2024-10-26 09:53:09.831112+00
     22 |     2 |     0 |       74 | 2024-10-26 09:53:14.011642+00
     22 |     2 |     0 |       76 | 2024-10-26 09:53:22.009314+00
     22 |     2 |     0 |       80 | 2024-10-26 09:53:25.897038+00
     22 |     2 |     0 |       84 | 2024-10-26 09:53:36.046004+00
     22 |     2 |     0 |       88 | 2024-10-26 09:53:46.591369+00
     22 |     2 |     0 |       90 | 2024-10-26 09:53:52.464204+00
     22 |     2 |     0 |       92 | 2024-10-26 09:59:21.422615+00
     22 |     2 |     0 |       94 | 2024-10-26 09:59:32.107905+00
     22 |     2 |     0 |       96 | 2024-10-26 09:59:41.603464+00
     22 |     2 |     0 |       98 | 2024-10-26 09:59:48.994357+00
     22 |     2 |     0 |      100 | 2024-10-26 09:59:55.040936+00
     22 |     2 |     0 |      102 | 2024-10-26 10:00:00.273594+00
     23 |     1 |     0 |        2 | 2024-10-26 10:07:23.773478+00
     23 |     1 |     0 |       18 | 2024-10-26 10:07:55.28096+00
     23 |     1 |     0 |       20 | 2024-10-26 10:07:58.443593+00
     23 |     1 |     0 |       30 | 2024-10-26 10:08:02.256102+00
     23 |     1 |     0 |       32 | 2024-10-26 10:08:05.048442+00

To Reproduce

Clusters that started on a version later than 21.1 don't seem to end up with the same missing entries in the system.migrations table, so it looks like this affects only clusters of a certain age.

Note that I cannot consistently cause this missing data in the migrations table. I suspect there is an order of operations involving the startup of nodes which I've not been able to pin down yet.

Expected behavior

The issue doesn't happen when adding a new 23.2 node to an existing cluster, because the code here checks a different entry in the system.migrations table:

I241026 13:22:12.933511 32 upgrade/upgrademanager/manager.go:231 ⋮ [T1,Vsystem,n4] 72  detected the last permanent upgrade (v23.1-30) to have already completed; no permanent upgrades will run

Version 24.1 should do something similar rather than rely on the existence of the major version zero entries.
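
One possible shape of this suggestion, sketched in Python purely for illustration: treat the permanent upgrades as complete if either the major-version-0 entry or the entry that 23.2 checks is present. This is not the actual fix. The function name and both keys are assumptions, inferred from `v0.0-upgrading-step-004` and `v23.1-30` in the log lines quoted in this report.

```python
# Hypothetical sketch of the reporter's suggestion; not the actual CockroachDB fix.

# Assumed keys, inferred from the log messages quoted in this report.
LAST_PERMANENT_UPGRADE = (0, 0, 0, 4)   # "v0.0-upgrading-step-004"
LEGACY_MARKER = (23, 1, 0, 30)          # "v23.1-30", the entry 23.2 checks


def permanent_upgrades_needed_with_fallback(completed_rows):
    """Run permanent upgrades only if neither completion marker is recorded."""
    rows = set(completed_rows)
    return LAST_PERMANENT_UPGRADE not in rows and LEGACY_MARKER not in rows


# An older cluster without version-0 entries but with the 23.1-30 entry is skipped.
print(permanent_upgrades_needed_with_fallback([(22, 2, 0, 102), (23, 1, 0, 30)]))  # False
# A genuinely new cluster with no entries still runs the initialisation.
print(permanent_upgrades_needed_with_fallback([]))  # True
```

With a fallback of this kind, the affected cluster shown earlier would no longer be mistaken for an uninitialised one, because its table already contains the 23 | 1 | 0 | 30 row.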

Environment:

Additional context

Root password is blanked out so the administrator requires an existing root client certificate to connect. The DBConsole is locked out until the password is set again.

Jira issue: CRDB-43656

blathers-crl[bot] commented 3 hours ago

Hi @smcvey, please add branch-* labels to identify which branch(es) this C-bug affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.