Open colprog opened 2 years ago
cc @cockroachdb/bulk-io
This misbehaviour, unfortunately, is caused by known limitations of the legacy schema changer. Reverting schema change jobs can leave the table descriptors in invalid states, effectively making the table inaccessible. That being said, I was surprised to find out that it's not possible to drop the table. This is most definitely not expected. We're looking into this.
In the long term, we're addressing this limitation by doing a massive overhaul of how schema changes are implemented. This effort is already underway but it will take a while before it bears fruit.
This is loosely related to https://github.com/cockroachdb/cockroach/issues/50651.
To do this we'd need some way to tell the resolution in the drop case to allow some invalid outgoing references. This is hard. For now, we'll say that we need to repair the graph.
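To make "invalid outgoing references" concrete, here is a minimal sketch of detecting them in a toy descriptor graph. The dictionary layout, the IDs, and the function name are all hypothetical illustrations; this is not CockroachDB's descriptor format or validation code.

```python
# Toy model only: descriptor IDs and the "references" field are hypothetical
# and do not mirror CockroachDB's actual descriptor protobufs.

def find_dangling_references(descriptors):
    """Return (source_id, missing_target_id) pairs for every reference
    that points at a descriptor ID absent from the map."""
    dangling = []
    for desc_id, desc in sorted(descriptors.items()):
        for target in desc["references"]:
            if target not in descriptors:
                dangling.append((desc_id, target))
    return dangling


descriptors = {
    52: {"name": "db", "references": []},
    # Table 53 references descriptor 54 (say, an enum type) which no longer exists.
    53: {"name": "tbl", "references": [52, 54]},
}

dangling = find_dangling_references(descriptors)
print(dangling)
```

In this model, a drop that tolerates invalid references would have to skip or explicitly allow the pairs reported here rather than failing validation on them.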
In some cases there's a desire to drop a whole database which might be corrupted. In that case, once there are no cross-database references, we can just destroy all the descriptors and data without thinking too hard.
This will depend on cross-database reference removal. CC @postamar
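The whole-database case can be sketched the same way: once no reference crosses the database boundary, every descriptor in the database can be discarded together without resolving references inside it. This is a toy illustration under assumed names (`parent`, `references`, `destroy_database`), not the actual implementation or a proposed API.

```python
# Toy model only: "parent" is a hypothetical field naming the owning database.

def cross_database_references(descriptors, db_id):
    """Return (source_id, target_id) edges where exactly one endpoint lives in db_id.
    References to missing descriptors are ignored: they cannot pin another database."""
    crossing = []
    for desc_id, desc in sorted(descriptors.items()):
        for target in desc["references"]:
            target_desc = descriptors.get(target)
            if target_desc is None:
                continue
            if (desc["parent"] == db_id) != (target_desc["parent"] == db_id):
                crossing.append((desc_id, target))
    return crossing


def destroy_database(descriptors, db_id):
    """Drop every descriptor owned by db_id, refusing if any reference crosses databases."""
    crossing = cross_database_references(descriptors, db_id)
    if crossing:
        raise ValueError(f"cross-database references remain: {crossing}")
    return {i: d for i, d in descriptors.items() if d["parent"] != db_id}


descs = {
    100: {"parent": 1, "references": [101, 999]},  # 999 is missing (corruption)
    101: {"parent": 1, "references": []},
    200: {"parent": 2, "references": [101]},       # reference from database 2 into database 1
}

blocked_by = cross_database_references(descs, 1)   # the edge that must be removed first
descs[200]["references"].remove(101)               # simulate cross-database reference removal
remaining = destroy_database(descs, 1)             # the corrupt descriptor 100 is dropped with the rest
print(blocked_by, sorted(remaining))
```

Note that the dangling reference from 100 to 999 does not block the drop in this sketch, which is the point of the approach: corruption confined to one database does not need to be repaired before that database is destroyed.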
The assignment here is to remember this discussion and the cross-database reference work.
Describe the problem
The cluster is in an inconsistent state with no clear way to recover. Backup/restore commands report a missing descriptor:
Dropping the database also fails:
Now we're stuck with a malfunctioning cluster. New SQL connections fail as well, because type-discovery queries such as SELECT pg_type.oid, enumlabel FROM pg_enum JOIN pg_type ON pg_type.oid=enumtypid; also fail.
To Reproduce
Expected behavior
A way to recover from this state. A way to forcefully remove this database seems good enough.
Jira issue: CRDB-12033