kvaps opened 1 year ago
I fixed this issue by manually editing the DB, using the solution from https://github.com/LINBIT/linstor-server/issues/348#issuecomment-1507646983

First I made a backup:
```
kubectl get crds | grep -o ".*.internal.linstor.linbit.com" | xargs kubectl get crds -ojson > crds.json
kubectl get crds | grep -o ".*.internal.linstor.linbit.com" | xargs -i{} sh -xc "kubectl get {} -ojson > {}.json"
```
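In case the manual edit goes wrong, the backup can be pushed back. A rough restore sketch (my assumption: the linstor-controller is stopped first, and the server-generated metadata is stripped before re-applying):

```
# restore sketch (assumes linstor-controller is stopped); drop server-managed
# metadata fields so the saved objects can be re-applied cleanly
jq 'del(.items[].metadata.resourceVersion,
        .items[].metadata.uid,
        .items[].metadata.creationTimestamp,
        .items[].metadata.generation)' \
  resources.internal.linstor.linbit.com.json | kubectl apply -f -
```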
Then I collected all resources with weird flags:
```
cat resources.internal.linstor.linbit.com.json | jq '.items[] | select(.spec.resource_flags!=0 and .spec.resource_flags!=260 and .spec.resource_flags!=388) | "\(.spec.resource_name) \(.spec.node_name) \(.spec.resource_flags)"' -r > list.txt
```
The output was:

```
PVC-9B46A955-EDD9-4E08-9C3A-4D2849F9AFE2 KUBE-DEV-HV-1 262144
PVC-9B46A955-EDD9-4E08-9C3A-4D2849F9AFE2 KUBE-DEV-HV-3 264548
```
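To make sense of values like 262144 and 264548 it helps to look at which individual bits are set. A quick sketch (plain bit decomposition only; the actual bit-to-flag-name mapping lives in the LINSTOR sources):

```
# list the set bits of a resource_flags value, e.g. 264548
flags=264548
for i in $(seq 0 31); do
  if [ $(( (flags >> i) & 1 )) -eq 1 ]; then
    echo "bit $i is set (decimal value $((1 << i)))"
  fi
done
```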
Then I reset them to 260 (`diskless` + `drbd_diskless`):
```
while read res node flags; do
  cat resources.internal.linstor.linbit.com.json \
    | jq '.items[] | select(.spec.resource_name==$res and .spec.node_name==$node) | .spec.resource_flags=260' \
         --arg res $res --arg node $node -r
done < list.txt > fix.json
```
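Alternatively (a sketch only, not what I actually ran), the same flag change could be applied per object with kubectl patch, using the object names from the backup and with the controller stopped:

```
# hypothetical alternative: merge-patch resource_flags directly on one object
# (object name taken from the backup JSON); repeat for the second resource
kubectl patch resources.internal.linstor.linbit.com \
  b819a9c8efb44e21117511f96625296df9e633c2147e86b43c022ee60e984164 \
  --type=merge -p '{"spec":{"resource_flags":260}}'
```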
The updated resources:
```
{
  "apiVersion": "internal.linstor.linbit.com/v1-15-0",
  "kind": "Resources",
  "metadata": {
    "creationTimestamp": "2023-04-24T08:58:14Z",
    "generation": 3,
    "name": "b819a9c8efb44e21117511f96625296df9e633c2147e86b43c022ee60e984164",
    "resourceVersion": "138483524",
    "uid": "cfc0af48-59e8-4300-a30e-0b84b0667fe3"
  },
  "spec": {
    "create_timestamp": 1682326696079,
    "node_name": "KUBE-DEV-HV-1",
    "resource_flags": 260,
    "resource_name": "PVC-9B46A955-EDD9-4E08-9C3A-4D2849F9AFE2",
    "snapshot_name": "",
    "uuid": "e2a2b3fe-d2e4-43c2-8ac8-12bfcd2d85f2"
  }
}
{
  "apiVersion": "internal.linstor.linbit.com/v1-15-0",
  "kind": "Resources",
  "metadata": {
    "creationTimestamp": "2023-04-24T08:58:18Z",
    "generation": 7,
    "name": "d08533f74aa6c3d02f647feebdef0d3b691ae99385da62ce704425a14fd376e6",
    "resourceVersion": "138574025",
    "uid": "2febb78b-24e6-410e-84c6-db9eedfa92c7"
  },
  "spec": {
    "create_timestamp": 1682326700754,
    "node_name": "KUBE-DEV-HV-3",
    "resource_flags": 260,
    "resource_name": "PVC-9B46A955-EDD9-4E08-9C3A-4D2849F9AFE2",
    "snapshot_name": "",
    "uuid": "2629b9df-1d12-4299-9e5c-dcafe2d01fdb"
  }
}
```
After applying fix.json and starting the linstor-controller, the resources turned green:
```
# linstor r l -r pvc-9b46a955-edd9-4e08-9c3a-4d2849f9afe2
+-------------------------------------------------------------------------------------------------------------------+
| ResourceName                             | Node          | Port | Usage  | Conns | State    | CreatedOn           |
|===================================================================================================================|
| pvc-9b46a955-edd9-4e08-9c3a-4d2849f9afe2 | kube-dev-hv-0 | 7015 | InUse  | Ok    | Diskless | 2023-04-24 09:12:09 |
| pvc-9b46a955-edd9-4e08-9c3a-4d2849f9afe2 | kube-dev-hv-1 | 7015 | Unused | Ok    | UpToDate | 2023-04-24 08:58:16 |
| pvc-9b46a955-edd9-4e08-9c3a-4d2849f9afe2 | kube-dev-hv-3 | 7015 | Unused | Ok    | Diskless | 2023-04-24 08:58:20 |
+-------------------------------------------------------------------------------------------------------------------+
```
cc @ghernadi
ghernadi replied:

Good that everything is green again (although direct DB manipulations are usually rather scary...).
However, I'd be interested in the error report 64478C81-22600-000184
to figure out why DRBD was not able to down the resource properly. Anything else suspicious in dmesg
or other logs?
The preferred way of fixing such an issue is obviously without manipulating the database directly, i.e. by helping LINSTOR get rid of the resource properly. Since the DRBD_DELETE flag was set on both resources, I assume something went wrong there.
Of course, the fact that one of the resources was "in the middle of a toggle disk" does not help in properly cleaning up such a state. The interesting question is (as always) how we got into such a state.
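(For reference, a sketch of commands that could collect the information asked for above; the report ID is the one mentioned, and dmesg would be taken on the affected node.)

```
# show the error report referenced above
linstor error-reports show 64478C81-22600-000184

# look for DRBD kernel messages around the failed resource shutdown
dmesg -T | grep -i drbd
```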
I have a weird device which I can't remove: the device is missing in LVM.
The weird thing is that the device on kube-dev-hv-0 is in DfltDisklessStorPool but is shown as unintentional diskless. It is really missing in the lvs output: