canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

Access dqlite #4243

Open xl204431 opened 11 months ago

xl204431 commented 11 months ago

Background

A pod shows up when I list the pods in its namespace, and its status is stuck at "Terminating".

But when I try to get, describe, or delete this pod by name, the server reports that it cannot be found: Error from server (NotFound): pods "XXX" not found

Support needed

I suspect the data is corrupted and want to inspect what dqlite has stored. How can I access dqlite?

neoaggelos commented 11 months ago

Hi @xl204431

To get a raw sqlite shell you can use the following command:

/snap/microk8s/current/bin/dqlite \
  --cert /var/snap/microk8s/current/var/kubernetes/backend/cluster.crt \
  --key /var/snap/microk8s/current/var/kubernetes/backend/cluster.key \
  --servers file:////var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml \
  k8s

To retrieve a snapshot of the database (the current state of all keys and values), you might find the following command useful:

microk8s dbctl --debug backup

This will print all current database keys and create a tarball that you can inspect. Hope this helps

neoaggelos commented 11 months ago

It would also help if you could share some details about the issue you are experiencing. Thanks again!

xl204431 commented 11 months ago

@neoaggelos Thank you so much for your response!! I think I might need more help!

Issue

I can see the pod when I list the pods in the namespace:

root@somehost# microk8s.kubectl get pod -n some-namespace
NAME          READY   STATUS        RESTARTS   AGE
problem-pod   1/1     Terminating   0          447d
normal-pod    1/1     Running       0          xxd
...
normal-pod    1/1     Running       0          xxd

But I cannot get, describe, or delete this problem pod:

root@somehost# microk8s.kubectl get pod problem-pod -n some-namespace
Error from server (NotFound): pods "problem-pod" not found
root@somehost# microk8s.kubectl describe pod problem-pod -n some-namespace
Error from server (NotFound): pods "problem-pod" not found
root@somehost# microk8s.kubectl delete pod problem-pod -n some-namespace
Error from server (NotFound): pods "problem-pod" not found

So this problem-pod keeps its data, and that affects the Velero node-agent.

Reproduce

I have no idea how it happened.

Try to debug

I ran microk8s dbctl --debug backup and got some data. Some of it can be displayed as text, but the rest is in a binary format. Why? How can I inspect this data?

I tried

/snap/microk8s/current/bin/dqlite \
  --cert /var/snap/microk8s/current/var/kubernetes/backend/cluster.crt \
  --key /var/snap/microk8s/current/var/kubernetes/backend/cluster.key \
  --servers file:////var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml \
  k8s

It connected. I wanted to list the tables in the database for further checks, but the command returned an error:

dqlite> .tables
Error:  exec: near ".": syntax error
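
The error message suggests the shell hands the input to the SQL engine as a statement, so sqlite3 CLI dot-commands such as `.tables` are not understood. The usual plain-SQL equivalent is a query on `sqlite_master`. Below is a minimal sketch of that query, run with Python's built-in `sqlite3` module against a throwaway in-memory database, not against dqlite itself. The table name `kine` and its columns are assumptions made for illustration only.

```python
import sqlite3

# Throwaway in-memory SQLite database; this is NOT the MicroK8s datastore.
conn = sqlite3.connect(":memory:")

# Hypothetical table standing in for whatever the real database contains.
conn.execute("CREATE TABLE kine (id INTEGER PRIMARY KEY, name TEXT, value BLOB)")

# Plain-SQL equivalent of the sqlite3 CLI's `.tables` dot-command.
LIST_TABLES = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"

tables = [row[0] for row in conn.execute(LIST_TABLES)]
print(tables)
```

If the dqlite shell accepts the same statement at its `dqlite>` prompt, it should list the real tables; that part is untested here.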

I tried reading the k8s-dqlite code, but I only found the migrator and no other tools I could use.

I don't know how to solve this issue and need your help.

neoaggelos commented 11 months ago

Hi @xl204431, sorry for missing this

One way to observe the current state of the datastore is to use:

microk8s dbctl --debug backup

Example output from an empty cluster:

```
Backing up the datastore
INFO[0000] mode: backup-dqlite, endpoint: unix:///var/snap/microk8s/x1/var/kubernetes/backend/kine.sock:12379, dir: /tmp/tmpgon0wlxj/backup-2023-10-16-17-35-11
DEBU[0000] 0) /registry/health
DEBU[0000] 1) /registry/ranges/servicenodeports
DEBU[0000] 2) /registry/apiregistration.k8s.io/apiservices/v1.apps
DEBU[0000] 3) /registry/apiregistration.k8s.io/apiservices/v1.authentication.k8s.io
DEBU[0000] 4) /registry/apiregistration.k8s.io/apiservices/v1.apiextensions.k8s.io
DEBU[0000] 5) /registry/apiregistration.k8s.io/apiservices/v1.
DEBU[0000] 6) /registry/apiregistration.k8s.io/apiservices/v1.admissionregistration.k8s.io
DEBU[0000] 7) /registry/apiregistration.k8s.io/apiservices/v1.authorization.k8s.io
DEBU[0000] 8) /registry/namespaces/kube-system
DEBU[0000] 9) /registry/apiregistration.k8s.io/apiservices/v1.autoscaling
DEBU[0000] 10) /registry/apiregistration.k8s.io/apiservices/v2.autoscaling
DEBU[0000] 11) /registry/apiregistration.k8s.io/apiservices/v1.batch
DEBU[0000] 12) /registry/apiregistration.k8s.io/apiservices/v1.certificates.k8s.io
DEBU[0000] 13) /registry/apiregistration.k8s.io/apiservices/v1.coordination.k8s.io
DEBU[0000] 14) /registry/namespaces/kube-public
DEBU[0000] 15) /registry/apiregistration.k8s.io/apiservices/v1.discovery.k8s.io
DEBU[0000] 16) /registry/configmaps/kube-system/extension-apiserver-authentication
DEBU[0000] 17) /registry/apiregistration.k8s.io/apiservices/v1.events.k8s.io
DEBU[0000] 18) /registry/configmaps/kube-system/kube-apiserver-legacy-service-account-token-tracking
DEBU[0000] 19) /registry/prioritylevelconfigurations/system
DEBU[0000] 20) /registry/apiregistration.k8s.io/apiservices/v1beta2.flowcontrol.apiserver.k8s.io
DEBU[0000] 21) /registry/apiregistration.k8s.io/apiservices/v1beta3.flowcontrol.apiserver.k8s.io
DEBU[0000] 22) /registry/apiregistration.k8s.io/apiservices/v1.networking.k8s.io
DEBU[0000] 23) /registry/namespaces/kube-node-lease
DEBU[0000] 24) /registry/apiregistration.k8s.io/apiservices/v1.node.k8s.io
DEBU[0000] 25) /registry/apiregistration.k8s.io/apiservices/v1.policy
DEBU[0000] 26) /registry/prioritylevelconfigurations/node-high
DEBU[0000] 27) /registry/apiregistration.k8s.io/apiservices/v1.rbac.authorization.k8s.io
DEBU[0000] 28) /registry/apiregistration.k8s.io/apiservices/v1.scheduling.k8s.io
DEBU[0000] 29) /registry/apiregistration.k8s.io/apiservices/v1.storage.k8s.io
DEBU[0000] 30) /registry/namespaces/default
DEBU[0000] 31) /registry/prioritylevelconfigurations/leader-election
DEBU[0000] 32) /registry/prioritylevelconfigurations/workload-high
DEBU[0000] 33) /registry/prioritylevelconfigurations/workload-low
DEBU[0000] 34) /registry/prioritylevelconfigurations/global-default
DEBU[0000] 35) /registry/flowschemas/system-nodes
DEBU[0000] 36) /registry/flowschemas/system-node-high
DEBU[0000] 37) /registry/flowschemas/system-leader-election
DEBU[0000] 38) /registry/flowschemas/workload-leader-election
DEBU[0000] 39) /registry/flowschemas/endpoint-controller
DEBU[0000] 40) /registry/flowschemas/kube-controller-manager
DEBU[0000] 41) /registry/flowschemas/kube-scheduler
DEBU[0000] 42) /registry/prioritylevelconfigurations/catch-all
DEBU[0000] 43) /registry/flowschemas/service-accounts
DEBU[0000] 44) /registry/prioritylevelconfigurations/exempt
DEBU[0000] 45) /registry/flowschemas/kube-system-service-accounts
DEBU[0000] 46) /registry/flowschemas/probes
DEBU[0000] 47) /registry/flowschemas/global-default
DEBU[0000] 48) /registry/flowschemas/exempt
DEBU[0000] 49) /registry/flowschemas/catch-all
DEBU[0000] 50) /registry/priorityclasses/system-node-critical
DEBU[0000] 51) /registry/priorityclasses/system-cluster-critical
DEBU[0000] 52) /registry/services/specs/default/kubernetes
DEBU[0000] 53) /registry/services/endpoints/default/kubernetes
DEBU[0000] 54) /registry/endpointslices/default/kubernetes
DEBU[0000] 55) /registry/csinodes/test-core-22
DEBU[0000] 56) /registry/serviceaccounts/kube-system/calico-kube-controllers
DEBU[0000] 57) /registry/serviceaccounts/kube-system/calico-node
DEBU[0000] 58) /registry/configmaps/kube-system/calico-config
DEBU[0000] 59) /registry/apiregistration.k8s.io/apiservices/v1.crd.projectcalico.org
DEBU[0000] 60) /registry/apiextensions.k8s.io/customresourcedefinitions/bgpconfigurations.crd.projectcalico.org
DEBU[0000] 61) /registry/apiextensions.k8s.io/customresourcedefinitions/bgppeers.crd.projectcalico.org
DEBU[0000] 62) /registry/apiextensions.k8s.io/customresourcedefinitions/blockaffinities.crd.projectcalico.org
DEBU[0000] 63) /registry/apiextensions.k8s.io/customresourcedefinitions/caliconodestatuses.crd.projectcalico.org
DEBU[0000] 64) /registry/apiextensions.k8s.io/customresourcedefinitions/clusterinformations.crd.projectcalico.org
DEBU[0000] 65) /registry/apiextensions.k8s.io/customresourcedefinitions/felixconfigurations.crd.projectcalico.org
DEBU[0000] 66) /registry/apiextensions.k8s.io/customresourcedefinitions/globalnetworkpolicies.crd.projectcalico.org
DEBU[0000] 67) /registry/apiextensions.k8s.io/customresourcedefinitions/globalnetworksets.crd.projectcalico.org
DEBU[0000] 68) /registry/apiextensions.k8s.io/customresourcedefinitions/hostendpoints.crd.projectcalico.org
DEBU[0000] 69) /registry/apiextensions.k8s.io/customresourcedefinitions/ipamblocks.crd.projectcalico.org
DEBU[0000] 70) /registry/apiextensions.k8s.io/customresourcedefinitions/ipamconfigs.crd.projectcalico.org
DEBU[0000] 71) /registry/apiextensions.k8s.io/customresourcedefinitions/ipamhandles.crd.projectcalico.org
DEBU[0000] 72) /registry/apiextensions.k8s.io/customresourcedefinitions/ippools.crd.projectcalico.org
DEBU[0000] 73) /registry/apiextensions.k8s.io/customresourcedefinitions/ipreservations.crd.projectcalico.org
DEBU[0000] 74) /registry/apiextensions.k8s.io/customresourcedefinitions/kubecontrollersconfigurations.crd.projectcalico.org
DEBU[0000] 75) /registry/clusterroles/calico-kube-controllers
DEBU[0000] 76) /registry/apiextensions.k8s.io/customresourcedefinitions/networkpolicies.crd.projectcalico.org
DEBU[0000] 77) /registry/clusterroles/calico-node
DEBU[0000] 78) /registry/apiextensions.k8s.io/customresourcedefinitions/networksets.crd.projectcalico.org
DEBU[0000] 79) /registry/clusterrolebindings/calico-kube-controllers
DEBU[0000] 80) /registry/clusterrolebindings/calico-node
DEBU[0000] 81) /registry/configmaps/kube-public/local-registry-hosting
DEBU[0000] 82) /registry/serviceaccounts/kube-system/coredns
DEBU[0000] 83) /registry/configmaps/kube-system/coredns
DEBU[0000] 84) /registry/ranges/serviceips
DEBU[0000] 85) /registry/services/specs/kube-system/kube-dns
DEBU[0000] 86) /registry/clusterroles/coredns
DEBU[0000] 87) /registry/clusterrolebindings/coredns
DEBU[0000] 88) /registry/serviceaccounts/kube-system/endpoint-controller
DEBU[0000] 89) /registry/serviceaccounts/kube-system/generic-garbage-collector
DEBU[0000] 90) /registry/serviceaccounts/kube-system/clusterrole-aggregation-controller
DEBU[0000] 91) /registry/serviceaccounts/kube-system/endpointslicemirroring-controller
DEBU[0000] 92) /registry/serviceaccounts/kube-system/replication-controller
DEBU[0000] 93) /registry/serviceaccounts/kube-system/statefulset-controller
DEBU[0000] 94) /registry/serviceaccounts/kube-system/certificate-controller
DEBU[0000] 95) /registry/serviceaccounts/kube-system/endpointslice-controller
DEBU[0000] 96) /registry/serviceaccounts/kube-system/job-controller
DEBU[0000] 97) /registry/serviceaccounts/kube-system/disruption-controller
DEBU[0000] 98) /registry/serviceaccounts/kube-system/cronjob-controller
DEBU[0000] 99) /registry/serviceaccounts/kube-system/ttl-controller
DEBU[0000] 100) /registry/serviceaccounts/kube-system/ephemeral-volume-controller
DEBU[0000] 101) /registry/serviceaccounts/kube-system/node-controller
DEBU[0000] 102) /registry/serviceaccounts/kube-system/service-controller
DEBU[0000] 103) /registry/serviceaccounts/kube-system/ttl-after-finished-controller
DEBU[0000] 104) /registry/serviceaccounts/kube-system/pod-garbage-collector
DEBU[0000] 105) /registry/serviceaccounts/kube-system/resourcequota-controller
DEBU[0000] 106) /registry/serviceaccounts/kube-system/daemon-set-controller
DEBU[0000] 107) /registry/serviceaccounts/kube-system/deployment-controller
DEBU[0000] 108) /registry/serviceaccounts/kube-system/replicaset-controller
DEBU[0000] 109) /registry/serviceaccounts/kube-system/attachdetach-controller
DEBU[0000] 110) /registry/serviceaccounts/kube-system/expand-controller
DEBU[0000] 111) /registry/serviceaccounts/kube-system/horizontal-pod-autoscaler
DEBU[0000] 112) /registry/serviceaccounts/kube-system/pv-protection-controller
DEBU[0000] 113) /registry/serviceaccounts/kube-system/namespace-controller
DEBU[0000] 114) /registry/serviceaccounts/kube-system/service-account-controller
DEBU[0000] 115) /registry/serviceaccounts/kube-system/persistent-volume-binder
DEBU[0000] 116) /registry/serviceaccounts/kube-system/pvc-protection-controller
DEBU[0000] 117) /registry/serviceaccounts/kube-system/root-ca-cert-publisher
DEBU[0000] 118) /registry/configmaps/default/kube-root-ca.crt
DEBU[0000] 119) /registry/configmaps/kube-node-lease/kube-root-ca.crt
DEBU[0000] 120) /registry/configmaps/kube-public/kube-root-ca.crt
DEBU[0000] 121) /registry/configmaps/kube-system/kube-root-ca.crt
DEBU[0000] 122) /registry/serviceaccounts/default/default
DEBU[0000] 123) /registry/serviceaccounts/kube-node-lease/default
DEBU[0000] 124) /registry/serviceaccounts/kube-public/default
DEBU[0000] 125) /registry/serviceaccounts/kube-system/default
DEBU[0000] 126) /registry/controllerrevisions/kube-system/calico-node-6fbb45588b
DEBU[0000] 127) /registry/crd.projectcalico.org/ippools/default-ipv4-ippool
DEBU[0000] 128) /registry/crd.projectcalico.org/clusterinformations/default
DEBU[0000] 129) /registry/crd.projectcalico.org/felixconfigurations/default
DEBU[0000] 130) /registry/crd.projectcalico.org/ipamconfigs/default
DEBU[0000] 131) /registry/crd.projectcalico.org/blockaffinities/test-core-22-10-1-19-0-26
DEBU[0000] 132) /registry/crd.projectcalico.org/ipamhandles/vxlan-tunnel-addr-test-core-22
DEBU[0000] 133) /registry/replicasets/kube-system/coredns-864597b5fd
DEBU[0000] 134) /registry/deployments/kube-system/coredns
DEBU[0000] 135) /registry/crd.projectcalico.org/kubecontrollersconfigurations/default
DEBU[0000] 136) /registry/crd.projectcalico.org/ipamhandles/k8s-pod-network.ec2465275b1b20754b2a53332dd46835c769a24a343e42e40cc606ffb2444ede
DEBU[0000] 137) /registry/pods/kube-system/coredns-864597b5fd-q6bcc
DEBU[0000] 138) /registry/endpointslices/kube-system/kube-dns-c5z7m
DEBU[0000] 139) /registry/controllerrevisions/kube-system/calico-node-fd77f9c5
DEBU[0000] 140) /registry/services/endpoints/kube-system/kube-dns
DEBU[0000] 141) /registry/replicasets/kube-system/calico-kube-controllers-77bd7c5b
DEBU[0000] 142) /registry/crd.projectcalico.org/ipamhandles/k8s-pod-network.7cdb6ae634d6eb77dddf5ded3c9a729cfa3c4011e5d937d930b0fcf2d2361f5a
DEBU[0000] 143) /registry/crd.projectcalico.org/ipamblocks/10-1-19-0-26
DEBU[0000] 144) /registry/pods/kube-system/calico-node-cg6nq
DEBU[0000] 145) /registry/daemonsets/kube-system/calico-node
DEBU[0000] 146) /registry/pods/kube-system/calico-kube-controllers-66d4dc5c95-8wqt5
DEBU[0000] 147) /registry/replicasets/kube-system/calico-kube-controllers-66d4dc5c95
DEBU[0000] 148) /registry/poddisruptionbudgets/kube-system/calico-kube-controllers
DEBU[0000] 149) /registry/deployments/kube-system/calico-kube-controllers
DEBU[0000] 150) /registry/minions/test-core-22
DEBU[0000] 151) /registry/leases/kube-node-lease/test-core-22
DEBU[0000] 152) /registry/leases/kube-system/apiserver-j3o5tsgk4d6wtav6cqoosazzvi
DEBU[0000] 153) /registry/leases/kube-system/kube-controller-manager
DEBU[0000] 154) /registry/masterleases/172.16.101.59
DEBU[0000] 155) /registry/leases/kube-system/kube-scheduler
The backup is: backup-2023-10-16-17-35-11.tar.gz
```

The debug output shows the index of each key in the datastore. You can then inspect the tarball and check the $index.key and $index.value files: the first holds the key, the second holds the contents (most likely raw protobuf bytes).
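
To get an overview of which values are text and which are binary, the pairs of files can be indexed with a short script. This is a minimal sketch, assuming the tarball has already been extracted and that the files are named `<index>.key` and `<index>.value` as described above. The function names are made up for illustration, and the `k8s\x00` prefix check reflects how Kubernetes marks protobuf-serialized objects, which may or may not match every value in a given backup.

```python
from pathlib import Path

# Kubernetes prepends these four bytes to protobuf-serialized objects.
K8S_PROTOBUF_MAGIC = b"k8s\x00"


def classify(value: bytes) -> str:
    """Roughly classify a stored value by looking at its bytes."""
    if value.startswith(K8S_PROTOBUF_MAGIC):
        return "k8s-protobuf"
    try:
        value.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"
    return "text"


def summarize_backup(root):
    """Return sorted (index, key, value_size, kind) tuples for an extracted backup.

    Assumes files named <index>.key and <index>.value somewhere under `root`.
    """
    rows = []
    for key_file in Path(root).rglob("*.key"):
        if not key_file.stem.isdigit():
            continue  # skip anything that is not an indexed key file
        value_file = key_file.with_suffix(".value")
        key = key_file.read_bytes().decode("utf-8", errors="replace").strip()
        value = value_file.read_bytes() if value_file.exists() else b""
        rows.append((int(key_file.stem), key, len(value), classify(value)))
    return sorted(rows)
```

Calling `summarize_backup("/path/to/extracted/backup")` (a placeholder path) and filtering the rows for keys under `/registry/pods/` would show whether the datastore still holds an entry for the pod in question.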

Hope this helps

xl204431 commented 11 months ago

@neoaggelos I tried this. The problem is that I don't know the format of the $index.value files, so I cannot make sense of their contents. Are there any tools to inspect these files?

neoaggelos commented 11 months ago

Hi @xl204431

I imagine this is using the internal protobufs found in the Kubernetes project, so I am not sure if I can help more in this direction. The type definition would be https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L4333

I am not aware of any tooling to directly inspect the data.
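
Even without dedicated tooling, the outer layer of a protobuf-encoded value can be read with a few lines of code. This is a minimal sketch, assuming the value uses the standard Kubernetes protobuf framing: a four-byte `k8s\x00` prefix followed by a `runtime.Unknown` message, whose field 1 is a TypeMeta (apiVersion, kind) and whose field 2 holds the raw object bytes. The helper names are hypothetical, and the inner object is left undecoded because that needs the generated Kubernetes types.

```python
K8S_MAGIC = b"k8s\x00"  # prefix Kubernetes puts before protobuf-encoded objects


def read_varint(buf, pos):
    """Decode a protobuf varint starting at buf[pos]; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_fields(buf):
    """Yield (field_number, wire_type, value) for each top-level field in buf."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire = tag >> 3, tag & 7
        if wire == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire == 2:  # length-delimited (bytes, string, nested message)
            size, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + size], pos + size
        elif wire == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire}")
        yield field, wire, value


def decode_envelope(data):
    """Extract apiVersion, kind, and the raw payload from a framed value."""
    if not data.startswith(K8S_MAGIC):
        raise ValueError("value does not start with the k8s protobuf prefix")
    info = {"apiVersion": "", "kind": "", "raw": b""}
    for field, wire, value in read_fields(data[len(K8S_MAGIC):]):
        if field == 1 and wire == 2:  # TypeMeta
            for sub_field, sub_wire, sub_value in read_fields(value):
                if sub_field == 1 and sub_wire == 2:
                    info["apiVersion"] = sub_value.decode("utf-8")
                elif sub_field == 2 and sub_wire == 2:
                    info["kind"] = sub_value.decode("utf-8")
        elif field == 2 and wire == 2:  # raw: the serialized object itself
            info["raw"] = value
    return info
```

Passing the bytes of one `$index.value` file to `decode_envelope` should at least confirm the object's kind and API version. Values that are not protobuf (for example JSON) raise `ValueError` and can simply be read as text.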

stale[bot] commented 5 days ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.