decoder-leco / host-storage-based-csi-drivers

A benchmark of multiple CSI drivers that provision Kubernetes persistent volumes from the storage resources of the Kubernetes cluster nodes

last error before completing topolvm provisioning #1

Open Jean-Baptiste-Lasselle opened 4 months ago

Jean-Baptiste-Lasselle commented 4 months ago

I disabled cert-manager, but obviously cert-manager is still required, or at least TLS certificates still have to be provided so they can be mounted onto the topolvm pods:

  Type     Reason       Age                 From               Message
  ----     ------       ----                ----               -------
  Normal   Scheduled    111s                default-scheduler  Successfully assigned topolvm-system/topolvm-controller-5dd4b498d9-c2wmx to k8s-cluster-decoderleco-worker5
  Warning  FailedMount  110s                kubelet            MountVolume.SetUp failed for volume "certs" : failed to sync secret cache: timed out waiting for the condition
  Warning  FailedMount  44s (x7 over 110s)  kubelet            MountVolume.SetUp failed for volume "certs" : secret "topolvm-mutatingwebhook" not found
vagrant@debian12:~$ cat ./values.yaml | grep certs
vagrant@debian12:~$ cat ./values.yaml | grep cert
  # webhook.caBundle -- Specify the certificate to be used for AdmissionWebhook.
  caBundle:  # Base64-encoded, PEM-encoded CA certificate that signs the server certificate.
  # webhook.existingCertManagerIssuer -- Specify the cert-manager issuer to be used for AdmissionWebhook.
    # group: cert-manager.io
cert-manager:
  # cert-manager.enabled -- Install cert-manager together.
  ## ref: https://cert-manager.io/docs/installation/kubernetes/#installing-with-helm

The question is: how does the kind example work, since it cannot work with cert-manager?
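
As a next step, the simplest fix looks like enabling the bundled cert-manager through the chart values. A minimal override sketch (the key names come from the values.yaml grep above; the comment is mine, not from the chart):

# values.yaml override (sketch): let the chart install cert-manager so that the
# webhook serving certificate (the topolvm-mutatingwebhook secret the controller
# pod is waiting for) can be issued
cert-manager:
  enabled: true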

Jean-Baptiste-Lasselle commented 4 months ago

I changed cert-manager to enabled=true, and now I get a new error: topolvm complains that it cannot find the lvm binary at /sbin/lvm, so maybe I will map that volume in the kind cluster:

vagrant@debian12:~$ kubectl -n topolvm-system logs pod/topolvm-lvmd-0-nmd7q
{"level":"info","ts":"2024-06-23T22:28:04Z","msg":"configuration file loaded","device_classes":[{"name":"hdd","volume-group":"vg-decoderleco","default":false,"spare-gb":null,"stripe":null,"stripe-size":"","lvcreate-options":null,"type":"","thin-pool":null}],"socket_name":"/run/topolvm/lvmd.sock","file_name":"/etc/topolvm/lvmd.yaml"}
{"level":"info","ts":"2024-06-23T22:28:04Z","msg":"invoking command","args":["/usr/bin/nsenter","-m","-u","-i","-n","-p","-t","1","/sbin/lvm","fullreport","--reportformat","json","--units","b","--nosuffix","--configreport","vg","-o","vg_name,vg_uuid,vg_size,vg_free","--configreport","lv","-o","lv_uuid,lv_name,lv_full_name,lv_path,lv_size,lv_kernel_major,lv_kernel_minor,origin,origin_size,pool_lv,lv_tags,lv_attr,vg_name,data_percent,metadata_percent,pool_lv","--configreport","pv","-o,","--configreport","pvseg","-o,","--configreport","seg","-o,"]}
{"level":"error","ts":"2024-06-23T22:28:04Z","msg":"failed to run command","error":"exit status 127: nsenter: failed to execute /sbin/lvm: No such file or directory","stacktrace":"github.com/topolvm/topolvm/internal/lvmd/command.getLVMState.func1\n\t/workdir/internal/lvmd/command/lvm_state_json.go:53\ngithub.com/topolvm/topolvm/internal/lvmd/command.getLVMState\n\t/workdir/internal/lvmd/command/lvm_state_json.go:59\ngithub.com/topolvm/topolvm/internal/lvmd/command.ListVolumeGroups\n\t/workdir/internal/lvmd/command/lvm.go:111\ngithub.com/topolvm/topolvm/cmd/lvmd/app.subMain\n\t/workdir/cmd/lvmd/app/root.go:70\ngithub.com/topolvm/topolvm/cmd/lvmd/app.init.func2\n\t/workdir/cmd/lvmd/app/root.go:51\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\ngithub.com/topolvm/topolvm/cmd/lvmd/app.Execute\n\t/workdir/cmd/lvmd/app/root.go:133\nmain.main\n\t/workdir/cmd/hypertopolvm/main.go:38\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:271"}
{"level":"error","ts":"2024-06-23T22:28:04Z","msg":"error while retrieving volume groups","error":"EOF","stacktrace":"github.com/topolvm/topolvm/cmd/lvmd/app.subMain\n\t/workdir/cmd/lvmd/app/root.go:72\ngithub.com/topolvm/topolvm/cmd/lvmd/app.init.func2\n\t/workdir/cmd/lvmd/app/root.go:51\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\ngithub.com/topolvm/topolvm/cmd/lvmd/app.Execute\n\t/workdir/cmd/lvmd/app/root.go:133\nmain.main\n\t/workdir/cmd/hypertopolvm/main.go:38\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:271"}
Error: EOF
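
If I read the invoked command right, lvmd nsenters into PID 1 of the node, so /sbin/lvm has to exist in the kind node image itself, not in the lvmd container. A quick sanity check (sketch; I assume the kind node container is named like the worker node from the scheduling event earlier):

# does the kind node image ship an lvm binary at the expected path?
docker exec k8s-cluster-decoderleco-worker5 sh -c 'command -v lvm; ls -l /sbin/lvm'
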
Jean-Baptiste-Lasselle commented 4 months ago

OK, so two things: first, in the kind cluster YAML config there are some special volume mounts; and second, in the kind example Makefile there is a specific setup of lvmd on the host, configuring a systemd unit. That setup involves a Unix socket, which I found mentioned in some error logs, look:

vagrant@debian12:~$ kubectl -n topolvm-system logs pod/topolvm-node-tx52m -c topolvm-node
{"level":"info","ts":"2024-06-23T22:38:19Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2024-06-23T22:38:19Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2024-06-23T22:38:19Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8080","secure":false}
{"level":"info","ts":"2024-06-23T22:38:19Z","msg":"Starting EventSource","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","source":"kind source: *v1.LogicalVolume"}
{"level":"info","ts":"2024-06-23T22:38:19Z","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2024-06-23T22:38:19Z","msg":"Starting Controller","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume"}
{"level":"info","ts":"2024-06-23T22:38:19Z","msg":"Starting workers","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","worker count":1}
{"level":"info","ts":"2024-06-23T22:38:19Z","msg":"Stopping and waiting for leader election runnables"}
{"level":"info","ts":"2024-06-23T22:38:19Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume"}
{"level":"info","ts":"2024-06-23T22:38:19Z","msg":"All workers finished","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume"}
{"level":"info","ts":"2024-06-23T22:38:19Z","msg":"Stopping and waiting for caches"}
W0623 22:38:19.347174       1 reflector.go:462] pkg/mod/k8s.io/client-go@v0.29.4/tools/cache/reflector.go:229: watch of *v1.LogicalVolume ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
{"level":"info","ts":"2024-06-23T22:38:19Z","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2024-06-23T22:38:19Z","msg":"Stopping and waiting for HTTP servers"}
{"level":"info","ts":"2024-06-23T22:38:19Z","logger":"controller-runtime.metrics","msg":"Shutting down metrics server with timeout of 1 minute"}
{"level":"info","ts":"2024-06-23T22:38:19Z","msg":"Wait completed, proceeding to shutdown the manager"}
{"level":"error","ts":"2024-06-23T22:38:19Z","logger":"setup","msg":"problem running manager","error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial unix /run/topolvm/lvmd.sock: connect: no such file or directory\"","stacktrace":"github.com/topolvm/topolvm/cmd/topolvm-node/app.subMain\n\t/workdir/cmd/topolvm-node/app/run.go:168\ngithub.com/topolvm/topolvm/cmd/topolvm-node/app.init.func1\n\t/workdir/cmd/topolvm-node/app/root.go:40\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\ngithub.com/topolvm/topolvm/cmd/topolvm-node/app.Execute\n\t/workdir/cmd/topolvm-node/app/root.go:47\nmain.main\n\t/workdir/cmd/hypertopolvm/main.go:42\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:271"}
Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /run/topolvm/lvmd.sock: connect: no such file or directory"
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /run/topolvm/lvmd.sock: connect: no such file or directory"
vagrant@debian12:~$

The Unix socket is at /run/topolvm/lvmd.sock, and this path is indeed mentioned in the kind cluster YAML config: https://github.com/topolvm/topolvm/blob/f6b7b2e45f4798497b0a3fb590aac8c2a026622c/example/kind/topolvm-cluster.yaml#L21C27-L21C34
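
So the directory holding the lvmd socket has to be bind-mounted from the host into the kind nodes. A sketch of what such an extraMounts entry could look like in the kind cluster config (paths taken from the error above, not copied from the upstream example file):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: worker
  extraMounts:
  # make the host-side lvmd Unix socket visible inside the node container
  - hostPath: /run/topolvm
    containerPath: /run/topolvm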

Jean-Baptiste-Lasselle commented 4 months ago

Okay, we're almost there; it's only the lvmd pods that are still complaining that the /sbin/lvm executable is not found (would it be solved if I set lvmd.managed to false in the Helm chart values?):

vagrant@debian12:~$ kubectl -n topolvm-system get all
NAME                                                   READY   STATUS             RESTARTS        AGE
pod/topolvm-cert-manager-657b7864b7-hdr55              1/1     Running            0               22m
pod/topolvm-cert-manager-cainjector-57fbb46b78-ctjbg   1/1     Running            0               22m
pod/topolvm-cert-manager-startupapicheck-mjz8g         0/1     Completed          0               22m
pod/topolvm-cert-manager-webhook-85bff86bcc-zwlmc      1/1     Running            0               22m
pod/topolvm-controller-5dd4b498d9-6rjsg                5/5     Running            0               22m
pod/topolvm-controller-5dd4b498d9-9cwrj                5/5     Running            0               22m
pod/topolvm-lvmd-0-2r5h4                               0/1     CrashLoopBackOff   8 (2m27s ago)   22m
pod/topolvm-lvmd-0-9xmsr                               0/1     CrashLoopBackOff   8 (2m46s ago)   22m
pod/topolvm-lvmd-0-pm2jc                               0/1     CrashLoopBackOff   8 (2m43s ago)   22m
pod/topolvm-lvmd-0-rdplf                               0/1     CrashLoopBackOff   8 (2m40s ago)   22m
pod/topolvm-lvmd-0-vrrbm                               0/1     CrashLoopBackOff   8 (2m28s ago)   22m
pod/topolvm-lvmd-0-xhnv4                               0/1     CrashLoopBackOff   8 (2m51s ago)   22m
pod/topolvm-lvmd-0-xznk7                               0/1     CrashLoopBackOff   8 (2m37s ago)   22m
pod/topolvm-node-4jhm9                                 3/3     Running            2 (18m ago)     22m
pod/topolvm-node-6lbsd                                 3/3     Running            2 (18m ago)     22m
pod/topolvm-node-6w9kz                                 3/3     Running            2 (18m ago)     22m
pod/topolvm-node-cz6gt                                 3/3     Running            2 (18m ago)     22m
pod/topolvm-node-jm777                                 3/3     Running            2 (18m ago)     22m
pod/topolvm-node-kw8f9                                 3/3     Running            2 (18m ago)     22m
pod/topolvm-node-rnzzj                                 3/3     Running            2 (18m ago)     22m

NAME                                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/topolvm-cert-manager           ClusterIP   10.96.251.97   <none>        9402/TCP   22m
service/topolvm-cert-manager-webhook   ClusterIP   10.96.73.23    <none>        443/TCP    22m
service/topolvm-controller             ClusterIP   10.96.161.35   <none>        443/TCP    22m

NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/topolvm-lvmd-0   7         7         0       7            0           <none>          22m
daemonset.apps/topolvm-node     7         7         7       7            7           <none>          22m

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/topolvm-cert-manager              1/1     1            1           22m
deployment.apps/topolvm-cert-manager-cainjector   1/1     1            1           22m
deployment.apps/topolvm-cert-manager-webhook      1/1     1            1           22m
deployment.apps/topolvm-controller                2/2     2            2           22m

NAME                                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/topolvm-cert-manager-657b7864b7              1         1         1       22m
replicaset.apps/topolvm-cert-manager-cainjector-57fbb46b78   1         1         1       22m
replicaset.apps/topolvm-cert-manager-webhook-85bff86bcc      1         1         1       22m
replicaset.apps/topolvm-controller-5dd4b498d9                2         2         2       22m

NAME                                             STATUS     COMPLETIONS   DURATION   AGE
job.batch/topolvm-cert-manager-startupapicheck   Complete   1/1           17m        22m
vagrant@debian12:~$
vagrant@debian12:~$ kubectl -n topolvm-system logs pod/topolvm-node-kw8f9
Defaulted container "topolvm-node" out of: topolvm-node, csi-registrar, liveness-probe
{"level":"info","ts":"2024-06-26T18:04:11Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2024-06-26T18:04:11Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2024-06-26T18:04:11Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8080","secure":false}
{"level":"info","ts":"2024-06-26T18:04:11Z","msg":"Starting EventSource","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","source":"kind source: *v1.LogicalVolume"}
{"level":"info","ts":"2024-06-26T18:04:11Z","msg":"Starting Controller","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume"}
{"level":"info","ts":"2024-06-26T18:04:11Z","msg":"Starting workers","controller":"logicalvolume","controllerGroup":"topolvm.io","controllerKind":"LogicalVolume","worker count":1}
vagrant@debian12:~$
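
To try that, the chart values change would be roughly the following (a sketch; lvmd.managed is the key mentioned above, and I assume everything else stays at its defaults so that topolvm-node keeps dialing /run/topolvm/lvmd.sock):

# values.yaml override (sketch): do not deploy lvmd as a DaemonSet;
# an lvmd running outside the chart must then serve the gRPC socket
# at /run/topolvm/lvmd.sock that topolvm-node dials
lvmd:
  managed: false
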
Jean-Baptiste-Lasselle commented 4 months ago

OK, with lvmd.managed=false, topolvm now works:

vagrant@debian12:~$ kubectl -n topolvm-system get all
NAME                                                   READY   STATUS      RESTARTS   AGE
pod/topolvm-cert-manager-657b7864b7-5cvnp              1/1     Running     0          29m
pod/topolvm-cert-manager-cainjector-57fbb46b78-2tq8f   1/1     Running     0          29m
pod/topolvm-cert-manager-startupapicheck-vxwsz         0/1     Completed   0          29m
pod/topolvm-cert-manager-webhook-85bff86bcc-pd5mp      1/1     Running     0          29m
pod/topolvm-controller-5dd4b498d9-7zhg6                5/5     Running     0          29m
pod/topolvm-controller-5dd4b498d9-qrhp8                5/5     Running     0          29m
pod/topolvm-node-64lrg                                 3/3     Running     0          29m
pod/topolvm-node-kcrk9                                 3/3     Running     0          29m
pod/topolvm-node-ng5dw                                 3/3     Running     0          29m
pod/topolvm-node-snttw                                 3/3     Running     0          29m
pod/topolvm-node-vhsvd                                 3/3     Running     0          29m
pod/topolvm-node-w6sfs                                 3/3     Running     0          29m
pod/topolvm-node-wmnzw                                 3/3     Running     0          29m

NAME                                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/topolvm-cert-manager           ClusterIP   10.96.54.93     <none>        9402/TCP   29m
service/topolvm-cert-manager-webhook   ClusterIP   10.96.177.141   <none>        443/TCP    29m
service/topolvm-controller             ClusterIP   10.96.136.171   <none>        443/TCP    29m

NAME                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/topolvm-node   7         7         7       7            7           <none>          29m

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/topolvm-cert-manager              1/1     1            1           29m
deployment.apps/topolvm-cert-manager-cainjector   1/1     1            1           29m
deployment.apps/topolvm-cert-manager-webhook      1/1     1            1           29m
deployment.apps/topolvm-controller                2/2     2            2           29m

NAME                                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/topolvm-cert-manager-657b7864b7              1         1         1       29m
replicaset.apps/topolvm-cert-manager-cainjector-57fbb46b78   1         1         1       29m
replicaset.apps/topolvm-cert-manager-webhook-85bff86bcc      1         1         1       29m
replicaset.apps/topolvm-controller-5dd4b498d9                2         2         2       29m

NAME                                             STATUS     COMPLETIONS   DURATION   AGE
job.batch/topolvm-cert-manager-startupapicheck   Complete   1/1           9m26s      29m
vagrant@debian12:~$
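
For this unmanaged setup, the lvmd running outside the chart needs its own configuration; a sketch of what /etc/topolvm/lvmd.yaml could look like, with the field names and values inferred from the "configuration file loaded" lvmd log line earlier, so treat the exact spelling as an assumption:

# /etc/topolvm/lvmd.yaml (sketch; mirrors the configuration the lvmd log showed it had loaded)
socket-name: /run/topolvm/lvmd.sock
device-classes:
  - name: hdd
    volume-group: vg-decoderleco
    default: false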

The question is: what is the purpose of the managed mode?

Jean-Baptiste-Lasselle commented 4 months ago

About the managed mode: I might try it eventually, but probably not with kind: