Open uvulpos opened 1 year ago
Hi @uvulpos, sorry for missing this.
In the past, we have seen this being an issue due to calico's offloading (which results in the init containers of the mayastor pods failing to resolve DNS and the service failing to start).
The mayastor pools themselves are created by the mayastor pods when they first come up. If the environment is still around (or if you can replicate it), can you see whether running the command below improves things?
sudo microk8s kubectl patch felixconfigurations default --patch '{"spec":{"featureDetectOverride":"ChecksumOffloadBroken=true"}}' --type=merge
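If it helps, you can read the override back afterwards to confirm the patch applied (plain kubectl field lookup):
# confirm the override is set; this should print ChecksumOffloadBroken=true
sudo microk8s kubectl get felixconfigurations default -o jsonpath='{.spec.featureDetectOverride}'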
Thanks!
The environment itself is not around anymore because too much time has passed, but I think this is straightforward to reproduce. I'll try it out on the weekend and give you some feedback afterwards. Thank you for your input 🙂 👍🏻
Great, thank you. Please try it out and report any issues here! Happy to help further.
Also, for an easy way to manage pools after deployment, there is a helper script. The examples in the documentation should help you with getting up and running.
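For instance, the helper lives inside the snap and can be invoked through a snap shell; assuming it exposes a standard --help, you can list its subcommands like this:
# print the subcommands of the mayastor pools helper shipped with the addon
sudo snap run --shell microk8s -c '$SNAP_COMMON/addons/core/addons/mayastor/pools.py --help'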
I made some new discoveries: the subcommand has to be `add` instead of `create`, which is what this example uses:
# create a mayastor pool using `/dev/sdb` on node `uk8s-1`
sudo snap run --shell microk8s -c '
$SNAP_COMMON/addons/core/addons/mayastor/pools.py create --node uk8s-1 --device /dev/sdb
'
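So if I read that right, the working invocation is just the documented example with the subcommand swapped (a sketch, untested beyond my own cluster):
# same example as above, but with `add` swapped in for `create`
sudo snap run --shell microk8s -c '
$SNAP_COMMON/addons/core/addons/mayastor/pools.py add --node uk8s-1 --device /dev/sdb
'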
See my mayastor log below:
root@k8s-1:~# microk8s.kubectl logs -n mayastor daemonset/mayastor
Found 3 pods, using pod/mayastor-cxztj
Defaulted container "mayastor" out of: mayastor, registration-probe (init), etcd-probe (init), initialize-pool (init)
[2023-01-21T14:44:50.948703273+00:00 INFO mayastor:mayastor.rs:94] free_pages: 1024 nr_pages: 1024
[2023-01-21T14:44:50.949984844+00:00 INFO mayastor:mayastor.rs:133] Starting Mayastor version: v1.0.0-119-ge5475575ea3e
[2023-01-21T14:44:50.950557012+00:00 INFO mayastor:mayastor.rs:134] kernel io_uring support: yes
[2023-01-21T14:44:50.950567802+00:00 INFO mayastor:mayastor.rs:138] kernel nvme initiator multipath support: nvme not loaded
[2023-01-21T14:44:50.951435360+00:00 INFO mayastor::core::env:env.rs:600] loading mayastor config YAML file /var/local/mayastor/config.yaml
[2023-01-21T14:44:50.951472889+00:00 INFO mayastor::subsys::config:mod.rs:168] Config file /var/local/mayastor/config.yaml is empty, reverting to default config
[2023-01-21T14:44:50.951964657+00:00 INFO mayastor::subsys::config::opts:opts.rs:155] Overriding NVMF_TCP_MAX_QUEUE_DEPTH value to '32'
[2023-01-21T14:44:50.951988010+00:00 INFO mayastor::subsys::config:mod.rs:216] Applying Mayastor configuration settings
EAL: No available 1048576 kB hugepages reported
TELEMETRY: No legacy callbacks, legacy socket not created
[2023-01-21T14:44:51.074370660+00:00 INFO mayastor::core::mempool:mempool.rs:50] Memory pool 'bdev_io_ctx' with 65535 elements (24 bytes size) successfully created
[2023-01-21T14:44:51.076425673+00:00 INFO mayastor::core::mempool:mempool.rs:50] Memory pool 'nvme_ctrl_io_ctx' with 65535 elements (72 bytes size) successfully created
[2023-01-21T14:44:51.076449858+00:00 INFO mayastor::core::env:env.rs:649] Total number of cores available: 1
[2023-01-21T14:44:51.085807021+00:00 INFO mayastor::core::reactor:reactor.rs:175] scheduled init_thread 0x55efd6211510 on core:1
[2023-01-21T14:44:51.085934129+00:00 INFO mayastor::core::reactor:reactor.rs:151] Init thread ID 1
[2023-01-21T14:44:51.085976268+00:00 INFO mayastor::core::env:env.rs:678] All cores locked and loaded!
[2023-01-21T14:44:51.269147140+00:00 INFO mayastor::bdev::nexus::nexus_module:nexus_module.rs:36] Initializing Nexus CAS Module
[2023-01-21T14:44:51.270017934+00:00 INFO mayastor::core::reactor:reactor.rs:175] scheduled mayastor_nvmf_tcp_pg_core_1 0x55efd6214020 on core:1
[2023-01-21T14:44:51.321090421+00:00 INFO mayastor::subsys::nvmf::target:target.rs:262] nvmf target listening on 10.0.0.4:(4421,8420)
[2023-01-21T14:44:51.322676249+00:00 INFO mayastor::subsys::nvmf::target:target.rs:353] nvmf target accepting new connections and is ready to roll..💃
[2023-01-21T14:44:51.357654033+00:00 INFO mayastor::core::reactor:reactor.rs:175] scheduled iscsi_poll_group_1 0x55efd624c170 on core:1
[2023-01-21T14:44:51.357708956+00:00 INFO mayastor::core::env:env.rs:584] RPC server listening at: /var/tmp/mayastor.sock
[2023-01-21T14:44:51.405666513+00:00 INFO mayastor::persistent_store:persistent_store.rs:82] Connected to etcd on endpoint etcd-client:2379
[2023-01-21T14:44:51.405781196+00:00 INFO mayastor::subsys::registration::registration_grpc:registration_grpc.rs:153] Registering '"k8s-2"' with grpc server 10.0.0.4:10124 ...
[2023-01-21T14:44:51.405906780+00:00 INFO mayastor::grpc::server:server.rs:33] gRPC server configured at address 10.0.0.4:10124
[2023-01-21T14:44:51.741712356+00:00 ERROR mayastor::lvs::lvs_pool:lvs_pool.rs:295] error=failed to import pool microk8s-k8s-2-pool
[2023-01-21T14:44:51.754646482+00:00 INFO mayastor::lvs::lvs_pool:lvs_pool.rs:412] The pool 'microk8s-k8s-2-pool' has been created on /data/microk8s.img
root@k8s-1:~# microk8s.kubectl get mayastorpool -n mayastor
NAME                  NODE    STATUS     CAPACITY      USED   AVAILABLE
microk8s-k8s-2-pool   k8s-2   Online     21449670656   0      21449670656
microk8s-k8s-1-pool   k8s-1   Online     21449670656   0      21449670656
pool-k8s-3-sdb        k8s-3   Creating   0             0      0
root@k8s-1:~# microk8s.kubectl get mayastorpool -n mayastor
NAME                  NODE    STATUS     CAPACITY      USED   AVAILABLE
microk8s-k8s-2-pool   k8s-2   Online     21449670656   0      21449670656
microk8s-k8s-1-pool   k8s-1   Online     21449670656   0      21449670656
pool-k8s-3-sdb        k8s-3   Creating   0             0      0
root@k8s-1:~# date
Sat Jan 21 14:56:30 UTC 2023
root@k8s-1:~# microk8s.kubectl get mayastorpool -n mayastor
NAME                  NODE    STATUS     CAPACITY      USED   AVAILABLE
microk8s-k8s-2-pool   k8s-2   Online     21449670656   0      21449670656
microk8s-k8s-1-pool   k8s-1   Online     21449670656   0      21449670656
pool-k8s-3-sdb        k8s-3   Creating   0             0      0
root@k8s-1:~#
So I think this isn't fixed yet 😕
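For debugging, the next step is probably to describe the stuck resource and check the namespace events (plain kubectl; I haven't dug deeper yet):
# full spec/status of the pool that never leaves Creating
microk8s.kubectl describe mayastorpool pool-k8s-3-sdb -n mayastor
# recent events in the mayastor namespace
microk8s.kubectl get events -n mayastor --sort-by=.metadata.creationTimestamp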
Update: As I said, this isn't fixed; the pool has now gone from Creating to Error:
root@k8s-1:~# microk8s.kubectl get mayastorpool -n mayastor
NAME                  NODE    STATUS   CAPACITY      USED   AVAILABLE
microk8s-k8s-2-pool   k8s-2   Online   21449670656   0      21449670656
microk8s-k8s-1-pool   k8s-1   Online   21449670656   0      21449670656
pool-k8s-3-sdb        k8s-3   Error    0             0      0
Summary
I have 3 Hetzner servers on which I deployed an HA MicroK8s cluster (twice now), but when I install the mayastor addon I only get one mayastor pool, on node 3, instead of one for each of the 3 nodes. The even funnier part is that I enabled the addon on the first node, so I would at least have expected the node 1 pool to exist.
I added the nodes by joining them over the internal Hetzner network as full nodes (not workers), so I don't think the problem lies there; for the rest, I stuck to the documentation.
What Should Happen Instead?
As mentioned in the documentation, a mayastor pool should be created for every node. So in the end:
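Roughly like this (the name of the third pool is my guess, based on how the other two are named):
NAME                  NODE    STATUS   CAPACITY      USED   AVAILABLE
microk8s-k8s-1-pool   k8s-1   Online   21449670656   0      21449670656
microk8s-k8s-2-pool   k8s-2   Online   21449670656   0      21449670656
microk8s-k8s-3-pool   k8s-3   Online   21449670656   0      21449670656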
Reproduction Steps
Introspection Report
Report: inspection-report-20230106_142931.tar.gz
Can you suggest a fix?
Maybe a retry command? Or a fix-health command?
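For example, something that deletes the stuck custom resource and re-adds the pool via the helper, roughly like this (I haven't verified that deleting the CR is safe):
# drop the pool CR that is stuck in Error, then re-create it via the helper
microk8s.kubectl delete mayastorpool pool-k8s-3-sdb -n mayastor
sudo snap run --shell microk8s -c '
$SNAP_COMMON/addons/core/addons/mayastor/pools.py add --node k8s-3 --device /dev/sdb
'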
Are you interested in contributing with a fix?
I think this is too complicated for me. Possibly an easier task next time 🙂