Mayastor addon does not automatically create a MayastorPool on all nodes #133

Open · uvulpos opened this issue 1 year ago

uvulpos commented 1 year ago

Summary

I have 3 Hetzner servers on which I deployed an HA MicroK8s cluster (twice now), but when I install the mayastor addon I only get one mayastor pool, on node 3, instead of one for each of the 3 nodes. The even funnier part is that I ran the enable command on the first node, so I would at least have expected the node 1 pool to exist.

I added the nodes by joining them over the internal Hetzner network as full nodes (not workers), so I do not see an issue there; for the rest, I stuck to the documentation.

root@k8s-1:~# microk8s.kubectl get mayastorpool -n mayastor
NAME                  NODE    STATUS   CAPACITY      USED   AVAILABLE
microk8s-k8s-3-pool   k8s-3   Online   21449670656   0      21449670656

What Should Happen Instead?

As mentioned in the documentation, there should be a mayastor pool for every node. So in the end:

root@k8s-1:~# microk8s.kubectl get mayastorpool -n mayastor
NAME                  NODE    STATUS   CAPACITY      USED   AVAILABLE
microk8s-k8s-1-pool   k8s-1   Online   21449670656   0      21449670656
microk8s-k8s-2-pool   k8s-2   Online   21449670656   0      21449670656
microk8s-k8s-3-pool   k8s-3   Online   21449670656   0      21449670656

Reproduction Steps

# update the system and install snapd
apt update && apt upgrade -y && apt install snapd -y
# add the cluster nodes to /etc/hosts
nano /etc/hosts
# install MicroK8s and wait for it to be ready
snap install microk8s --classic
microk8s status --wait-ready
# print the join command (run the printed 'microk8s join ...' on the other nodes), then check the nodes
microk8s add-node
microk8s kubectl get nodes
# reserve hugepages for mayastor
echo vm.nr_hugepages = 1024 | sudo tee -a /etc/sysctl.d/20-microk8s-hugepages.conf
# — restart —
# install the extra kernel modules, load nvme-tcp now and on every boot
apt-get install linux-modules-extra-$(uname -r)
modprobe nvme-tcp
echo 'nvme-tcp' | sudo tee -a /etc/modules-load.d/microk8s-mayastor.conf
# — restart —
# enable the addons
microk8s status
microk8s enable dashboard dns registry istio
microk8s enable ingress
microk8s enable mayastor
microk8s dashboard-proxy
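
After microk8s enable mayastor, one way to verify that the addon came up on every node is to check the pods and the pools (standard kubectl; one mayastor pod and one pool per node are expected):

microk8s kubectl get pods -n mayastor -o wide
microk8s kubectl get mayastorpool -n mayastor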

Introspection Report

Report: inspection-report-20230106_142931.tar.gz

Can you suggest a fix?

Maybe a retry command? Or a fix-health command?

microk8s addon mayastor recreate datapools

Are you interested in contributing with a fix?

I think this is too complicated for me. Possibly an easier task next time 🙂

neoaggelos commented 1 year ago

Hi @uvulpos, sorry for missing this.

In the past, we have seen this being an issue due to Calico's checksum offloading (which results in the init containers of the mayastor pods failing to resolve DNS, so the service fails to start).

The mayastor pools themselves are created by the mayastor pods when they first come up. If the environment is still around (or if you can replicate it), can you see whether running the command below improves things?

sudo microk8s kubectl patch felixconfigurations default --patch '{"spec":{"featureDetectOverride":"ChecksumOffloadBroken=true"}}' --type=merge
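
If DNS from the init containers is indeed the problem, it should be visible in their logs (a diagnostic sketch using standard kubectl; registration-probe and etcd-probe are init containers of the mayastor daemonset pods, and <mayastor-pod> stands for an actual pod name):

microk8s kubectl get pods -n mayastor -o wide
microk8s kubectl logs -n mayastor <mayastor-pod> -c registration-probe
microk8s kubectl logs -n mayastor <mayastor-pod> -c etcd-probe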

Thanks!

uvulpos commented 1 year ago

The environment itself is not around anymore because too much time has passed, but I think this is straightforward to reproduce. I will try it out on the weekend and give you some feedback afterwards. Thank you for your input 🙂 👍🏻

neoaggelos commented 1 year ago

Great, thank you. Please try it out and report any issues here! Happy to help further.

Also, for an easy way to manage pools after deployment, there is a helper script. The examples in the documentation should help you get up and running.

uvulpos commented 1 year ago

I made some new discoveries:

  1. There is a mistake in the mayastor documentation: the command has been renamed to `add` instead of `create` (a corrected invocation is sketched after this list). The documentation still shows:
    # create a mayastor pool using `/dev/sdb` on node `uk8s-1`
    sudo snap run --shell microk8s -c '
    $SNAP_COMMON/addons/core/addons/mayastor/pools.py create --node uk8s-1 --device /dev/sdb
    '
  2. With the provided patch, I now got pools on 2 of the 3 nodes; the third one I still had to create manually (which is weird).
  3. After creating the third pool manually, it was still stuck in the Creating state after 10 minutes, and its name is odd. I would have assumed that a pool I create for another node would be named like the others.
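
For clarity, the corrected invocation with the renamed subcommand would then look like this (same path and placeholders as the documentation example above):

sudo snap run --shell microk8s -c '
$SNAP_COMMON/addons/core/addons/mayastor/pools.py add --node uk8s-1 --device /dev/sdb
'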

See my mayastor log below:

root@k8s-1:~# microk8s.kubectl logs -n mayastor daemonset/mayastor
Found 3 pods, using pod/mayastor-cxztj
Defaulted container "mayastor" out of: mayastor, registration-probe (init), etcd-probe (init), initialize-pool (init)
[2023-01-21T14:44:50.948703273+00:00  INFO mayastor:mayastor.rs:94] free_pages: 1024 nr_pages: 1024
[2023-01-21T14:44:50.949984844+00:00  INFO mayastor:mayastor.rs:133] Starting Mayastor version: v1.0.0-119-ge5475575ea3e
[2023-01-21T14:44:50.950557012+00:00  INFO mayastor:mayastor.rs:134] kernel io_uring support: yes
[2023-01-21T14:44:50.950567802+00:00  INFO mayastor:mayastor.rs:138] kernel nvme initiator multipath support: nvme not loaded
[2023-01-21T14:44:50.951435360+00:00  INFO mayastor::core::env:env.rs:600] loading mayastor config YAML file /var/local/mayastor/config.yaml
[2023-01-21T14:44:50.951472889+00:00  INFO mayastor::subsys::config:mod.rs:168] Config file /var/local/mayastor/config.yaml is empty, reverting to default config
[2023-01-21T14:44:50.951964657+00:00  INFO mayastor::subsys::config::opts:opts.rs:155] Overriding NVMF_TCP_MAX_QUEUE_DEPTH value to '32'
[2023-01-21T14:44:50.951988010+00:00  INFO mayastor::subsys::config:mod.rs:216] Applying Mayastor configuration settings
EAL: No available 1048576 kB hugepages reported
TELEMETRY: No legacy callbacks, legacy socket not created
[2023-01-21T14:44:51.074370660+00:00  INFO mayastor::core::mempool:mempool.rs:50] Memory pool 'bdev_io_ctx' with 65535 elements (24 bytes size) successfully created
[2023-01-21T14:44:51.076425673+00:00  INFO mayastor::core::mempool:mempool.rs:50] Memory pool 'nvme_ctrl_io_ctx' with 65535 elements (72 bytes size) successfully created
[2023-01-21T14:44:51.076449858+00:00  INFO mayastor::core::env:env.rs:649] Total number of cores available: 1
[2023-01-21T14:44:51.085807021+00:00  INFO mayastor::core::reactor:reactor.rs:175] scheduled init_thread 0x55efd6211510 on core:1
[2023-01-21T14:44:51.085934129+00:00  INFO mayastor::core::reactor:reactor.rs:151] Init thread ID 1
[2023-01-21T14:44:51.085976268+00:00  INFO mayastor::core::env:env.rs:678] All cores locked and loaded!
[2023-01-21T14:44:51.269147140+00:00  INFO mayastor::bdev::nexus::nexus_module:nexus_module.rs:36] Initializing Nexus CAS Module
[2023-01-21T14:44:51.270017934+00:00  INFO mayastor::core::reactor:reactor.rs:175] scheduled mayastor_nvmf_tcp_pg_core_1 0x55efd6214020 on core:1
[2023-01-21T14:44:51.321090421+00:00  INFO mayastor::subsys::nvmf::target:target.rs:262] nvmf target listening on 10.0.0.4:(4421,8420)
[2023-01-21T14:44:51.322676249+00:00  INFO mayastor::subsys::nvmf::target:target.rs:353] nvmf target accepting new connections and is ready to roll..💃
[2023-01-21T14:44:51.357654033+00:00  INFO mayastor::core::reactor:reactor.rs:175] scheduled iscsi_poll_group_1 0x55efd624c170 on core:1
[2023-01-21T14:44:51.357708956+00:00  INFO mayastor::core::env:env.rs:584] RPC server listening at: /var/tmp/mayastor.sock
[2023-01-21T14:44:51.405666513+00:00  INFO mayastor::persistent_store:persistent_store.rs:82] Connected to etcd on endpoint etcd-client:2379
[2023-01-21T14:44:51.405781196+00:00  INFO mayastor::subsys::registration::registration_grpc:registration_grpc.rs:153] Registering '"k8s-2"' with grpc server 10.0.0.4:10124 ...
[2023-01-21T14:44:51.405906780+00:00  INFO mayastor::grpc::server:server.rs:33] gRPC server configured at address 10.0.0.4:10124
[2023-01-21T14:44:51.741712356+00:00 ERROR mayastor::lvs::lvs_pool:lvs_pool.rs:295] error=failed to import pool microk8s-k8s-2-pool
[2023-01-21T14:44:51.754646482+00:00  INFO mayastor::lvs::lvs_pool:lvs_pool.rs:412] The pool 'microk8s-k8s-2-pool' has been created on /data/microk8s.img
root@k8s-1:~# microk8s.kubectl get mayastorpool -n mayastor
NAME                  NODE    STATUS     CAPACITY      USED   AVAILABLE
microk8s-k8s-2-pool   k8s-2   Online     21449670656   0      21449670656
microk8s-k8s-1-pool   k8s-1   Online     21449670656   0      21449670656
pool-k8s-3-sdb        k8s-3   Creating   0             0      0
root@k8s-1:~# microk8s.kubectl get mayastorpool -n mayastor
NAME                  NODE    STATUS     CAPACITY      USED   AVAILABLE
microk8s-k8s-2-pool   k8s-2   Online     21449670656   0      21449670656
microk8s-k8s-1-pool   k8s-1   Online     21449670656   0      21449670656
pool-k8s-3-sdb        k8s-3   Creating   0             0      0
root@k8s-1:~# date
Sat Jan 21 14:56:30 UTC 2023
root@k8s-1:~# microk8s.kubectl get mayastorpool -n mayastor
NAME                  NODE    STATUS     CAPACITY      USED   AVAILABLE
microk8s-k8s-2-pool   k8s-2   Online     21449670656   0      21449670656
microk8s-k8s-1-pool   k8s-1   Online     21449670656   0      21449670656
pool-k8s-3-sdb        k8s-3   Creating   0             0      0
root@k8s-1:~# 
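
To dig into why pool-k8s-3-sdb never leaves the Creating state, describing the pool resource and checking the mayastor container on k8s-3 might help (a diagnostic sketch with standard kubectl; <mayastor-pod-on-k8s-3> is a placeholder for the actual pod name):

microk8s kubectl describe mayastorpool pool-k8s-3-sdb -n mayastor
microk8s kubectl logs -n mayastor <mayastor-pod-on-k8s-3> -c mayastor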

So I think this isn't fixed yet 😕

uvulpos commented 1 year ago

Update: as I said, this isn't fixed; the manually created pool has now gone into the Error state:

root@k8s-1:~# microk8s.kubectl get mayastorpool -n mayastor
NAME                  NODE    STATUS   CAPACITY      USED   AVAILABLE
microk8s-k8s-2-pool   k8s-2   Online   21449670656   0      21449670656
microk8s-k8s-1-pool   k8s-1   Online   21449670656   0      21449670656
pool-k8s-3-sdb        k8s-3   Error    0             0      0