canonical / microk8s-core-addons

Core MicroK8s addons
Apache License 2.0

Mayastor: unable to start `mayastor` data plane #25

Closed (balchua closed this issue 2 years ago)

balchua commented 2 years ago

When I enabled the mayastor addon, most of the pods came up, except for the mayastor pod. It crash-loops with this error (I turned on debug logging):

[2022-04-10T05:56:58.466044674+00:00  INFO mayastor:mayastor.rs:94] free_pages: 1024 nr_pages: 1024
[2022-04-10T05:56:58.466203487+00:00  INFO mayastor:mayastor.rs:133] Starting Mayastor version: v1.0.0-119-ge5475575ea3e
[2022-04-10T05:56:58.466318231+00:00  INFO mayastor:mayastor.rs:134] kernel io_uring support: yes
[2022-04-10T05:56:58.466337877+00:00  INFO mayastor:mayastor.rs:138] kernel nvme initiator multipath support: yes
[2022-04-10T05:56:58.466380212+00:00  INFO mayastor::core::env:env.rs:600] loading mayastor config YAML file /var/local/mayastor/config.yaml
[2022-04-10T05:56:58.466398028+00:00 DEBUG mayastor::subsys::config:mod.rs:154] loading configuration file from /var/local/mayastor/config.yaml
[2022-04-10T05:56:58.466418263+00:00  INFO mayastor::subsys::config:mod.rs:168] Config file /var/local/mayastor/config.yaml is empty, reverting to default config
[2022-04-10T05:56:58.466439905+00:00  INFO mayastor::subsys::config::opts:opts.rs:155] Overriding NVMF_TCP_MAX_QUEUE_DEPTH value to '32'
[2022-04-10T05:56:58.466462485+00:00  INFO mayastor::subsys::config:mod.rs:216] Applying Mayastor configuration settings
[2022-04-10T05:56:58.466479267+00:00 DEBUG mayastor::subsys::config::opts:opts.rs:259] spdk_bdev_nvme_opts { action_on_timeout: 4, timeout_us: 5000000, timeout_admin_us: 5000000, keep_alive_timeout_ms: 1000, transport_retry_count: 0, arbitration_burst: 0, low_priority_weight: 0, medium_priority_weight: 0, high_priority_weight: 0, nvme_adminq_poll_period_us: 1000, nvme_ioq_poll_period_us: 0, io_queue_requests: 0, delay_cmd_submit: true, bdev_retry_count: 0 }
[2022-04-10T05:56:58.466507936+00:00 DEBUG mayastor::subsys::config:mod.rs:220] Config {
    source: Some(
        "/var/local/mayastor/config.yaml",
    ),
    nvmf_tcp_tgt_conf: NvmfTgtConfig {
        name: "mayastor_target",
        max_namespaces: 110,
        opts: NvmfTcpTransportOpts {
            max_queue_depth: 32,
            max_qpairs_per_ctrl: 32,
            in_capsule_data_size: 4096,
            max_io_size: 131072,
            io_unit_size: 131072,
            max_aq_depth: 32,
            num_shared_buf: 2048,
            buf_cache_size: 64,
            dif_insert_or_strip: false,
            abort_timeout_sec: 1,
            acceptor_poll_rate: 10000,
            zcopy: true,
        },
    },
    nvme_bdev_opts: NvmeBdevOpts {
        action_on_timeout: 4,
        timeout_us: 5000000,
        timeout_admin_us: 5000000,
        keep_alive_timeout_ms: 1000,
        transport_retry_count: 0,
        arbitration_burst: 0,
        low_priority_weight: 0,
        medium_priority_weight: 0,
        high_priority_weight: 0,
        nvme_adminq_poll_period_us: 1000,
        nvme_ioq_poll_period_us: 0,
        io_queue_requests: 0,
        delay_cmd_submit: true,
        bdev_retry_count: 0,
    },
    bdev_opts: BdevOpts {
        bdev_io_pool_size: 65535,
        bdev_io_cache_size: 512,
        small_buf_pool_size: 8191,
        large_buf_pool_size: 1023,
    },
    nexus_opts: NexusOpts {
        nvmf_enable: true,
        nvmf_discovery_enable: true,
        nvmf_nexus_port: 4421,
        nvmf_replica_port: 8420,
    },
}
[2022-04-10T05:56:58.466597007+00:00 DEBUG mayastor::core::env:env.rs:534] EAL arguments ["mayastor", "--no-shconf", "-m 0", "--base-virtaddr=0x200000000000", "--file-prefix=mayastor_pid1", "--huge-unlink", "--log-level=lib.eal:6", "--log-level=lib.cryptodev:5", "--log-level=user1:6", "--match-allocations", "-l 1"]
EAL: No available 1048576 kB hugepages reported
EAL: alloc_pages_on_heap(): couldn't allocate memory due to IOVA exceeding limits of current DMA mask
EAL: alloc_pages_on_heap(): Please try initializing EAL with --iova-mode=pa parameter
EAL: error allocating rte services array
EAL: FATAL: rte_service_init() failed
EAL: rte_service_init() failed
thread 'main' panicked at 'Failed to init EAL', mayastor/src/core/env.rs:543:13
stack backtrace:
   0: std::panicking::begin_panic
   1: mayastor::core::env::MayastorEnvironment::init
   2: mayastor::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

I tried restarting MicroK8s, and also rebooting the host, after changing the hugepages setting.

Hugepages seem to be OK:

$ grep HugePages /proc/meminfo
AnonHugePages:    100352 kB
ShmemHugePages:   251904 kB
FileHugePages:         0 kB
HugePages_Total:    1024
HugePages_Free:     1024
HugePages_Rsvd:        0
HugePages_Surp:        0

There is also one step of the instructions that fails for me:

$ sudo apt install linux-modules-extra-$(uname -r)
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Package linux-modules-extra-5.16.11-76051611-generic is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'linux-modules-extra-5.16.11-76051611-generic' has no installation candidate

Do you think this error is the culprit?

Thanks for your help!
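
The `/proc/meminfo` check above can be scripted. A minimal sketch, assuming Mayastor wants 1024 free hugepages of the default 2048 kB size (the `vm.nr_hugepages=1024` value from the MicroK8s Mayastor docs quoted later in this thread); `check_hugepages` and `REQUIRED_PAGES` are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: check /proc/meminfo-style input for enough free default-size hugepages.
# Assumption: Mayastor wants REQUIRED_PAGES free pages of 2048 kB each; 1024 is the
# value the MicroK8s Mayastor docs suggest for vm.nr_hugepages.
REQUIRED_PAGES=${REQUIRED_PAGES:-1024}

# check_hugepages: reads meminfo text on stdin, prints "ok" or the reason it is not ok.
check_hugepages() {
  awk -v need="$REQUIRED_PAGES" '
    /^HugePages_Free:/ { free = $2 }
    /^Hugepagesize:/   { size = $2 }
    END {
      if (size != 2048)         { print "default hugepage size is " size " kB, expected 2048 kB"; exit 1 }
      if (free + 0 < need + 0)  { print "only " free " free hugepages, need " need; exit 1 }
      print "ok"
    }'
}

# Usage: check_hugepages < /proc/meminfo
```

Reading from stdin keeps the check usable against a saved copy of `/proc/meminfo` from each machine being compared.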

neoaggelos commented 2 years ago

Do you think this error is the culprit?

Looks like it. What OS/version is this on? Is this a desktop or a server instance?

balchua commented 2 years ago

Ubuntu 20.04 desktop version.

AlexsJones commented 2 years ago

Is this issue still outstanding, @balchua @neoaggelos?

balchua commented 2 years ago

Yes, I haven't tried loading another version of the extra modules.

balchua commented 2 years ago

I tried the mayastor addon on a DigitalOcean Droplet and it works. However, it fails with the error above on my laptop. I tried installing the extra modules package, but it doesn't seem to take effect. I will also cross-post this to the OpenEBS folks to see what they have to say.
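
One way to see whether the extra modules package took effect is to check for the `nvme_tcp` kernel module. This assumes `nvme_tcp` is the module the `linux-modules-extra` step is meant to provide (the config dump above does show an NVMe-oF TCP target); the thread itself does not name the module. `nvme_tcp_status` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: report whether the nvme_tcp kernel module is loaded, available, or missing.
# Assumption: nvme_tcp is the module the linux-modules-extra package is needed for.
nvme_tcp_status() {
  if grep -qw '^nvme_tcp' /proc/modules 2>/dev/null; then
    echo "loaded"                                    # listed as a loaded module
  elif modinfo nvme_tcp >/dev/null 2>&1; then
    echo "available (try: sudo modprobe nvme_tcp)"   # installed (or built in) but not listed
  else
    echo "missing"                                   # no module found for this kernel
  fi
}

# Usage: nvme_tcp_status
```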

lopesdasilva commented 2 years ago

I'm facing the same issue on a new installation of microk8s.

I have two slightly different machines.

I've noticed the following when running:

$ sudo ls -Ahlt /sys/devices/system/node/node*/hugepages

Machine 1 (does not work) returns:

drwxr-xr-x 2 root root 0 May 11 21:34 hugepages-1048576kB
drwxr-xr-x 2 root root 0 May 11 21:34 hugepages-2048kB

Machine 2 (works) returns:

drwxr-xr-x 2 root root 0 May 11 18:27 hugepages-2048kB

The value 1048576kB is the one mentioned in the error when the pod tries to start:

EAL: No available 1048576 kB hugepages reported

I've also tried reserving one page of the 1048576kB size, and now the pod reports that I don't have enough pages, although it did find 1 page.

It looks like Mayastor is picking up the 1048576kB size instead of 2048kB, even though I get the following (Hugepagesize is 2048 kB, which I thought was the default):

$ cat /proc/meminfo | grep Huge
AnonHugePages:     16384 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:    1024
HugePages_Free:     1024
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         2097152 kB
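
The per-size comparison above can also be read from the per-node `nr_hugepages` and `free_hugepages` files in sysfs, which shows whether any 1048576 kB pages are actually reserved. A sketch; `list_hugepages` is a hypothetical helper, and its optional argument exists only so it can be pointed at a copy of the tree:

```shell
#!/usr/bin/env bash
# Sketch: list reserved/free hugepages per NUMA node and per page size from sysfs.
# The root directory defaults to the real sysfs location.
list_hugepages() {
  local root=${1:-/sys/devices/system/node} dir node size
  for dir in "$root"/node*/hugepages/hugepages-*kB; do
    [ -d "$dir" ] || continue                 # glob matched nothing
    node=${dir%/hugepages/*}; node=${node##*/} # e.g. node0
    size=${dir##*/hugepages-}                  # e.g. 2048kB
    printf '%s %s nr=%s free=%s\n' "$node" "$size" \
      "$(cat "$dir/nr_hugepages")" "$(cat "$dir/free_hugepages")"
  done
}

# Usage: list_hugepages
```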

stygmate commented 2 years ago

Any news on this one? (I'm stuck deploying a MicroK8s cluster because of it.)

neoaggelos commented 2 years ago

@stygmate @balchua @lopesdasilva Hi, sorry for taking so long. It looks to me like the hugepages message is unrelated to the issue at hand. I've been unable to replicate the issue locally; can you try the following:

microk8s.kubectl edit -n mayastor daemonset mayastor

and append the following command-line argument to the mayastor container's args:

- --env-context=--iova-mode=pa
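
As a non-interactive alternative to `kubectl edit`, the same argument could be appended with a JSON Patch. A sketch, assuming the mayastor container is the first (index 0) container in the DaemonSet and already has an `args` list; `build_args_patch` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: build a JSON Patch (RFC 6902) that appends one argument to a container's args.
# Assumptions: the target container index defaults to 0 and already has an "args" list.
# The value is not JSON-escaped, so it must not contain quotes or backslashes.
build_args_patch() {
  local arg=$1 index=${2:-0}
  printf '[{"op":"add","path":"/spec/template/spec/containers/%s/args/-","value":"%s"}]' \
    "$index" "$arg"
}

# Usage (needs cluster access):
#   microk8s kubectl patch daemonset mayastor -n mayastor --type=json \
#     -p "$(build_args_patch --env-context=--iova-mode=pa)"
```

If the container index is wrong, the argument lands on a different container, so it is worth checking `microk8s kubectl get daemonset mayastor -n mayastor -o yaml` first.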

stygmate commented 2 years ago

@neoaggelos The Mayastor pods seem to be running correctly now. I will try creating volumes next. What does this option do? Is it normal that each pod of the DaemonSet takes 100% CPU (one core) all the time?

lopesdasilva commented 2 years ago

@neoaggelos With the option you suggested, it works.

@stygmate I also see the CPU at 100% all the time; I don't know if that is expected.

stygmate commented 2 years ago

I also see the CPU at 100% all the time; I don't know if that is expected.

@lopesdasilva Yes, it is! You can find the details in the Mayastor docs.

@neoaggelos What does the --env-context=--iova-mode=pa parameter do?

balchua commented 2 years ago

I can confirm that the --env-context=--iova-mode=pa parameter works! Thanks @neoaggelos

Do you think this is a good default? According to this article, pa (Physical Address) mode works most of the time regardless of hardware or software, but va (Virtual Address) mode is preferred.

brentgroves commented 1 year ago

I'm using MicroK8s and the mayastor addon on 3 nodes, each running Ubuntu 22.04. I followed these instructions: https://microk8s.io/docs/addon-mayastor and noticed I almost had enough free pages, so I decided to change the suggested:

sudo sysctl vm.nr_hugepages=1024
echo 'vm.nr_hugepages=1024' | sudo tee -a /etc/sysctl.conf

to:

sudo sysctl vm.nr_hugepages=1048
echo 'vm.nr_hugepages=1048' | sudo tee -a /etc/sysctl.conf
sudo nvim /etc/sysctl.conf

and that seemed to do the trick.
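
The `tee -a` step appends a new line on every run, which may be why the file needed a manual edit afterwards. A sketch of an idempotent alternative, assuming GNU `sed`; `set_nr_hugepages` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: set vm.nr_hugepages in a sysctl config file, replacing any existing
# entry instead of appending a duplicate line the way `tee -a` does on each run.
# Assumes GNU sed (for -i); the file argument defaults to /etc/sysctl.conf.
set_nr_hugepages() {
  local pages=$1 conf=${2:-/etc/sysctl.conf}
  if grep -qE '^[[:space:]]*vm\.nr_hugepages[[:space:]]*=' "$conf" 2>/dev/null; then
    # Rewrite the existing entry in place.
    sed -i -E "s/^[[:space:]]*vm\.nr_hugepages[[:space:]]*=.*/vm.nr_hugepages=${pages}/" "$conf"
  else
    # No entry yet: append one.
    echo "vm.nr_hugepages=${pages}" >> "$conf"
  fi
}

# Usage (as root): set_nr_hugepages 1048 && sysctl -p
```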