dragonflyoss / nydus

Nydus - the Dragonfly image service, providing fast, secure and easy access to container images.
https://nydus.dev/
Apache License 2.0

Problems with Stateful Workloads on Latest Nydus #1174

Closed. Champ-Goblem closed this issue 1 year ago.

Champ-Goblem commented 1 year ago

We have been experiencing problems with Nydus, since at least version 2.1.4, when running some stateful workloads. The main affected workload is MySQL, which fails to start correctly on Nydus 2.1.4+ with Kata 3.0.2. The Nydus version we ran previously, 2.1.0-rc.3, works as expected with Kata 3.0.2.

The error that MySQL throws during the startup phase:

ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/opt/bitnami/mysql/tmp/mysql.sock' (111)

This is unexpected because the provisioning scripts start MySQL in the background, and with debug mode enabled the background MySQL instance starts without logging any errors.

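You can reproduce it with the following steps. First, deploy this DaemonSet, which installs Kata 3.0.2 and Nydus v2.2.0 on each selected node and configures containerd and Kata to use Nydus for virtio-fs: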

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kata-installer
spec:
  selector:
    matchLabels:
      app: installer
  template:
    metadata:
      labels:
        app: installer
    spec:
      nodeSelector:
        # Node selector if applicable
      hostPID: true
      volumes:
      - name: bin
        hostPath:
          # CHANGE ME depending on OS
          path: /usr/local/bin
      - name: kata
        hostPath:
          path: /opt/kata
      - name: containerd
        hostPath:
          # CHANGE ME depending on containerd install
          path: /etc/containerd
      containers:
      - name: installer
        image: ubuntu:latest
        env:
        - name: kataReleaseURL
          value: https://github.com/kata-containers/kata-containers/releases/download/3.0.2/kata-static-3.0.2-x86_64.tar.xz
        - name: nydusReleaseURL
          value: https://github.com/dragonflyoss/image-service/releases/download/v2.2.0/nydus-static-v2.2.0-linux-amd64.tgz
        - name: binPath
          # CHANGE ME depending on OS
          value: /usr/local/bin
        - name: containerdPath
          # CHANGE ME depending on containerd install
          value: /etc/containerd/config.toml
        volumeMounts:
        - name: bin
          mountPath: /usr/local/bin
        - name: kata
          mountPath: /opt/kata
        - name: containerd
          mountPath: /etc/containerd
        securityContext:
          privileged: true
        command:
        - bash
        - -c
        args:
        - |
          #!/usr/bin/env bash
          set -e

          apt update && apt install -y wget xz-utils

          cd /tmp

          echo "Installing kata"
          wget --retry-connrefused -t 20 --waitretry=1 ${kataReleaseURL} -O /tmp/kata.tar.xz
          tar -xf /tmp/kata.tar.xz -C /
          cp /opt/kata/bin/containerd-shim-kata-v2 /usr/local/bin/ --force

          echo "Installing nydus"
          wget --retry-connrefused -t 20 --waitretry=1 ${nydusReleaseURL} -O /tmp/nydus.tar.gz
          tar -xzf /tmp/nydus.tar.gz -C /tmp/
          cp -r /tmp/nydus-static/* /usr/local/bin/ --force

          CONTAINERD_CRI_TAG="cri"
          if grep -E -q "^version = 2$" ${containerdPath}; then
            CONTAINERD_CRI_TAG="\"io.containerd.grpc.v1.cri\""
          fi

          if ! grep -q kata ${containerdPath}; then
            echo "
            [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata]
              runtime_type = \"io.containerd.kata.v2\"
              privileged_without_host_devices = true
              [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata.options]
                ConfigPath = \"/opt/kata/share/defaults/kata-containers/configuration.toml\"
            [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata-fc]
              runtime_type = \"io.containerd.kata.v2\"
              privileged_without_host_devices = true
              [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata-fc.options]
                ConfigPath = \"/opt/kata/share/defaults/kata-containers/configuration-fc.toml\"
            [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata-qemu]
              runtime_type = \"io.containerd.kata.v2\"
              privileged_without_host_devices = true
              [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata-qemu.options]
                ConfigPath = \"/opt/kata/share/defaults/kata-containers/configuration-qemu.toml\"
            [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata-clh]
              runtime_type = \"io.containerd.kata.v2\"
              privileged_without_host_devices = true
              [plugins.${CONTAINERD_CRI_TAG}.containerd.runtimes.kata-clh.options]
                ConfigPath = \"/opt/kata/share/defaults/kata-containers/configuration-clh.toml\"" >> ${containerdPath}
          fi

          # Workaround for nydus 2.2.0: wrap nydusd in a helper that strips the --hybrid-mode argument passed by Kata
          echo '#!/bin/bash
          args=`echo $@ | sed '"'"'s#--hybrid-mode##'"'"'`' > ${binPath}/nydus-helper.sh
          echo "${binPath}/nydusd \$args" >> ${binPath}/nydus-helper.sh

          chmod +x ${binPath}/nydus-helper.sh

          nsenter -t 1 -m -p -- systemctl restart containerd

          echo 'Enabling nydus virtiofs'
          sed -i 's#virtio_fs_extra_args.*#virtio_fs_extra_args = []#' /opt/kata/share/defaults/kata-containers/configuration*
          sed -i 's#shared_fs.*#shared_fs = "virtio-fs-nydus"#' /opt/kata/share/defaults/kata-containers/configuration*
          sed -i "s#virtio_fs_daemon.*#virtio_fs_daemon = \"${binPath}/nydus-helper.sh\"#" /opt/kata/share/defaults/kata-containers/configuration*

          echo 'Setting sandbox_cgroup_only to false'
          sed -i 's/sandbox_cgroup_only=.\+$/sandbox_cgroup_only=false/' /opt/kata/share/defaults/kata-containers/configuration*

          echo "Done"

          while :; do sleep 50000; done
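The nydus-helper.sh workaround above rebuilds the argument list with `echo $@ | sed`, which re-splits every argument on whitespace and removes the first occurrence of the string --hybrid-mode even if it sits inside another argument. A minimal sketch of an alternative helper body, assuming Kata only ever passes --hybrid-mode as a standalone argument, drops that flag while keeping every other argument exactly as passed, and uses exec so that nydusd replaces the helper process. The function name nydusd_wrapper is hypothetical and is not part of Nydus or Kata.

```sh
# Hypothetical replacement for the generated nydus-helper.sh body.
# nydusd_wrapper NYDUSD_BIN [ARGS...]
#   Drops every standalone "--hybrid-mode" argument, keeps all other
#   arguments verbatim (no word re-splitting), then execs nydusd.
nydusd_wrapper() {
  nydusd_bin=$1
  shift
  # POSIX idiom: loop over the original arguments, shift each one off the
  # front and re-append it to the end unless it is the filtered flag.
  for arg do
    shift
    if [ "$arg" = "--hybrid-mode" ]; then
      continue
    fi
    set -- "$@" "$arg"
  done
  # exec keeps the same PID, so signals sent to the helper reach nydusd.
  exec "$nydusd_bin" "$@"
}
```

The generated helper would then consist of a `#!/bin/sh` line, this function, and a final line such as `nydusd_wrapper /usr/local/bin/nydusd "$@"` (with the path adjusted to ${binPath}).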

Next, install the Bitnami MySQL Helm chart (release name mysql) with the following values:

architecture: replication
auth:
  rootPassword: "password"
  username: "user1"
  password: "password"
  replicationPassword: "password"

image:
  debug: false

primary:
  runtimeClassName: kata-clh
  nodeSelector:
    # set if required
  initContainers:
  - command:
    - /bin/bash
    - -ec
    - |
      chown -R 1001:1001 /bitnami/mysql
    image: docker.io/bitnami/minideb:buster
    imagePullPolicy: Always
    name: volume-permissions
    resources: {}
    securityContext:
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /bitnami/mysql
      name: data
secondary:
  replicaCount: 0

initdbScripts:
  test.sh: |
    mysql -P 3306 -uroot -ppassword -e "SHOW STATUS;"
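The values set runtimeClassName: kata-clh, but the steps do not show how that RuntimeClass is created; presumably it already exists in the cluster. If it does not, a sketch like the following could create it. It assumes the node.k8s.io/v1 RuntimeClass API and that the handler must match the containerd runtime name kata-clh that the DaemonSet appends to the CRI plugin config. The function name runtimeclass_manifest is hypothetical.

```sh
# Hypothetical helper: print a RuntimeClass manifest whose handler matches a
# containerd runtime name from the CRI plugin config (e.g. "kata-clh").
runtimeclass_manifest() {
  name=$1
  cat <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: $name
handler: $name
EOF
}
```

Usage: `runtimeclass_manifest kata-clh | kubectl apply -f -`.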

Note: if you run helm uninstall mysql and then helm install again, delete the persistent volume claim between the uninstall and install steps.
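The uninstall, delete-PVC, reinstall cycle can be scripted. This is a minimal sketch under a few assumptions. It assumes the chart is installed from the Bitnami repository as bitnami/mysql (added with `helm repo add bitnami https://charts.bitnami.com/bitnami`). It also assumes the chart labels its data PVC with app.kubernetes.io/instance=<release>; check with `kubectl get pvc --show-labels` before relying on that selector. Deleting the PVC presumably matters because the Bitnami image only runs its first-boot initialisation, including initdbScripts, against an empty data directory. The function name and its arguments are hypothetical.

```sh
# Hypothetical helper: uninstall the release, delete its data PVC so the next
# install starts from an empty data directory, then install the chart again.
reinstall_mysql() {
  release=$1
  values_file=$2
  helm uninstall "$release" || return 1
  # Assumes the Bitnami chart labels its PVCs with the release name.
  kubectl delete pvc -l "app.kubernetes.io/instance=$release" || return 1
  helm install "$release" bitnami/mysql -f "$values_file"
}
```

Usage: `reinstall_mysql mysql values.yaml`. To see the logs of the background MySQL instance, the install line can be given `--set image.debug=true`.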

When the above MySQL deployment runs under runc, the command in test.sh under initdbScripts prints the server status. This works because the Bitnami scripts start MySQL in the background during the init phase so that they can make changes to the configuration. When run with Kata and the latest Nydus, however, the same command fails to connect to MySQL over the socket at /opt/bitnami/mysql/tmp/mysql.sock, even though MySQL is running correctly. You can view the logs of the background MySQL instance by setting image.debug=true in the Helm values.

To gauge the stability of the newest Nydus version, we ran the xfstests suite (https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/) against it. For Nydus 2.1.0-rc.3 this yielded:

Failures: generic/007 generic/013 generic/088 generic/131 generic/245 generic/247 generic/257 generic/258 generic/263 generic/430 generic/431 generic/432 generic/433 generic/434 generic/478 generic/504 generic/564 generic/571 generic/632 generic/637 generic/639
Failed 21 of 589 tests

and for version 2.2.0 the results were:

Failures: generic/007 generic/013 generic/088 generic/245 generic/257 generic/258 generic/263 generic/430 generic/431 generic/432 generic/433 generic/434 generic/504 generic/564 generic/571 generic/632 generic/637 generic/639
Failed 18 of 589 tests
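Comparing the two failure lists shows that every test failing on 2.2.0 also failed on 2.1.0-rc.3. Three tests (generic/131, generic/247, generic/478) no longer appear among the failures on 2.2.0. So by this measure 2.2.0 looks no worse, even though it is a version that shows the MySQL problem. The following small POSIX shell sketch computes that set difference; the function and variable names are illustrative.

```sh
# Failure lists copied from the two xfstests runs above.
failures_2_1_0_rc3="generic/007 generic/013 generic/088 generic/131 generic/245 generic/247 generic/257 generic/258 generic/263 generic/430 generic/431 generic/432 generic/433 generic/434 generic/478 generic/504 generic/564 generic/571 generic/632 generic/637 generic/639"
failures_2_2_0="generic/007 generic/013 generic/088 generic/245 generic/257 generic/258 generic/263 generic/430 generic/431 generic/432 generic/433 generic/434 generic/504 generic/564 generic/571 generic/632 generic/637 generic/639"

# only_in_first LIST_A LIST_B
#   Prints, one per line, each test name in space-separated LIST_A that is
#   absent from space-separated LIST_B (output order follows LIST_A).
only_in_first() {
  for t in $1; do
    case " $2 " in
      *" $t "*) ;;               # present in both lists
      *) printf '%s\n' "$t" ;;   # only in the first list
    esac
  done
}
```

`only_in_first "$failures_2_1_0_rc3" "$failures_2_2_0"` prints generic/131, generic/247 and generic/478; swapping the arguments prints nothing.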

We have opened an issue on the fuse-backend-rs crate suggesting that xfstests be integrated into its test suite to catch such problems early: https://github.com/cloud-hypervisor/fuse-backend-rs/issues/111.

imeoer commented 1 year ago

Hi @Champ-Goblem, thanks for the detailed report. Could you try nydusify check --source <your_original_mysql_image> --target <your_nydus_mysql_image> with nydus-image and nydusd 2.1.4?

Champ-Goblem commented 1 year ago

Hi @imeoer, the image we are using is a standard OCI image rather than a RAFS-formatted one: the Helm chart uses docker.io/bitnami/mysql:8.0.32-debian-11-r14. Does this change things?

hsiangkao commented 1 year ago

It seems this issue is mainly related to FUSE? In other words, fuse-backend-rs?

Champ-Goblem commented 1 year ago

This seems to be solved in 2.1.6.