Yui0013 / Kubespray

In an offline deployment with "ingress_nginx_webhook_enabled: true", the Pods of the created Jobs go into ImagePullBackOff and never become READY #2

Open Yui0013 opened 2 days ago

Yui0013 commented 2 days ago

What happened?

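I built a Kubernetes cluster offline with kubespray v2.25.0 and "ingress_nginx_webhook_enabled: true", and the ingress-nginx-controller Pod did not come up (it stayed in ContainerCreating).

On investigation, the Pods of the ingress-nginx-admission Jobs were failing to pull their image, which prevented ingress-nginx from being deployed.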

kubectl get po -n ingress-nginx
NAME                                   READY   STATUS              RESTARTS   AGE
ingress-nginx-admission-create-6b4sb   0/1     ImagePullBackOff    0          26m
ingress-nginx-admission-patch-5hcxf    0/1     ImagePullBackOff    0          26m
ingress-nginx-controller-567dq         0/1     ContainerCreating   0          27m

Looking at the details, the image ingress-nginx/kube-webhook-certgen:v1.4.1 did not exist in the local registry, so it could not be pulled.

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  2m16s                default-scheduler  Successfully assigned ingress-nginx/ingress-nginx-admission-patch-5hcxf to worker2
  Normal   Pulling    53s (x4 over 2m16s)  kubelet            Pulling image "192.168.122.155:5000/ingress-nginx/kube-webhook-certgen:v1.4.1"
  Warning  Failed     53s (x4 over 2m16s)  kubelet            Failed to pull image "192.168.122.155:5000/ingress-nginx/kube-webhook-certgen:v1.4.1": rpc error: code = NotFound desc = failed to pull and unpack image "192.168.122.155:5000/ingress-nginx/kube-webhook-certgen:v1.4.1": failed to resolve reference "192.168.122.155:5000/ingress-nginx/kube-webhook-certgen:v1.4.1": 192.168.122.155:5000/ingress-nginx/kube-webhook-certgen:v1.4.1: not found
  Warning  Failed     53s (x4 over 2m16s)  kubelet            Error: ErrImagePull
  Warning  Failed     38s (x6 over 2m16s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    23s (x7 over 2m16s)  kubelet            Back-off pulling image "192.168.122.155:5000/ingress-nginx/kube-webhook-certgen:v1.4.1"
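
To confirm from the registry side that the image is really absent, the local registry can be queried directly. The sketch below only builds the URL of the Registry HTTP API v2 "list tags" endpoint; `registry_tags_url` is a hypothetical helper name, and it assumes the registry is served over plain HTTP, as the `containerd_registries_mirrors` setting in the inventory suggests.

```shell
#!/bin/bash
# Build the Registry HTTP API v2 "list tags" URL for a repository.
# registry_tags_url is a hypothetical helper name, not part of Kubespray.
# Assumption: the registry is reachable over plain HTTP.
registry_tags_url() {
    local registry="$1" repo="$2"
    printf 'http://%s/v2/%s/tags/list\n' "$registry" "$repo"
}

# Usage (needs network access to the registry):
#   curl -s "$(registry_tags_url 192.168.122.155:5000 ingress-nginx/kube-webhook-certgen)"
```

If the repository was never pushed, the registry is expected to answer with an error response rather than a tag list, which would match the "not found" message in the events above.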

I generated the file and image lists with ./generate_list.sh and registered the files in the Nginx container with ./manage-offline-files.sh. I then used manage-offline-container-images.sh to download the container images and register them in the local registry. https://github.com/kubernetes-sigs/kubespray/blob/master/contrib/offline/README.md

Checking the image list generated by ./generate_list.sh, I found that the image ingress-nginx/kube-webhook-certgen:v1.4.1 was not listed.

# cat temp/images.list
---
docker.io/mirantis/k8s-netchecker-server:v1.2.2
docker.io/mirantis/k8s-netchecker-agent:v1.2.2
quay.io/coreos/etcd:v3.5.12
quay.io/cilium/cilium:v1.15.4
quay.io/cilium/operator:v1.15.4
quay.io/cilium/hubble-relay:v1.15.4
quay.io/cilium/certgen:v0.1.8
quay.io/cilium/hubble-ui:v0.11.0
quay.io/cilium/hubble-ui-backend:v0.11.0
docker.io/envoyproxy/envoy:v1.22.5
ghcr.io/k8snetworkplumbingwg/multus-cni:v3.8
docker.io/flannel/flannel:v0.22.0
docker.io/flannel/flannel-cni-plugin:v1.1.2
quay.io/calico/node:v3.27.3
quay.io/calico/cni:v3.27.3
quay.io/calico/pod2daemon-flexvol:v3.27.3
quay.io/calico/kube-controllers:v3.27.3
quay.io/calico/typha:v3.27.3
quay.io/calico/apiserver:v3.27.3
docker.io/weaveworks/weave-kube:2.8.1
docker.io/weaveworks/weave-npc:2.8.1
docker.io/kubeovn/kube-ovn:v1.11.5
docker.io/cloudnativelabs/kube-router:v2.0.0
registry.k8s.io/pause:3.9
ghcr.io/kube-vip/kube-vip:v0.8.0
docker.io/library/nginx:1.25.2-alpine
docker.io/library/haproxy:2.8.2-alpine
registry.k8s.io/coredns/coredns:v1.11.1
registry.k8s.io/dns/k8s-dns-node-cache:1.22.28
registry.k8s.io/cpa/cluster-proportional-autoscaler:v1.8.8
docker.io/library/registry:2.8.1
registry.k8s.io/metrics-server/metrics-server:v0.7.0
registry.k8s.io/sig-storage/local-volume-provisioner:v2.5.0
quay.io/external_storage/cephfs-provisioner:v2.1.0-k8s1.11
quay.io/external_storage/rbd-provisioner:v2.1.1-k8s1.11
docker.io/rancher/local-path-provisioner:v0.0.24
registry.k8s.io/ingress-nginx/controller:v1.10.1
docker.io/amazon/aws-alb-ingress-controller:v1.1.9
quay.io/jetstack/cert-manager-controller:v1.13.2
quay.io/jetstack/cert-manager-cainjector:v1.13.2
quay.io/jetstack/cert-manager-webhook:v1.13.2
registry.k8s.io/sig-storage/csi-attacher:v3.3.0
registry.k8s.io/sig-storage/csi-provisioner:v3.0.0
registry.k8s.io/sig-storage/csi-snapshotter:v5.0.0
registry.k8s.io/sig-storage/snapshot-controller:v7.0.2
registry.k8s.io/sig-storage/csi-resizer:v1.3.0
registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.4.0
registry.k8s.io/provider-os/cinder-csi-plugin:v1.29.0
docker.io/amazon/aws-ebs-csi-driver:v0.5.0
docker.io/kubernetesui/dashboard:v2.7.0
docker.io/kubernetesui/metrics-scraper:v1.0.8
quay.io/metallb/speaker:v0.13.9
quay.io/metallb/controller:v0.13.9
registry.k8s.io/kube-apiserver:v1.29.5
registry.k8s.io/kube-controller-manager:v1.29.5
registry.k8s.io/kube-scheduler:v1.29.5
registry.k8s.io/kube-proxy:v1.29.5

As a result, the image was never downloaded and never registered in the local registry, so the Job Pods could not pull it. They went into ImagePullBackOff and never became READY. That appears to be the cause.
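
A quick way to catch this before deploying is to check the generated list for the image name. This is a minimal sketch; `image_in_list` is a hypothetical helper, and the path temp/images.list is the one shown in the listing above.

```shell
#!/bin/bash
# Return success if the given image name appears anywhere in the list file.
# image_in_list is a hypothetical helper name used only for this illustration.
image_in_list() {
    local image="$1" list="$2"
    # -q: quiet, -F: treat the image name as a fixed string, not a regex
    grep -qF -- "$image" "$list"
}

# Usage against the list produced by generate_list.sh:
#   image_in_list "ingress-nginx/kube-webhook-certgen" temp/images.list \
#       || echo "kube-webhook-certgen is missing from images.list"
```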

What did you expect to happen?

I expected the ingress-nginx-admission Job Pods to pull the kube-webhook-certgen:v1.4.1 image, the Pods to be created, and the Jobs to complete.

Environment

OS

Linux 5.14.0-362.8.1.el9_3.x86_64 x86_64
NAME="Red Hat Enterprise Linux"
VERSION="9.3 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.3"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.3 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.3"

Version of Ansible

ansible [core 2.16.11]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.11/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.11.5 (main, Sep  7 2023, 00:00:00) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] (/usr/bin/python3.11)
  jinja version = 3.1.4
  libyaml = True

Version of Python

Python 3.9.18

Version of Kubespray (commit)

5319ca7

Network plugin used

Calico

Full inventory with variables

controlplane1 | SUCCESS => {
    "hostvars[inventory_hostname]": {
        "allow_unsupported_distribution_setup": false,
        "ansible_check_mode": false,
        "ansible_config_file": "/root/k8s-upgrade/kubespray-2.25.0/ansible.cfg",
        "ansible_diff_mode": false,
        "ansible_facts": {},
        "ansible_forks": 5,
        "ansible_host": "192.168.122.111",
        "ansible_inventory_sources": [
            "/root/k8s-upgrade/kubespray-2.25.0/inventory/mycluster/inventory.ini"
        ],
        "ansible_playbook_python": "/usr/bin/python3.11",
        "ansible_verbosity": 0,
        "ansible_version": {
            "full": "2.16.11",
            "major": 2,
            "minor": 16,
            "revision": 11,
            "string": "2.16.11"
        },
        "argocd_enabled": false,
        "auto_renew_certificates": false,
        "bin_dir": "/usr/local/bin",
        "calico_cni_name": "k8s-pod-network",
        "calico_crds_download_url": "{{ files_repo }}/github.com/projectcalico/calico/archive/{{ calico_version }}.tar.gz",
        "calico_pool_blocksize": 26,
        "calicoctl_download_url": "{{ files_repo }}/github.com/projectcalico/calico/releases/download/{{ calico_ctl_version }}/calicoctl-linux-{{ image_arch }}",
        "cephfs_provisioner_enabled": false,
        "cert_manager_enabled": false,
        "cilium_l2announcements": false,
        "ciliumcli_download_url": "{{ files_repo }}/github.com/cilium/cilium-cli/releases/download/{{ cilium_cli_version }}/cilium-linux-{{ image_arch }}.tar.gz",
        "cluster_name": "cluster.local",
        "cni_download_url": "{{ files_repo }}/github.com/containernetworking/plugins/releases/download/{{ cni_version }}/cni-plugins-linux-{{ image_arch }}-{{ cni_version }}.tgz",
        "container_manager": "containerd",
        "containerd_download_url": "{{ files_repo }}/github.com/containerd/containerd/releases/download/v{{ containerd_version }}/containerd-{{ containerd_version }}-linux-{{ image_arch }}.tar.gz",
        "containerd_registries_mirrors": [
            {
                "mirrors": [
                    {
                        "capabilities": [
                            "pull",
                            "resolve"
                        ],
                        "host": "http://192.168.122.155:5000",
                        "skip_verify": true
                    }
                ],
                "prefix": "192.168.122.155:5000"
            }
        ],
        "coredns_k8s_external_zone": "k8s_external.local",
        "credentials_dir": "/root/k8s-upgrade/kubespray-2.25.0/inventory/mycluster/credentials",
        "cri_dockerd_download_url": "{{ files_repo }}/github.com/Mirantis/cri-dockerd/releases/download/v{{ cri_dockerd_version }}/cri-dockerd-{{ cri_dockerd_version }}.{{ image_arch }}.tgz",
        "crictl_download_url": "{{ files_repo }}/github.com/kubernetes-sigs/cri-tools/releases/download/{{ crictl_version }}/crictl-{{ crictl_version }}-{{ ansible_system | lower }}-{{ image_arch }}.tar.gz",
        "crio_download_url": "{{ files_repo }}/storage.googleapis.com/cri-o/artifacts/cri-o.{{ image_arch }}.{{ crio_version }}.tar.gz",
        "crun_download_url": "{{ files_repo }}/github.com/containers/crun/releases/download/{{ crun_version }}/crun-{{ crun_version }}-linux-{{ image_arch }}",
        "default_kubelet_config_dir": "/etc/kubernetes/dynamic_kubelet_dir",
        "deploy_netchecker": false,
        "dns_domain": "cluster.local",
        "dns_mode": "coredns",
        "docker_bin_dir": "/usr/bin",
        "docker_container_storage_setup": false,
        "docker_daemon_graph": "/var/lib/docker",
        "docker_dns_servers_strict": false,
        "docker_image_repo": "192.168.122.155:5000",
        "docker_iptables_enabled": "false",
        "docker_log_opts": "--log-opt max-size=50m --log-opt max-file=5",
        "docker_rpm_keepcache": 1,
        "enable_coredns_k8s_endpoint_pod_names": false,
        "enable_coredns_k8s_external": false,
        "enable_dual_stack_networks": false,
        "enable_nat_default_gateway": true,
        "enable_nodelocaldns": true,
        "enable_nodelocaldns_secondary": false,
        "etcd_data_dir": "/var/lib/etcd",
        "etcd_deployment_type": "host",
        "etcd_download_url": "{{ files_repo }}/github.com/etcd-io/etcd/releases/download/{{ etcd_version }}/etcd-{{ etcd_version }}-linux-{{ image_arch }}.tar.gz",
        "event_ttl_duration": "1h0m0s",
        "files_repo": "http://192.168.122.155:8080",
        "gcr_image_repo": "192.168.122.155:5000",
        "group_names": [
            "etcd",
            "k8s_cluster",
            "kube_control_plane"
        ],
        "groups": {
            "all": [
                "controlplane1",
                "worker1"
            ],
            "calico_rr": [],
            "etcd": [
                "controlplane1"
            ],
            "k8s_cluster": [
                "controlplane1",
                "worker1"
            ],
            "kube_control_plane": [
                "controlplane1"
            ],
            "kube_node": [
                "worker1"
            ],
            "ungrouped": []
        },
        "gvisor_containerd_shim_runsc_download_url": "{{ files_repo }}/storage.googleapis.com/gvisor/releases/release/{{ gvisor_version }}/{{ ansible_architecture }}/containerd-shim-runsc-v1",
        "gvisor_runsc_download_url": "{{ files_repo }}/storage.googleapis.com/gvisor/releases/release/{{ gvisor_version }}/{{ ansible_architecture }}/runsc",
        "helm_download_url": "{{ files_repo }}/get.helm.sh/helm-{{ helm_version }}-linux-{{ image_arch }}.tar.gz",
        "helm_enabled": false,
        "ingress_alb_enabled": false,
        "ingress_nginx_enabled": true,
        "ingress_nginx_webhook_enabled": true,
        "ingress_publish_status_address": "",
        "inventory_dir": "/root/k8s-upgrade/kubespray-2.25.0/inventory/mycluster",
        "inventory_file": "/root/k8s-upgrade/kubespray-2.25.0/inventory/mycluster/inventory.ini",
        "inventory_hostname": "controlplane1",
        "inventory_hostname_short": "controlplane1",
        "k8s_image_pull_policy": "IfNotPresent",
        "kata_containers_download_url": "{{ files_repo }}/github.com/kata-containers/kata-containers/releases/download/{{ kata_containers_version }}/kata-static-{{ kata_containers_version }}-{{ ansible_architecture }}.tar.xz",
        "kata_containers_enabled": false,
        "krew_download_url": "{{ files_repo }}/github.com/kubernetes-sigs/krew/releases/download/{{ krew_version }}/krew-{{ host_os }}_{{ image_arch }}.tar.gz",
        "krew_enabled": false,
        "krew_root_dir": "/usr/local/krew",
        "kube_api_anonymous_auth": true,
        "kube_apiserver_ip": "10.233.0.1",
        "kube_apiserver_port": 6443,
        "kube_cert_dir": "/etc/kubernetes/ssl",
        "kube_cert_group": "kube-cert",
        "kube_config_dir": "/etc/kubernetes",
        "kube_encrypt_secret_data": false,
        "kube_image_repo": "192.168.122.155:5000",
        "kube_log_level": 2,
        "kube_manifest_dir": "/etc/kubernetes/manifests",
        "kube_network_node_prefix": 24,
        "kube_network_node_prefix_ipv6": 120,
        "kube_network_plugin": "calico",
        "kube_network_plugin_multus": false,
        "kube_ovn_default_gateway_check": true,
        "kube_ovn_default_logical_gateway": false,
        "kube_ovn_default_vlan_id": 100,
        "kube_ovn_dpdk_enabled": false,
        "kube_ovn_enable_external_vpc": true,
        "kube_ovn_enable_lb": true,
        "kube_ovn_enable_np": true,
        "kube_ovn_enable_ssl": false,
        "kube_ovn_encap_checksum": true,
        "kube_ovn_external_address": "8.8.8.8",
        "kube_ovn_external_address_ipv6": "2400:3200::1",
        "kube_ovn_external_dns": "alauda.cn",
        "kube_ovn_hw_offload": false,
        "kube_ovn_ic_autoroute": true,
        "kube_ovn_ic_dbhost": "127.0.0.1",
        "kube_ovn_ic_enable": false,
        "kube_ovn_ic_zone": "kubernetes",
        "kube_ovn_network_type": "geneve",
        "kube_ovn_node_switch_cidr": "100.64.0.0/16",
        "kube_ovn_node_switch_cidr_ipv6": "fd00:100:64::/64",
        "kube_ovn_pod_nic_type": "veth_pair",
        "kube_ovn_traffic_mirror": false,
        "kube_ovn_tunnel_type": "geneve",
        "kube_ovn_vlan_name": "product",
        "kube_owner": "kube",
        "kube_pods_subnet": "10.233.64.0/18",
        "kube_pods_subnet_ipv6": "fd85:ee78:d8a6:8607::1:0000/112",
        "kube_proxy_mode": "ipvs",
        "kube_proxy_nodeport_addresses": [],
        "kube_proxy_strict_arp": false,
        "kube_script_dir": "/usr/local/bin/kubernetes-scripts",
        "kube_service_addresses": "10.233.0.0/18",
        "kube_service_addresses_ipv6": "fd85:ee78:d8a6:8607::1000/116",
        "kube_token_dir": "/etc/kubernetes/tokens",
        "kube_version": "v1.29.5",
        "kube_vip_enabled": false,
        "kube_webhook_token_auth": false,
        "kube_webhook_token_auth_url_skip_tls_verify": false,
        "kubeadm_certificate_key": "deaffd1ece61feacd56b6608c3b8dbd647dfeb5f89da3c8baf65dbece5cbe2d9",
        "kubeadm_download_url": "{{ files_repo }}/dl.k8s.io/release/{{ kubeadm_version }}/bin/linux/{{ image_arch }}/kubeadm",
        "kubeadm_patches": {
            "dest_dir": "/etc/kubernetes/patches",
            "enabled": false,
            "source_dir": "/root/k8s-upgrade/kubespray-2.25.0/inventory/mycluster/patches"
        },
        "kubectl_download_url": "{{ files_repo }}/dl.k8s.io/release/{{ kube_version }}/bin/linux/{{ image_arch }}/kubectl",
        "kubelet_download_url": "{{ files_repo }}/dl.k8s.io/release/{{ kube_version }}/bin/linux/{{ image_arch }}/kubelet",
        "kubernetes_audit": false,
        "loadbalancer_apiserver_healthcheck_port": 8081,
        "loadbalancer_apiserver_port": 6443,
        "local_path_provisioner_enabled": false,
        "local_release_dir": "/tmp/releases",
        "local_volume_provisioner_enabled": false,
        "macvlan_interface": "eth1",
        "metallb_enabled": false,
        "metallb_namespace": "metallb-system",
        "metallb_speaker_enabled": false,
        "metrics_server_enabled": false,
        "ndots": 2,
        "nerdctl_download_url": "{{ files_repo }}/github.com/containerd/nerdctl/releases/download/v{{ nerdctl_version }}/nerdctl-{{ nerdctl_version }}-{{ ansible_system | lower }}-{{ image_arch }}.tar.gz",
        "no_proxy": "localhost,10.233.0.1,192.168.122.0/24",
        "no_proxy_exclude_workers": false,
        "node_feature_discovery_enabled": false,
        "nodelocaldns_bind_metrics_host_ip": false,
        "nodelocaldns_health_port": 9254,
        "nodelocaldns_ip": "169.254.25.10",
        "nodelocaldns_second_health_port": 9256,
        "nodelocaldns_secondary_skew_seconds": 5,
        "ntp_enabled": false,
        "ntp_manage_config": false,
        "ntp_servers": [
            "0.pool.ntp.org iburst",
            "1.pool.ntp.org iburst",
            "2.pool.ntp.org iburst",
            "3.pool.ntp.org iburst"
        ],
        "omit": "__omit_place_holder__35334dcd9e690128ea085b95683a9fe3a4607e07",
        "persistent_volumes_enabled": false,
        "playbook_dir": "/root/k8s-upgrade/kubespray-2.25.0",
        "quay_image_repo": "192.168.122.155:5000",
        "rbd_provisioner_enabled": false,
        "registry_enabled": false,
        "registry_host": "192.168.122.155:5000",
        "remove_anonymous_access": false,
        "resolvconf_mode": "host_resolvconf",
        "retry_stagger": 5,
        "runc_download_url": "{{ files_repo }}/github.com/opencontainers/runc/releases/download/{{ runc_version }}/runc.{{ image_arch }}",
        "skopeo_download_url": "{{ files_repo }}/github.com/lework/skopeo-binary/releases/download/{{ skopeo_version }}/skopeo-linux-{{ image_arch }}",
        "skydns_server": "10.233.0.3",
        "skydns_server_secondary": "10.233.0.4",
        "unsafe_show_logs": false,
        "volume_cross_zone_attachment": false
    }
}

Command used to invoke ansible

# ansible-playbook -i inventory/mycluster/inventory.ini -b cluster.yml 2>&1 | tee k8s-cluster-install.log

Output of ansible run

controlplane1              : ok=634  changed=139  unreachable=0    failed=0    skipped=1094 rescued=0    ignored=6   
worker1                    : ok=430  changed=85   unreachable=0    failed=0    skipped=669  rescued=0    ignored=1 

Anything else we need to know

./generate_list.sh builds the image list from the entries under "downloads:" in /roles/kubespray-defaults/defaults/main/download.yml.

#!/bin/bash
set -eo pipefail

CURRENT_DIR=$(cd $(dirname $0); pwd)
TEMP_DIR="${CURRENT_DIR}/temp"
REPO_ROOT_DIR="${CURRENT_DIR%/contrib/offline}"

: ${DOWNLOAD_YML:="roles/kubespray-defaults/defaults/main/download.yml"}

mkdir -p ${TEMP_DIR}

# generate all download files url template
grep 'download_url:' ${REPO_ROOT_DIR}/${DOWNLOAD_YML} \
    | sed 's/^.*_url: //g;s/\"//g' > ${TEMP_DIR}/files.list.template

# generate all images list template
sed -n '/^downloads:/,/download_defaults:/p' ${REPO_ROOT_DIR}/${DOWNLOAD_YML} \
    | sed -n "s/repo: //p;s/tag: //p" | tr -d ' ' \
    | sed 'N;s#\n# #g' | tr ' ' ':' | sed 's/\"//g' > ${TEMP_DIR}/images.list.template
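
To make the behaviour of the image-list step easier to see, here is the same pipeline wrapped in a function that can be run against any file. `extract_image_templates` is a hypothetical name; the only deliberate change is that the quote-stripping expression is written as `s/"//g` instead of `s/\"//g`.

```shell
#!/bin/bash
# Sketch: the image-list pipeline from generate_list.sh wrapped in a function.
# extract_image_templates is a hypothetical name used only for this illustration.
extract_image_templates() {
    local download_yml="$1"
    # 1. Keep only the block between "downloads:" and "download_defaults:".
    # 2. Print the values of "repo: " and "tag: " lines; all other keys are dropped.
    # 3. Strip spaces, join each repo/tag pair with ":", and remove double quotes.
    sed -n '/^downloads:/,/download_defaults:/p' "$download_yml" \
        | sed -n 's/repo: //p;s/tag: //p' | tr -d ' ' \
        | sed 'N;s#\n# #g' | tr ' ' ':' | sed 's/"//g'
}

# Usage:
#   extract_image_templates roles/kubespray-defaults/defaults/main/download.yml
```

For a downloads: entry whose repo is "{{ ingress_nginx_controller_image_repo }}" and whose tag is "{{ ingress_nginx_controller_image_tag }}", the function prints {{ingress_nginx_controller_image_repo}}:{{ingress_nginx_controller_image_tag}}. An image with no repo/tag pair under downloads: produces no line at all, which is consistent with kube-webhook-certgen being absent from the list.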

So I added the following to download.yml, after which registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.4.1 appeared in the image list and the problem above was resolved.

# vi /root/k8s-upgrade/kubespray-2.25.0/roles/kubespray-defaults/defaults/main/download.yml
---
downloads:
  ingress_nginx_kube_webhook_certgen:
    repo: "{{ ingress_nginx_kube_webhook_certgen_image_repo }}"
    tag: "{{ ingress_nginx_kube_webhook_certgen_image_tag }}"
    sha256: "{{ ingress_nginx_kube_webhook_certgen_digest_checksum | default(None) }}"
    groups:
      - kube_node
    when: ingress_nginx_webhook_enabled
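
As an alternative that leaves download.yml untouched, the missing image reference could be appended to the generated list by hand before the images are downloaded. This is only a sketch, not the fix described above: `append_image_if_missing` is a hypothetical helper, and it assumes temp/images.list is the file the subsequent download step consumes.

```shell
#!/bin/bash
# Append an image reference to a list file unless that exact line already exists.
# append_image_if_missing is a hypothetical helper name, not part of Kubespray.
append_image_if_missing() {
    local image="$1" list="$2"
    touch "$list"
    # -x: match whole lines only, -F: fixed string, so reruns do not duplicate entries
    grep -qxF -- "$image" "$list" || printf '%s\n' "$image" >> "$list"
}

# Usage:
#   append_image_if_missing "registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.4.1" temp/images.list
```

The edit would be lost whenever generate_list.sh is rerun, so adding the entry to download.yml remains the more durable option.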

azkaoru commented 23 hours ago

@Yui0013 There is a Fork button on the kubespray GitHub page; clicking it creates a copy of kubespray under your own account. How about filing the issue in that copy of the repository?