ContainerCraft / Kargo3

KubeVirt Hypervisor | Minimum viable self-hosted GitOps PaaS
GNU General Public License v3.0

Explore aarch64 deployment & opportunities for arm/raspi use cases. #8

Closed: jbpratt closed this issue 2 years ago

jbpratt commented 2 years ago

Provision a base Fedora install on an RPi4, disable the 4 GB RAM limit, and attempt to run Kubespray against the new system.

jbpratt commented 2 years ago

i-it just worked :tada:

❯ oc get pods -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-6ddb68799c-68rt8   1/1     Running   0          8m7s
kube-system   calico-node-q86mq                          1/1     Running   0          9m23s
kube-system   coredns-8474476ff8-q7btk                   1/1     Running   0          7m14s
kube-system   kube-apiserver-node1                       1/1     Running   0          11m
kube-system   kube-controller-manager-node1              1/1     Running   1          11m
kube-system   kube-multus-ds-c9g6s                       1/1     Running   0          6m33s
kube-system   kube-proxy-b7xts                           1/1     Running   0          11m
kube-system   kube-scheduler-node1                       1/1     Running   1          11m
kube-system   nodelocaldns-87l9j                         1/1     Running   0          6m45s

❯ oc describe node | grep arm
Labels:             beta.kubernetes.io/arch=arm64
                    kubernetes.io/arch=arm64
  Architecture:               arm64
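The same architecture check can be scripted. A minimal sketch in Python, assuming the node list was saved from oc get nodes -o json (kubectl get nodes -o json produces the same structure); it reads the standard kubernetes.io/arch label and the status.nodeInfo.architecture field. The function name node_architectures is hypothetical, and the sample data just mirrors the single node1 shown above.

```python
import json


def node_architectures(node_list_json: str) -> dict:
    """Map node name -> (kubernetes.io/arch label, status.nodeInfo.architecture).

    Expects the JSON printed by `oc get nodes -o json` / `kubectl get nodes -o json`.
    """
    nodes = json.loads(node_list_json)
    result = {}
    for node in nodes.get("items", []):
        name = node["metadata"]["name"]
        labels = node["metadata"].get("labels", {})
        arch_label = labels.get("kubernetes.io/arch")
        reported = node.get("status", {}).get("nodeInfo", {}).get("architecture")
        result[name] = (arch_label, reported)
    return result


if __name__ == "__main__":
    # Synthetic sample shaped like the single-node cluster above
    sample = json.dumps({
        "items": [{
            "metadata": {"name": "node1", "labels": {"kubernetes.io/arch": "arm64"}},
            "status": {"nodeInfo": {"architecture": "arm64"}},
        }]
    })
    print(node_architectures(sample))  # {'node1': ('arm64', 'arm64')}
```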

jbpratt commented 2 years ago

Ran into the first of many issues to come:

❯ oc get pods -A
NAMESPACE                NAME                                               READY   STATUS    RESTARTS   AGE
cert-manager             cert-manager-57d89b9548-22rxt                      1/1     Running   0          117s
cert-manager             cert-manager-cainjector-5bcf77b697-sw6ft           1/1     Running   0          117s
cert-manager             cert-manager-webhook-9cb88bd6d-r8cvm               1/1     Running   0          117s
cluster-network-addons   cluster-network-addons-operator-549b8f8966-mqmwx   0/1     Error     0          62s
kube-system              calico-kube-controllers-6ddb68799c-68rt8           1/1     Running   0          117m
kube-system              calico-node-q86mq                                  1/1     Running   0          118m
kube-system              coredns-8474476ff8-q7btk                           1/1     Running   0          116m
kube-system              kube-apiserver-node1                               1/1     Running   0          121m
kube-system              kube-controller-manager-node1                      1/1     Running   1          121m
kube-system              kube-multus-ds-c9g6s                               1/1     Running   0          116m
kube-system              kube-proxy-b7xts                                   1/1     Running   0          121m
kube-system              kube-scheduler-node1                               1/1     Running   1          121m
kube-system              nodelocaldns-87l9j                                 1/1     Running   0          116m
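Spotting the one failing pod in a long listing is easy to automate. A rough sketch, assuming the plain-text table format of oc get pods -A shown above (first four columns NAMESPACE, NAME, READY, STATUS); the helper unhealthy_pods is hypothetical, and for anything serious, -o json is a sturdier input than splitting columns.

```python
def unhealthy_pods(table: str) -> list:
    """Return (namespace, name, ready, status) for every pod that is not
    Running or not fully ready, parsed from `oc get pods -A` text output."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    bad = []
    for line in lines[1:]:  # skip the NAMESPACE/NAME/READY/STATUS/... header
        namespace, name, ready, status = line.split()[:4]
        have, want = ready.split("/")
        if status != "Running" or have != want:
            bad.append((namespace, name, ready, status))
    return bad


if __name__ == "__main__":
    sample = """NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-57d89b9548-22rxt 1/1 Running 0 117s
cluster-network-addons cluster-network-addons-operator-549b8f8966-mqmwx 0/1 Error 0 62s
"""
    print(unhealthy_pods(sample))
    # [('cluster-network-addons', 'cluster-network-addons-operator-549b8f8966-mqmwx', '0/1', 'Error')]
```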

❯ k logs pod/cluster-network-addons-operator-549b8f8966-mqmwx -n cluster-network-addons
standard_init_linux.go:228: exec user process caused: exec format error

https://github.com/kubevirt/kubevirt/issues/3558
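exec format error means the kernel refused to execute the container's entrypoint binary; on this node the most likely cause is an operator image built only for amd64 (x86_64) landing on the arm64 host. One way to confirm that for a binary extracted from the image is to read the e_machine field of its ELF header. A minimal sketch; the helper names are hypothetical and only a few e_machine values from the ELF specification are mapped.

```python
import struct

# Subset of e_machine values defined by the ELF specification
ELF_MACHINES = {3: "x86 (i386)", 40: "arm", 62: "x86_64", 183: "aarch64"}


def elf_architecture(header: bytes) -> str:
    """Return the target CPU recorded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit ELF
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")


def binary_architecture(path: str) -> str:
    """Read just enough of a file on disk to identify its architecture."""
    with open(path, "rb") as f:
        return elf_architecture(f.read(20))


if __name__ == "__main__":
    # Synthetic header: 64-bit, little-endian, ET_EXEC, e_machine = 62
    fake = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<HH", 2, 62)
    print(elf_architecture(fake))  # x86_64
```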

jbpratt commented 2 years ago

Going to document this a bit better for future reference:

  1. Install and boot Fedora 34 on an RPi4, flashing the image with arm-image-installer (https://pagure.io/arm-image-installer):
    sudo arm-image-installer \
    --image /tmp/Fedora-Server-34-1.2.aarch64.raw.xz  \
    --addkey ~/.ssh/id_rsa.pub \
    --resizefs \
    --target rpi4 \
    --media /dev/sda
  2. Configure a user and a bridge interface (br0)
  3. Execute Kargo to bring up a cluster against the host(s)
  4. Wait (just long enough to fix a coffee and stretch)
  5. Install the experimental arm64 developer build of KubeVirt (https://kubevirt.io/user-guide/operations/installation/#experimental-arm64-developer-builds):
    LATEST=$(curl -L https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/latest-arm64)
    kubectl apply -f https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/${LATEST}/kubevirt-operator-arm64.yaml
    kubectl apply -f https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/${LATEST}/kubevirt-cr-arm64.yaml
  6. KubeVirt is now deployed!
    kubevirt      virt-api-6f994b8b7b-b77q4                  1/1     Running    0          27m
    kubevirt      virt-api-6f994b8b7b-dlr8w                  1/1     Running    0          27m
    kubevirt      virt-controller-7f4bc56796-644zf           1/1     Running    0          27m
    kubevirt      virt-controller-7f4bc56796-tlzql           1/1     Running    0          27m
    kubevirt      virt-handler-rf7fx                         1/1     Running    0          27m
    kubevirt      virt-operator-78f55c7745-tcwms             1/1     Running    0          29m
    kubevirt      virt-operator-78f55c7745-wnzm8             1/1     Running    0          29m
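Step 5 boils down to resolving the latest arm64 nightly tag and applying two manifests under it, the operator first and then the KubeVirt CR. A sketch of that URL construction in Python, assuming the bucket layout used by the commands above is unchanged (nightly locations can move over time); the helper names are hypothetical, and fetch_latest_arm64_tag needs network access.

```python
from urllib.request import urlopen

BASE = "https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt"


def fetch_latest_arm64_tag() -> str:
    """Read the tag published in the latest-arm64 file (requires network)."""
    with urlopen(f"{BASE}/latest-arm64") as resp:
        return resp.read().decode().strip()


def arm64_manifest_urls(tag: str) -> list:
    """Return the manifests in apply order: operator first, then the CR."""
    tag = tag.strip()  # tolerate a trailing newline in the tag file
    return [
        f"{BASE}/{tag}/kubevirt-operator-arm64.yaml",
        f"{BASE}/{tag}/kubevirt-cr-arm64.yaml",
    ]
```

Usage would be along the lines of arm64_manifest_urls(fetch_latest_arm64_tag()), feeding each URL to kubectl apply -f in order.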

Took the existing fedora-br0 template and switched its container disk to my arm64 image (ghcr.io/jbpratt/qubo/fedora:34-arm64):

fedora-br0-arm64.yml:

```yaml
---
# https://github.com/kubevirt/kubevirt/blob/master/docs/cloud-init.md
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: fedora-br0
  namespace: kargo
  labels:
    app: kargo
spec:
  running: true
  template:
    spec:
      evictionStrategy: LiveMigrate
      nodeSelector:
        node-role.kubernetes.io/kubevirt: ""
      domain:
        clock:
          utc: {}
          timer: {}
        cpu:
          cores: 1
          sockets: 1
          threads: 2
          model: host-passthrough
          dedicatedCpuPlacement: false
        devices:
          rng: {}
          autoattachPodInterface: false
          autoattachSerialConsole: true
          autoattachGraphicsDevice: true
          networkInterfaceMultiqueue: false
          disks:
          - name: containerdisk
            bootOrder: 1
            disk:
              bus: virtio
          - name: cloudinitdisk
            disk:
              bus: virtio
          interfaces:
          - name: enp1s0
            model: virtio
            bridge: {}
        features:
          acpi:
            enabled: true
          smm:
            enabled: true
        firmware:
          bootloader:
            efi:
              secureBoot: true
        machine:
          type: q35
        resources:
          limits:
            memory: 2G
          requests:
            memory: 2G
            devices.kubevirt.io/kvm: "1"
      hostname: fedora-br0
      networks:
      - name: enp1s0
        multus:
          networkName: kargo-net-attach-def-br0
      terminationGracePeriodSeconds: 0
      accessCredentials:
      - sshPublicKey:
          source:
            secret:
              secretName: kargo-sshpubkey-kc2user
          propagationMethod:
            qemuGuestAgent:
              users:
              - "kc2user"
      volumes:
      - name: containerdisk
        containerDisk:
          image: ghcr.io/jbpratt/qubo/fedora:34-arm64
          imagePullPolicy: Always
      - name: cloudinitdisk
        cloudInitNoCloud:
          networkData: |
            version: 2
            ethernets:
              enp1s0:
                dhcp4: true
                dhcp6: true
                dhcp-identifier: mac
          userData: |
            #cloud-config
            hostname: fedora-br0
            ssh_pwauth: true
            disable_root: true
            chpasswd:
              list: |
                kc2user:kc2user
              expire: False
            users:
            - name: kc2user
              shell: /bin/bash
              lock_passwd: false
              sudo: ['ALL=(ALL) NOPASSWD:ALL']
              groups: sudo,wheel
            growpart:
              mode: auto
              devices: ['/']
              ignore_growroot_disabled: true
            package_upgrade: true
            packages:
            - vim
            - screenfetch
            runcmd:
            - "screenfetch"
```
❯ k apply -f kargo/vm/fedora-br0-arm64.yml
The request is invalid:
* spec.template.spec.domain.machine.type: spec.template.spec.domain.machine.type is not supported: q35 (allowed values: [virt*])
* spec.template.spec.evictionStrategy: LiveMigration feature gate is not enabled

Deleting those fields lets the schema validation pass, but creating the VM again results in a FailedCreate event:

kargo                    0s          Warning   FailedCreate              virtualmachine/fedora-br0                                             Error creating virtual machine instance: admission webhook "virtualmachineinstances-create-validator.kubevirt.io" denied the request: UEFI secure boot is currently not supported on aarch64 Arch
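The three rejections so far (a machine type outside virt*, LiveMigrate without the LiveMigration feature gate, and UEFI secure boot) can be caught before a manifest is applied. A hypothetical pre-flight check over the parsed manifest (for example loaded with PyYAML's yaml.safe_load); it re-implements only the rules visible in these error messages from this arm64 developer build, not KubeVirt's actual validation code.

```python
from fnmatch import fnmatch


def aarch64_preflight(vm: dict, feature_gates: frozenset = frozenset()) -> list:
    """Hypothetical pre-flight check for a VirtualMachine manifest on aarch64.

    Mirrors only the three rejections seen in this thread; it is not
    KubeVirt's own validation logic.
    """
    spec = vm["spec"]["template"]["spec"]
    domain = spec.get("domain", {})
    problems = []

    # "machine.type is not supported: q35 (allowed values: [virt*])"
    machine_type = domain.get("machine", {}).get("type")
    if machine_type is not None and not fnmatch(machine_type, "virt*"):
        problems.append(f"machine type {machine_type!r} does not match virt*")

    # "evictionStrategy: LiveMigration feature gate is not enabled"
    if spec.get("evictionStrategy") == "LiveMigrate" and "LiveMigration" not in feature_gates:
        problems.append("evictionStrategy LiveMigrate needs the LiveMigration feature gate")

    # "UEFI secure boot is currently not supported on aarch64 Arch"
    # (this sketch only flags an explicit secureBoot: true)
    efi = domain.get("firmware", {}).get("bootloader", {}).get("efi") or {}
    if efi.get("secureBoot") is True:
        problems.append("UEFI secure boot is enabled")

    return problems
```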

Let's disable secure boot as well. The diff of the file now looks like this (evictionStrategy plus the features, firmware, and machine blocks are gone):

diff --git a/kargo/vm/fedora-br0-arm64.yml b/kargo/vm/fedora-br0-arm64.yml
index 8ddf458..435fc92 100644
--- a/kargo/vm/fedora-br0-arm64.yml
+++ b/kargo/vm/fedora-br0-arm64.yml
@@ -11,7 +11,6 @@ spec:
   running: true
   template:
     spec:
-      evictionStrategy: LiveMigrate
       nodeSelector:
         node-role.kubernetes.io/kubevirt: ""
       domain:
@@ -42,17 +41,6 @@ spec:
           - name: enp1s0
             model: virtio
             bridge: {}
-        features:
-          acpi:
-            enabled: true
-          smm:
-            enabled: true
-        firmware:
-          bootloader:
-            efi:
-              secureBoot: true
-        machine:
-          type: q35
         resources:

The VirtualMachine is created successfully and it creates a VirtualMachineInstance, but the instance never gets scheduled:

kargo                    0s          Normal    SuccessfulCreate          virtualmachine/fedora-br0                                             Started the virtual machine by creating the new virtual machine instance fedora-br0

The virt-controller logs show it stuck in a loop re-enqueuing the VMI; the reason given is that it cannot locate the network attachment definition kargo/kargo-net-attach-def-br0:

{"component":"virt-controller","level":"info","msg":"TSC Freqency node update status: 0 updated, 0 skipped, 0 errors","pos":"nodetopologyupdater.go:47","timestamp":"2021-11-17T23:53:24.397214Z"}
{"component":"virt-controller","level":"info","msg":"reenqueuing VirtualMachineInstance kargo/fedora-br0","pos":"vmi.go:254","reason":"failed to render launch manifest: Failed to locate network attachment definition kargo/kargo-net-attach-def-br0","timestamp":"2021-11-17T23:54:02.789833Z"}
{"component":"virt-controller","level":"error","msg":"Skipping TSC frequency updates on all nodes","pos":"nodetopologyupdater.go:54","reason":"failed to calculate lowest TSC frequency for nodes: no schedulable node exposes a tsc-frequency","timestamp":"2021-11-17T23:54:27.367711Z"}
{"component":"virt-controller","level":"info","msg":"TSC Freqency node update status: 0 updated, 0 skipped, 0 errors","pos":"nodetopologyupdater.go:47","timestamp":"2021-11-17T23:54:27.367962Z"}
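virt-controller writes one JSON object per log line, so the reason behind the re-enqueue loop can be pulled out directly. A small sketch using only the msg and reason keys seen in the lines above; the function name is hypothetical. Here the reason names the missing network attachment definition, while the TSC lines concern the x86 Time Stamp Counter and are likely unrelated on an arm64 node.

```python
import json


def reenqueue_reasons(log_lines):
    """Yield (vmi, reason) for each 'reenqueuing VirtualMachineInstance' entry."""
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a JSON log entry
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        msg = entry.get("msg", "")
        if msg.startswith("reenqueuing VirtualMachineInstance"):
            # The namespace/name of the VMI is the last word of the message
            yield msg.split()[-1], entry.get("reason", "")
```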

It looks like there aren't many great matches for your search :smiley_cat:

usrbinkat commented 2 years ago

Set virtualization to PVM (disable HVM) and retry.

If that works, we'll know to focus on details around the /dev/kvm device.
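For the /dev/kvm angle, a first check is whether the device node exists on the RPi4 host, is a character device, and is accessible (the manifest above requests devices.kubevirt.io/kvm). A minimal sketch to run on the host; the function name is hypothetical and not part of KubeVirt.

```python
import os
import stat


def kvm_device_status(path: str = "/dev/kvm") -> dict:
    """Report whether the KVM device node exists, is a character device,
    and is readable and writable by the current user."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return {"exists": False, "char_device": False, "rw": False}
    return {
        "exists": True,
        "char_device": stat.S_ISCHR(st.st_mode),
        "rw": os.access(path, os.R_OK | os.W_OK),
    }


if __name__ == "__main__":
    print(kvm_device_status())
```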

usrbinkat commented 2 years ago

Failed on image pull

Failed to pull image "ghcr.io/jbpratt/qubo/fedora:34-arm64": rpc error: code = Unknown desc = unable to retrieve auth token: invalid username/password: unauthorized

Steps:

Next steps: retry with credentials for the ready-to-run aarch64 image.
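Retrying with credentials means providing a pull secret for ghcr.io. kubectl create secret docker-registry (with --docker-server, --docker-username and --docker-password) builds one directly; the sketch below only shows what the .dockerconfigjson payload inside such a secret contains. The username and token values are placeholders, and the secret still has to be referenced by whatever pulls the image.

```python
import base64
import json


def dockerconfigjson(registry: str, username: str, token: str) -> str:
    """Build the .dockerconfigjson document stored in a
    kubernetes.io/dockerconfigjson Secret."""
    # "auth" is base64("username:password"), as in ~/.docker/config.json
    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
    return json.dumps({
        "auths": {
            registry: {"username": username, "password": token, "auth": auth}
        }
    })


if __name__ == "__main__":
    # Placeholder credentials for illustration only
    print(dockerconfigjson("ghcr.io", "example-user", "example-token"))
```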

jbpratt commented 2 years ago

Fixed the private image @usrbinkat

jbpratt commented 2 years ago

Going to go ahead and close out this exploration issue; further work can get its own ticket.