confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach

failed to create s390x libvirt cluster using kcli #1835

Closed · lysliu closed this issue 1 month ago

lysliu commented 1 month ago

When I follow the guide https://github.com/confidential-containers/cloud-api-adaptor/tree/main/src/cloud-api-adaptor/libvirt#create-the-kubernetes-cluster to create a libvirt cluster, I get the following error:

~/src/confidential-containers/cloud-api-adaptor/src/cloud-api-adaptor# ./libvirt/kcli_cluster.sh create
Download ubuntu2204 s390x image
Using pool default
Grabbing image ubuntu2204 from url https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-s390x.img
Image ubuntu2204 already there.Leaving...
Using 192.168.122.253 as api_ip
Using keepalived virtual_router_id 15
Using version v1.26.7
Deploying Vms...
Traceback (most recent call last):
  File "/usr/local/bin/kcli", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/usr/local/lib/python3.12/site-packages/kvirt/cli.py", line 5512, in cli
    args.func(args)
  File "/usr/local/lib/python3.12/site-packages/kvirt/cli.py", line 1887, in create_generic_kube
    create_kube(args)
  File "/usr/local/lib/python3.12/site-packages/kvirt/cli.py", line 1866, in create_kube
    result = config.create_kube(cluster, kubetype, overrides=overrides)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kvirt/config.py", line 2688, in create_kube
    result = self.create_kube_generic(cluster, overrides=overrides)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kvirt/config.py", line 2713, in create_kube_generic
    return kubeadm.create(self, plandir, cluster, overrides)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kvirt/cluster/kubeadm/__init__.py", line 219, in create
    result = config.plan(plan, inputfile=f'{plandir}/bootstrap.yml', overrides=data)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kvirt/config.py", line 2157, in plan
    result = self.create_vm(name, profilename, overrides=currentoverrides, customprofile=profile, k=z,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kvirt/config.py", line 478, in create_vm
    full_volumes = self.k.volumes()
                   ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kvirt/providers/kvm/__init__.py", line 1980, in volumes
    if self.get_capabilities()['arch'] == 'aarch64':
       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kvirt/providers/kvm/__init__.py", line 188, in get_capabilities
    cpuxml = self.conn.baselineCPU([ET.tostring(cpu, encoding='unicode')], 1)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/libvirt.py", line 4288, in baselineCPU
    raise libvirtError('virConnectBaselineCPU() failed')
libvirt.libvirtError: XML error: Missing CPU model name

lysliu commented 1 month ago

I also tried setting cpumodel, following https://kcli.readthedocs.io/en/latest/#available-parameters-for-client-profile-plan-files, but still no luck.
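
For example, something along these lines when invoking kcli directly (a sketch: host-passthrough is only an illustrative value, and the cpumodel parameter name comes from the docs linked above):

kcli create kube generic -P image=ubuntu2204 -P cpumodel=host-passthrough test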

stevenhorsman commented 1 month ago

@karmab - Hey Karim, I know we got this working on s390x a few months ago. Any idea if something has gone in that might have led to the issues that @lysliu is seeing?

karmab commented 1 month ago

I think it's a consequence of the change made here to properly detect the nested feature. I can capture the exception, but can you share the output of the following commands?

virsh capabilities
virsh capabilities | virsh cpu-baseline --features /dev/stdin

stevenhorsman commented 1 month ago

I think it's a consequence of the change made here to properly detect the nested feature. I can capture the exception, but can you share the output of the following commands?

virsh capabilities
virsh capabilities | virsh cpu-baseline --features /dev/stdin

This is the output:

# virsh capabilities
<capabilities>

  <host>
    <uuid>1b6fbde4-0fdb-45ee-a5aa-31e6ca618e50</uuid>
    <cpu>
      <arch>s390x</arch>
      <topology sockets='16' dies='1' cores='1' threads='1'/>
      <pages unit='KiB' size='4'/>
      <pages unit='KiB' size='1024'/>
    </cpu>
    <power_management/>
    <iommu support='no'/>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
        <uri_transport>rdma</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='1'>
        <cell id='0'>
          <memory unit='KiB'>32954940</memory>
          <pages unit='KiB' size='4'>8238735</pages>
          <pages unit='KiB' size='1024'>0</pages>
          <distances>
            <sibling id='0' value='10'/>
          </distances>
          <cpus num='16'>
            <cpu id='0' socket_id='0' die_id='0' core_id='0' siblings='0'/>
            <cpu id='1' socket_id='1' die_id='0' core_id='1' siblings='1'/>
            <cpu id='2' socket_id='2' die_id='0' core_id='2' siblings='2'/>
            <cpu id='3' socket_id='3' die_id='0' core_id='3' siblings='3'/>
            <cpu id='4' socket_id='4' die_id='0' core_id='4' siblings='4'/>
            <cpu id='5' socket_id='5' die_id='0' core_id='5' siblings='5'/>
            <cpu id='6' socket_id='6' die_id='0' core_id='6' siblings='6'/>
            <cpu id='7' socket_id='7' die_id='0' core_id='7' siblings='7'/>
            <cpu id='8' socket_id='8' die_id='0' core_id='8' siblings='8'/>
            <cpu id='9' socket_id='9' die_id='0' core_id='9' siblings='9'/>
            <cpu id='10' socket_id='10' die_id='0' core_id='10' siblings='10'/>
            <cpu id='11' socket_id='11' die_id='0' core_id='11' siblings='11'/>
            <cpu id='12' socket_id='12' die_id='0' core_id='12' siblings='12'/>
            <cpu id='13' socket_id='13' die_id='0' core_id='13' siblings='13'/>
            <cpu id='14' socket_id='14' die_id='0' core_id='14' siblings='14'/>
            <cpu id='15' socket_id='15' die_id='0' core_id='15' siblings='15'/>
          </cpus>
        </cell>
      </cells>
    </topology>
    <secmodel>
      <model>apparmor</model>
      <doi>0</doi>
    </secmodel>
    <secmodel>
      <model>dac</model>
      <doi>0</doi>
      <baselabel type='kvm'>+64055:+108</baselabel>
      <baselabel type='qemu'>+64055:+108</baselabel>
    </secmodel>
  </host>

  <guest>
    <os_type>hvm</os_type>
    <arch name='s390x'>
      <wordsize>64</wordsize>
      <emulator>/usr/bin/qemu-system-s390x</emulator>
      <machine maxCpus='248'>s390-ccw-virtio-jammy</machine>
      <machine canonical='s390-ccw-virtio-jammy' maxCpus='248'>s390-ccw-virtio</machine>
      <machine maxCpus='248'>s390-ccw-virtio-4.0</machine>
      <machine maxCpus='248'>s390-ccw-virtio-5.2</machine>
      <machine maxCpus='248'>s390-ccw-virtio-artful</machine>
      <machine maxCpus='248'>s390-ccw-virtio-3.1</machine>
      <machine maxCpus='248'>s390-ccw-virtio-groovy</machine>
      <machine maxCpus='248'>s390-ccw-virtio-hirsute</machine>
      <machine maxCpus='248'>s390-ccw-virtio-2.6</machine>
      <machine maxCpus='248'>s390-ccw-virtio-disco</machine>
      <machine maxCpus='248'>s390-ccw-virtio-2.12</machine>
      <machine maxCpus='248'>s390-ccw-virtio-yakkety</machine>
      <machine maxCpus='248'>s390-ccw-virtio-2.9</machine>
      <machine maxCpus='248'>s390-ccw-virtio-eoan</machine>
      <machine maxCpus='248'>s390-ccw-virtio-6.0</machine>
      <machine maxCpus='248'>s390-ccw-virtio-5.1</machine>
      <machine maxCpus='248'>s390-ccw-virtio-3.0</machine>
      <machine maxCpus='248'>s390-ccw-virtio-4.2</machine>
      <machine maxCpus='248'>s390-ccw-virtio-2.5</machine>
      <machine maxCpus='248'>s390-ccw-virtio-2.11</machine>
      <machine maxCpus='248'>s390-ccw-virtio-xenial</machine>
      <machine maxCpus='248'>s390-ccw-virtio-impish</machine>
      <machine maxCpus='248'>s390-ccw-virtio-focal</machine>
      <machine maxCpus='248'>s390-ccw-virtio-2.8</machine>
      <machine maxCpus='248'>s390-ccw-virtio-bionic</machine>
      <machine maxCpus='248'>s390-ccw-virtio-5.0</machine>
      <machine maxCpus='248'>s390-ccw-virtio-6.2</machine>
      <machine maxCpus='248'>s390-ccw-virtio-zesty</machine>
      <machine maxCpus='248'>s390-ccw-virtio-4.1</machine>
      <machine maxCpus='248'>s390-ccw-virtio-cosmic</machine>
      <machine maxCpus='248'>s390-ccw-virtio-2.4</machine>
      <machine maxCpus='248'>s390-ccw-virtio-2.10</machine>
      <machine maxCpus='248'>s390-ccw-virtio-2.7</machine>
      <machine maxCpus='248'>s390-ccw-virtio-6.1</machine>
      <domain type='qemu'/>
      <domain type='kvm'/>
    </arch>
    <features>
      <cpuselection/>
      <deviceboot/>
      <disksnapshot default='on' toggle='no'/>
    </features>
  </guest>

</capabilities>

root@sh-libvirt-s390x:~# virsh capabilities | virsh cpu-baseline --features /dev/stdin
error: XML error: Missing CPU model name

So the host <cpu> element above really does lack a <model> child, which is exactly what cpu-baseline is complaining about.

karmab commented 1 month ago

ok, https://github.com/karmab/kcli/commit/9b41c023bc51f60ca532b5fc537d23fc1f3e20a4 should cover it then
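
If anyone needs the fix before it reaches a published build, installing kcli straight from the repo should also pick it up (a plain pip-from-git install, nothing kcli-specific):

pip3 install -U git+https://github.com/karmab/kcli.git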

stevenhorsman commented 1 month ago

ok, karmab/kcli@9b41c02 should cover it then

Awesome. I'll give it a try once it's through the pipeline!

stevenhorsman commented 1 month ago

@karmab - I'm getting a different but similar failure now with the latest code:

# kcli version
version: 99.0 commit: 9b41c02 2024/05/09 Available Updates: False
root@sh-libvirt-s390x:~/go/src/github.com/confidential-containers/cloud-api-adaptor/src/cloud-api-adaptor# kcli create kube generic -P image=ubuntu2204 test2
Using 192.168.122.253 as api_ip
Using keepalived virtual_router_id 113
Using version v1.30.0
Deploying Vms...
Hypervisor not compatible with nesting. Skipping
Traceback (most recent call last):
  File "/usr/local/bin/kcli", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/kvirt/cli.py", line 5415, in cli
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/cli.py", line 1887, in create_generic_kube
    create_kube(args)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/cli.py", line 1866, in create_kube
    result = config.create_kube(cluster, kubetype, overrides=overrides)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/config.py", line 2688, in create_kube
    result = self.create_kube_generic(cluster, overrides=overrides)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/config.py", line 2713, in create_kube_generic
    return kubeadm.create(self, plandir, cluster, overrides)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/cluster/kubeadm/__init__.py", line 219, in create
    result = config.plan(plan, inputfile=f'{plandir}/bootstrap.yml', overrides=data)
  File "/usr/local/lib/python3.10/dist-packages/kvirt/config.py", line 2157, in plan
    result = self.create_vm(name, profilename, overrides=currentoverrides, customprofile=profile, k=z,
  File "/usr/local/lib/python3.10/dist-packages/kvirt/config.py", line 957, in create_vm
    result = k.create(name=name, virttype=virttype, plan=plan, profile=profilename, flavor=flavor,
  File "/usr/local/lib/python3.10/dist-packages/kvirt/providers/kvm/__init__.py", line 1345, in create
    conn.defineXML(vmxml)
  File "/usr/local/lib/python3.10/dist-packages/libvirt.py", line 4441, in defineXML
    raise libvirtError('virDomainDefineXML() failed')
libvirt.libvirtError: XML error: No PCI buses available

karmab commented 1 month ago

Hmm, that one looks unrelated to the fix. I would say you need to force the machine type. Can you share the XML of a running VM?
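
For reference, grabbing that is a single command (the VM name below is hypothetical; kcli list vm shows the real ones):

virsh dumpxml test2-ctlplane-0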

stevenhorsman commented 1 month ago

The latest version with https://github.com/karmab/kcli/commit/f4d5c014f0accffaae6be0e2fba6adb681935db6 is working for me locally now. Thanks as always, Karim. @lysliu - you should be unblocked now if you upgrade kcli with pip :)
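
Something like this should do it (assuming kcli came from PyPI in the first place):

pip3 install -U kcli
kcli version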

lysliu commented 1 month ago

I can confirm the latest pip release fixes the issue.