dmacvicar / terraform-provider-libvirt

Terraform provider to provision infrastructure with Linux's KVM using libvirt
Apache License 2.0

Add generic boot disk resource #1065

Closed: achetronic closed this issue 5 months ago

achetronic commented 5 months ago

Hello there. When crafting VMs for Talos and creating a NAT network with libvirt_network, everything works fine.

The problem comes with macvtap. When creating the VMs in macvtap mode, you usually need cloud-init to instruct the OS from the inside to get an IP, or to enable DHCP, for networking to work properly (as I do in github.com/achetronic/metal-cloud).

Talos does not work well here: Talos starts and everything seems OK, but it never connects. The net0 device gets an IP inside the proper range, but when you look at the ARP table, this IP is like a ghost. One way to fix it is to mount a machineconfig.yaml to configure some initial settings before configuring the rest through the API, BUT this is not possible with this provider: it does not include a resource to mount disks that are not for cloud-init.

Is there a possibility to implement this? :)

michaelbeaumont commented 5 months ago

One way to fix it is to mount a machineconfig.yaml to configure some initial settings before configuring the rest through the API, BUT this is not possible with this provider: it does not include a resource to mount disks that are not for cloud-init.

What exactly do you mean by "mount a machineconfig.yaml" and "mount disks that are not for cloud-init"?

I personally use this provider to create VMs for Talos. I create an ISO containing the machine config, label it metal-iso, and attach it to the domain as a disk { file = ... }. The talos.config=metal-iso kernel arg allows booting with this config, which I have been building with factory.talos.dev ever since its recent release.

I haven't run into any problems with cloud-init support in this provider.
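For readers following along, the approach described above can be sketched roughly like this. The resource name, paths, and ISO-building command are illustrative assumptions, not taken from michaelbeaumont's actual setup:

```hcl
# The boot ISO is built at factory.talos.dev with 'talos.config=metal-iso'
# baked into its kernel args. The second ISO carries the machine config and
# must be labeled 'metal-iso' so Talos can find it, e.g.:
#   mkisofs -joliet -rock -volid metal-iso -o /opt/talos/config.iso ./config-dir/

resource "libvirt_domain" "talos_node" {
  name   = "talos-node"   # illustrative name
  memory = 4096
  vcpu   = 2

  # Factory-built Talos boot ISO
  disk {
    file = "/opt/talos/factory-metal-amd64.iso"
  }

  # Machine-config ISO, discovered by Talos via its 'metal-iso' volume label
  disk {
    file = "/opt/talos/config.iso"
  }
}
```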

achetronic commented 5 months ago

hello @michaelbeaumont I am trying your way, using the kernel, initrd, and some kernel args directly (no config introduced yet), but once the VM is launched it seems to reboot constantly.

I'm doing it this way because, as I said in another issue, there is no way to pass kernel args to the Talos ISO image without using the vmlinuz of Talos directly and setting the args on it through the libvirt_domain.kernel and libvirt_domain.cmdline fields.

I assume that when you use the kernel and the initrd, no config is applied for the network, or for the rest of the initial stuff.

Could you provide your YAML (redacted, of course, or DM me at @achetronic on Telegram) so I can test using it from the metal-iso and give more feedback?

Thank you in advance

achetronic commented 5 months ago

Just in case this is useful for someone: with this Terraform code, libvirt is able to start VMs for Talos (networking included):

# Create a dir where all the volumes will be created
resource "libvirt_pool" "volume_pool" {
  name = "vms-volume-pool"
  type = "dir"
  path = "/opt/libvirt/vms-volume-pool"
}

resource "libvirt_volume" "kernel" {
  source = "https://github.com/siderolabs/talos/releases/download/${var.globals.talos.version}/vmlinuz-amd64"
  name   = "kernel-${var.globals.talos.version}"
  pool   = libvirt_pool.volume_pool.name
  format = "raw"
}

resource "libvirt_volume" "initrd" {
  source = "https://github.com/siderolabs/talos/releases/download/${var.globals.talos.version}/initramfs-amd64.xz"
  name   = "initrd-${var.globals.talos.version}"
  pool   = libvirt_pool.volume_pool.name
  format = "raw"
}

# General purpose volumes for all the instances
resource "libvirt_volume" "instance_disk" {
  for_each = var.instances

  name   = join("", [each.key, ".qcow2"])
  pool   = libvirt_pool.volume_pool.name
  format = "qcow2"

  # 10GB (as bytes) as default
  size = try(each.value.disk, 10 * 1000 * 1000 * 1000)

}

resource "libvirt_domain" "instance" {
  for_each = var.instances

  cpu {
    mode = "host-passthrough"
  }

  xml {
    xslt = file("${path.module}/templates/xsl/cdrom-fixes.xsl")
  }

  # Set config related directly to the VM
  name   = each.key
  memory = each.value.memory
  vcpu   = each.value.vcpu

  # Use UEFI capable machine
  machine    = "q35"
  firmware   = "/usr/share/OVMF/OVMF_CODE.fd"

  # You may be wondering why I'm using these params directly instead of the released metal ISO image.
  # Well, hard to say, but you cannot set kernel params on a prebuilt image...
  # and I wanted to set some initial things through the machine config YAML at this stage
  initrd = libvirt_volume.initrd.id
  kernel = libvirt_volume.kernel.id

  # Ref: https://www.talos.dev/v1.6/reference/kernel/
  # Ref: https://www.talos.dev/v1.6/reference/kernel/
  # NOTE: cmdline is a list of maps; a repeated key (like 'console') must go in
  # a separate map, since an HCL map cannot contain duplicate keys
  cmdline = [{

    # Args retrieved directly from the ISO image
    console                = "ttyS0"       # Serial console for kernel output.
    consoleblank           = 0             # Control auto-blanking of the console after inactivity (0 to disable).
    "nvme_core.io_timeout" = 4294967295    # Maximum I/O timeout for NVMe devices (max value).
    "printk.devkmsg"       = "on"          # Enable real-time logging of device kmsg messages.
    ima_template           = "ima-ng"      # Integrity Measurement Architecture (IMA) template to use.
    ima_appraise           = "fix"         # IMA file appraisal mode ("fix" to repair).
    ima_hash               = "sha512"      # Hash algorithm used by IMA to verify file integrity.

    # Required (and recommended) args by the Talos team
    "talos.platform" = "metal"             # Platform Talos runs on ("metal" for bare metal).
    pti              = "on"                # Enable Page Table Isolation (PTI) mitigation.
    init_on_alloc    = 1                   # Zero-initialize allocated memory pages (1 to enable).

    #"talos.config"   = "metal-iso"         # Load the machine config from a volume labeled 'metal-iso'.
    #"talos.hostname" = each.key
    #"talos.experimental.wipe" = "system"
  },{
    console          = "tty0"              # Virtual terminal console for kernel output.
  },{
    _                = "slab_nomerge"      # Valueless flag: disable slab cache merging (hardening).
  }]

  # Attach MACVTAP networks
  dynamic "network_interface" {
    for_each = each.value.networks

    iterator = network
    content {
      macvtap   = network.value.interface
      hostname  = each.key
      mac       = network.value.mac
      addresses = network.value.addresses
      wait_for_lease = false
      # The guest's virtualized network interface is connected directly to a physical device on the host.
      # As a result, the requested IP address can only be claimed by the guest OS: Linux is configured in static mode by cloud-init
    }
  }

  disk {
    volume_id = libvirt_volume.instance_disk[each.key].id
    scsi = true
  }

  # IMPORTANT: this is a known bug on cloud images, since they expect a console
  # we need to pass it
  # https://bugs.launchpad.net/cloud-images/+bug/1573095
  console {
    type        = "pty"
    target_port = "0"
    target_type = "serial"
  }

  console {
    type        = "pty"
    target_port = "1"
    target_type = "virtio"
  }

  video {
    type = "qxl"
  }

  graphics {
    # Not using 'spice' to keep using cockpit GUI with ease :)
    type        = "vnc"
    listen_type = "address"
    autoport    = true
  }

  qemu_agent = false
  autostart  = true

  lifecycle {
    ignore_changes = [
      nvram,
      disk[0],
      network_interface[0],
    ]
  }

}
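For completeness, the var.instances referenced above would need a shape roughly like the following. This is an assumed definition inferred from the usage in the resources, not the author's actual variables file:

```hcl
# Using map(any) keeps try(each.value.disk, ...) working as shown above:
# accessing a missing attribute raises an error, which try() replaces
# with the 10 GB default
variable "instances" {
  description = "VM definitions, keyed by instance name"
  type        = map(any)

  # Example value (names, MAC, and addresses are illustrative):
  default = {
    "talos-node-1" = {
      memory = 4096   # MiB
      vcpu   = 2
      # disk omitted -> the volume resource falls back to 10 GB
      networks = [{
        interface = "eno1"            # host NIC for the macvtap attachment
        mac       = "52:54:00:aa:bb:cc"
        addresses = ["192.168.1.10"]
      }]
    }
  }
}
```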

The file cdrom-fixes.xsl:

<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- Fix: Connect a cdrom device on SATA instead of IDE bus -->
    <xsl:template match="/domain/devices/disk[@device='cdrom']/target/@bus">
        <xsl:attribute name="bus">
            <xsl:value-of select="'sata'"/>
        </xsl:attribute>
    </xsl:template>

</xsl:stylesheet>
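As an illustration of what the stylesheet does (useful because the q35 machine type has no IDE bus): the identity template copies the whole domain XML unchanged, and only the matched bus attribute on cdrom targets is rewritten. The element values below are illustrative:

```xml
<!-- Before the transform, the provider attaches the cdrom on the IDE bus: -->
<disk type="file" device="cdrom">
  <target dev="hda" bus="ide"/>
</disk>

<!-- After cdrom-fixes.xsl, only the bus attribute changes: -->
<disk type="file" device="cdrom">
  <target dev="hda" bus="sata"/>
</disk>
```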
achetronic commented 5 months ago

For reference, there is also a simpler VM definition that should work with Talos, as shown in this example:

https://github.com/siderolabs/contrib/blob/main/examples/terraform/advanced/main.tf#L182-L217