Telmate / terraform-provider-proxmox

Terraform provider plugin for proxmox
MIT License

VM is not created properly or completely, no Cloud-init drive after creation #1104

Open yyy916 opened 4 days ago

yyy916 commented 4 days ago

Hi! I have no prior knowledge related to any of this, so I may have made many mistakes; please bear with me. I'm using an Ubuntu 24.04 VM template to create VMs with Terraform. I followed this tutorial to create the VM template: https://tcude.net/creating-a-vm-template-in-proxmox/. My main.tf is as follows:

terraform {
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "3.0.1-rc3"
    }
  }
}

provider "proxmox" {
  pm_api_url      = "https://xxxxx:8006/api2/json"
  pm_user         = "xyz@pve"
  pm_password     = "xyz" # Or use an API token
  pm_tls_insecure = true
  pm_debug        = true
  pm_log_enable   = true
}

resource "proxmox_vm_qemu" "my_test_vms" {
  count = 1
  name  = "Test-vm-${count.index + 1}"

  onboot   = true
  vm_state = "running"

  target_node = "xxx"
  memory      = 2048
  cores       = 2
  sockets     = 1
  clone       = "template-name"

  network {
    model  = "virtio"
    bridge = "vmbr0"
  }

  ipconfig0 = "ip=dhcp"
}

This gives me a VM with no cloud-init drive that loops through continuous reboots, as in the attached screenshots. The Ubuntu OS doesn't start up.

Now, if I remove bridge = "vmbr0" and ipconfig0 = "ip=dhcp", a VM is created but with the following error. It does start when started manually, though.

proxmox_vm_qemu.my_test_vms[0]: Creating...
proxmox_vm_qemu.my_test_vms[0]: Still creating... [10s elapsed]
╷
│ Error: error updating VM: 500 no sdn vnet ID specified, error status: {"data":null}
│ (params: map[agent:0 bios:seabios cores:2 cpu:host delete:ide2,scsi0 hotplug:network,disk,usb kvm:true memory:2048 name:Test-vm-1 net0:virtio=5E:74:ED:5A:07:CC numa:false onboot:true protection:false scsihw:lsi sockets:1 tablet:true vmid:102])
│
│   with proxmox_vm_qemu.my_test_vms[0],
│   on main.tf line 20, in resource "proxmox_vm_qemu" "my_test_vms":
│   20: resource "proxmox_vm_qemu" "my_test_vms" {
│

If I remove just the line bridge = "vmbr0", it gives a similar error, and the VM likewise starts when started manually.

Error: error updating VM: 500 no sdn vnet ID specified, error status: {"data":null} (params: map[agent:0 balloon:0 bios:seabios cicustom: cipassword: ciupgrade:0 cores:2 cpu:host delete:ide0,ide2,scsi0,ciuser,searchdomain,nameserver,shares,serial0 hotplug:network,disk,usb ipconfig0:ip=dhcp kvm:true memory:4096 name:induz-memcache-vm-1 net0:virtio=CE:F6:F8:EA:5F:B9 numa:0 onboot:true protection:false scsihw:lsi sockets:1 sshkeys:%0A tablet:true vmid:100])
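My guess from the params dump: with the bridge removed, the provider still sends a net0 device but without a bridge, and PVE then tries to resolve the NIC against an SDN vnet. A minimal sketch of the network block I had (the bridge name vmbr0 is just the Proxmox default and may differ per node):

```hcl
# Keeping an explicit bridge on the NIC should avoid the
# "no sdn vnet ID specified" lookup; an assumption based on the
# params dump above, which shows net0 without a bridge.
network {
  model  = "virtio" # paravirtualized NIC
  bridge = "vmbr0"  # assumption: the node's default Linux bridge
}
```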

yyy916 commented 4 days ago

Any little feedback is much needed and appreciated!!!

yyy916 commented 4 days ago

I have also experienced plugin crashes and "Can't lock file...- got timeout" errors similar to #1101.

yyy916 commented 4 days ago

With the newest version = "3.0.1-rc4", the VM is created with no errors, but it doesn't start. It keeps connecting and disconnecting in a loop.

regexhater commented 3 days ago

I have the same problem. What's more, I've noticed that when running two VMs that differ only in the template they are cloned from, one boots and the other has the same problem described in this thread.

Tinyblargon commented 3 days ago

@yyy916

yyy916 commented 3 days ago

@Tinyblargon

  1. I used the Proxmox-VE_8.2-1 ISO image. `pveversion` shows proxmox-ve: 8.2.0 (running kernel: 6.8.4-2-pve), pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4).
  2. I followed everything as in the linked guide, https://tcude.net/creating-a-vm-template-in-proxmox/. I attached an extra hard drive for storage.
  3. I don't think there are any unattached or orphaned disks left over from deleting a VM. When there are, applying usually fails, and I go on to delete them. I'm assuming you're asking about cloud-init disks; I delete those with the `rbd rm` command. Please let me know if I misunderstood something!
  4. Again, I followed the above link. (screenshot attached)
yyy916 commented 3 days ago

@Tinyblargon Please let me know if I haven't given you enough details! Do you need me to attach any logs or other info?

Tinyblargon commented 3 days ago

@yyy916 In the article you linked, scsi0 seems to be the boot disk. Could you change the boot order to include only scsi0 and see if that fixes it?
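If editing the template is awkward, the boot order can, I believe, also be pinned from the Terraform side via the provider's boot argument (the same argument appears in a working config further down this thread); a minimal sketch:

```hcl
# Hypothetical fragment of the proxmox_vm_qemu resource:
# boot only from the cloned OS disk on scsi0.
boot = "order=scsi0"
```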

yyy916 commented 3 days ago

I will try and let you know!

yyy916 commented 3 days ago

Hey, sorry for the delay. There was a plugin init error, so I edited out the boot order in the template and regenerated it. It gives me the same output: the VM is created without a cloud-init drive and keeps cycling through connection issues. (screenshot attached)

I enabled only scsi0. After regenerating the image, the order changed to scsi0,ide0.

yyy916 commented 3 days ago

My created VM has this boot order: (screenshot attached)

yyy916 commented 3 days ago

@Tinyblargon Could I be missing any other details? I even tried setting bootdisk = "scsi0" in my main.tf.

Tinyblargon commented 3 days ago

@yyy916 One of the things I think is going wrong is that the guide you followed works under the assumption that template settings will be preserved.

Terraform creates the new VM with no regard for the template. Therefore, the only thing needed in the template is a boot disk.
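Under that assumption, anything beyond the boot disk (the cloud-init drive, and the serial console cloud images usually expect) has to be declared in the resource itself; a minimal sketch, where the storage name "local" is an assumption:

```hcl
# Hypothetical fragment of a proxmox_vm_qemu resource: hardware that
# won't be carried over from the template is declared here instead.
os_type = "cloud-init"

disks {
  ide {
    ide2 {
      cloudinit {
        storage = "local" # assumption: adjust to your storage
      }
    }
  }
}

serial {
  id   = 0
  type = "socket" # cloud images generally expect a serial console
}
```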

yyy916 commented 3 days ago

@Tinyblargon So how do you recommend I create the template? I need an Ubuntu 24.04 template. Could you please point me to a step-by-step guide? I've tried everything I found, and for some reason it always ends up the same.

Uncurlhalo commented 1 day ago

I've been battling this same issue today and ultimately found a solution that seems to work. I'll provide some links and screenshots, as well as my resource block for creating the VMs, which fixed my issue. There is clearly a documentation gap here, but I see you already have an issue to address it for a planned release.

To start, I prepared a template per the instructions on the Proxmox wiki. I called the resulting template ubuntu-cloud-init-template. (screenshots of the template's "Options" and "Hardware" tabs attached)

Once I had my template VM, I created a resource to make my VMs. I used the existing docs on creating a cloud-init file in snippets to create user-data.yml with the contents I needed for my use case; that part should be easy enough to understand and implement on your own. The core of the issue, I think, is that cloud-init expects a serial console. My template had one, but the resource seems to just clone the disk image, so you need to make sure you create a serial device as part of your resource. I could be totally off the mark with that, but it got me out of the looping-startup hell.

resource "proxmox_vm_qemu" "k8s-control-plane" {
  # count of number of control nodes
  count = var.control_node_spec.count

  # Start of actual resources for the VM
  name        = "k8s-control-${count.index}"
  target_node = var.node_name

  vmid    = format("${var.control_node_spec.vm_id_prefix}%02d", count.index)
  desc    = format("Kubernetes Control Plane %02d", count.index)
  tags    = "k8s,control-plane"
  os_type = "cloud-init"

  # clone my existing template
  clone = "ubuntu-cloud-init-template"

  # start at boot
  onboot = true

  # expect qemu-agent to be enabled and tell it we are booting linux guests
  agent   = 1
  qemu_os = "l26"
  scsihw  = "virtio-scsi-pci"

  # define resources
  cpu     = "host"
  cores   = var.control_node_spec.cores
  sockets = 1

  # define memory
  memory = var.control_node_spec.memory

  # specify our custom userdata script
  cicustom = "user=local:snippets/k8s-user-data.yml"

  # create my disks
  disks {
    ide {
      ide2 {
        cloudinit {
          storage = "local"
        }
      }
    }
    scsi {
      scsi0 {
        disk {
          size    = "20G"
          storage = "local-lvm"
          format  = "raw"
          cache   = "none"
          backup  = false
        }
      }
    }
  }

  # This is mandatory for some reason
  serial {
    id   = 0
    type = "socket"
  }

  # define network interfaces
  network {
    model  = "virtio"
    bridge = "vmbr0"
  }

  # set cloud init networking info, look at providing with cicustom
  ipconfig0 = format("ip=192.168.1.2%02d/24,gw=192.168.1.1", count.index)
}
Tinyblargon commented 1 day ago

@Uncurlhalo Thanks for the help with this issue; the information you provided will go a long way when I start working on #1105.

Uncurlhalo commented 1 day ago

I also tested this with ipconfig0 = "ip=dhcp", since I know someone mentioned that earlier. It seemed to work fine: the VMs started, and I saw they got leases from my router.

Stankye commented 1 day ago

I recently updated my automation; some quick notes that might help when you are looking things over (UEFI was a pain):

This is my disk creation and import sequence. I do not import the efidisk in TF, so that might need to be documented.

qm create 9107 --name "debian-12-cloud-init-dhcp-cis" --bios ovmf --machine q35 --efidisk0 local-lvm:0,efitype=4m,pre-enrolled-keys=1 --cpu cputype=host --cores 2 --memory 4096 --net0 virtio,bridge=vmbr0,mtu=1

qm importdisk 9107 debian-12-genericcloud-amd64-20240901-1857.qcow2 local-lvm

--efidisk0 local-lvm:0,efitype=4m,pre-enrolled-keys=1

pre-enrolled-keys does not seem to be configurable. https://github.com/Telmate/terraform-provider-proxmox/blob/master/proxmox/resource_vm_qemu.go

This will prevent images that do not have Secure Boot enabled (like Alpine) from booting.

resource "proxmox_vm_qemu" "pihole" {
    name = "pihole"
    desc = "pihole_alloy_debian12_dhcp_cis"
    tags = "pihole,alloy,debian12,cis,docker"

    target_node = "pve1"

    # Start on boot
    # onboot = true

    # The destination resource pool for the new VM
    # pool = "pool0"

    # The template name to clone this vm from
    clone = "debian-12-cloud-init-dhcp-cis"

    # omvf is required for UEFI
    bios = "ovmf"

    # Linked (fast) clone
    full_clone = false

    # Activate QEMU agent for this VM
    agent = 1

    # Set to cloud-init
    os_type = "cloud-init"
    cores = 2
    sockets = 1
    cpu = "host"
    memory = 4096
    scsihw = "virtio-scsi-single" # Benchmarks faster than iSCSI: https://kb.blockbridge.com/technote/proxmox-aio-vs-iouring/#recommended-settings

    # Setup the disks
    # SCSI1 is the cloudinit disk. UEFI does not work with IDE.
    # modify discard if you are running on ssd's
    disks {
        scsi {
            scsi1 {
                cloudinit {
                    storage = "local-lvm"
                }
            }
            scsi0 {
                disk {
                    size            = 16
                    cache           = "writeback"
                    storage         = "local-lvm"
                    iothread        = true
                    asyncio         = "io_uring"
                    discard         = false
                }
            }
        }
    }

    network {
        model = "virtio"
        bridge = "vmbr0"
        tag = 0
        mtu = 1
    }

    # scsi0 is the boot/OS disk.
    boot = "order=scsi0"
    # this sets the ip address to dhcp. Note: If you are configuring from a template, make sure that dhcp will get seeded properly. This might require "echo -n > /etc/machine-id"
    ipconfig0 = "ip=dhcp"

    sshkeys = trimspace(data.local_file.ssh_public_key.content)
    ci_wait = 30
    ciuser = "debian"
    cipassword = "password"
  }

edit: https://github.com/Stankye/Proxmox-CloudInit-Template/blob/main/debian.md

ElForastero commented 2 hours ago

Disclaimer: I'm a newbie in Terraform.

I've created and configured an Ubuntu cloud-init image, converted it into a VM template using the official PVE guide, and configured the boot order, serial console, etc.

(screenshot attached)