dmacvicar / terraform-provider-libvirt

Terraform provider to provision infrastructure with Linux's KVM using libvirt
Apache License 2.0

can't retrieve partial header #958

Open volker-raschek opened 2 years ago

volker-raschek commented 2 years ago

Hi,

I am trying to deploy some Rocky Linux 8.5 VMs. The first time I run it, I always get the error described below. Only when I run it again does the deployment work.

Volker

System Information

Linux distribution

Arch Linux (Host)
Rocky Linux 8.5 (Target)

Terraform version

Terraform v1.2.3
on linux_amd64
+ provider registry.terraform.io/dmacvicar/libvirt v0.6.14
+ provider registry.terraform.io/hashicorp/template v2.2.0

Provider and libvirt versions

0.6.14

Description of Issue/Question

Setup

resource "libvirt_network" "gh_runner" {
  name      = var.network_name
  mode      = var.network_mode
  domain    = var.network_mode == "nat" ? var.network_domain_name : ""
  addresses = var.network_mode == "nat" ? var.network_addresses : []
  bridge    = var.network_mode == "bridge" ? var.network_bridge : ""
  autostart = true

  # (optional) the MTU for the network. If not supplied, the underlying device's
  # default is used (usually 1500)
  # mtu = 9000

  # (Optional) DNS configuration
  dns {
    # (Optional, default false)
    # Set to true, if no other option is specified and you still want to
    # enable dns.
    enabled = true

    # (Optional, default false)
    # true: DNS requests under this domain will only be resolved by the
    # virtual network's own DNS server
    # false: Unresolved requests will be forwarded to the host's
    # upstream DNS server if the virtual network's DNS server does not
    # have an answer.
    local_only = true

    # (Optional) one or more DNS forwarder entries. One or both of
    # "address" and "domain" must be specified. The format is:
    # forwarders {
    #     address = "my address"
    #     domain = "my domain"
    #  }
    #

    # (Optional) one or more DNS host entries. Both of
    # "ip" and "hostname" must be specified. The format is:
    hosts {
      hostname = "router"
      ip = "192.168.175.1"
    }

    # (Optional) one or more DNS SRV entries. Both of
    # "service" and "protocol" must be specified. The format is:
    # srvs {
    #   service = "my-service"
    #   protocol = "my-protocol"
    # }

    # (Optional) one or more static routes.
    # "cidr" and "gateway" must be specified. The format is:
    # routes {
    #     cidr = "10.17.0.0/16"
    #     gateway = "10.18.0.2"
    #   }
  }

  # (Optional) Dnsmasq options configuration
  dnsmasq_options {
    # (Optional) one or more option entries.  Both of
    # "option_name" and "option_value" must be specified.  The format is:
    # options  {
    #     option_name = "server"
    #     option_value = "/base.domain/my.ip.address.1"
    #   }
    # options {
    #     option_name = "address"
    #     option_value = "/.api.base.domain/my.ip.address.2"
    #   }
    #
  }
}

resource "libvirt_pool" "ghrunner" {
  name = var.storage_pool_name
  type = "dir"
  path = var.storage_pool_path
}

resource "libvirt_volume" "rocky" {
  name    = "rocky_linux.qcow2"
  source  = "https://download.rockylinux.org/pub/rocky/8/images/Rocky-8-GenericCloud.latest.x86_64.qcow2"
  format  = "qcow2"
  pool    = libvirt_pool.ghrunner.name
}

resource "libvirt_volume" "runner" {
  name            = "ghrunner_${count.index}.qcow2"
  base_volume_id  = libvirt_volume.rocky.id
  count           = var.runner_count
  pool            = libvirt_pool.ghrunner.name
  size            = var.runner_disk_size
  format          = "qcow2"
}

data "template_file" "ci_meta" {
  template = file("${path.module}/cloud_init_-_meta_data.cfg")
}

data "template_file" "ci_network" {
  template = file("${path.module}/cloud_init_-_network_data.cfg")
}

data "template_file" "ci_user" {
  template = file("${path.module}/cloud_init_-_user_data.cfg")
}

resource "libvirt_cloudinit_disk" "cloud_init" {
  name            = "cloud_init.iso"
  network_config  = data.template_file.ci_network.rendered
  meta_data       = data.template_file.ci_meta.rendered
  user_data       = data.template_file.ci_user.rendered
  pool            = libvirt_pool.ghrunner.name
}

# Create the machine
resource "libvirt_domain" "rocky" {
  name   = format("%s-${count.index}", var.runner_name_prefix)
  memory = var.runner_memory
  vcpu   = var.runner_vcpus

  count = var.runner_count

  cloudinit = libvirt_cloudinit_disk.cloud_init.id

  network_interface {
    network_name    = var.network_name
    hostname        = format("%s-${count.index}", var.runner_name_prefix)
    wait_for_lease  = true
  }

  disk {
    volume_id = libvirt_volume.runner[count.index].id
  }

  graphics {
    type        = "spice"
    listen_type = "address"
    autoport    = true
  }
}

variables

#
# Libvirt: qemu related variables
#
variable "qemu_uri" {
  description = "URI to connect with the qemu-service."
  type        = string
  default     = "qemu:///system"
}

#
# Libvirt: network related variables
#
variable "network_addresses" {
  description = "List of none or one IPv4 and one or none IPv6 Subnet."
  type        = list(string)
  default     = [ "192.168.175.0/24" ]
}

variable "network_bridge" {
  description = "Name of the network bridge to bind VMs to."
  type        = string
  # FIXME: Remove default value and prompt only for bridge interface when network mode bridge is selected.
  default     = "br0"

  validation {
    condition     = length(var.network_bridge) >= 1
    error_message = "The network_bridge must contain at least one character."
  }
}

variable "network_domain_name" {
  description = "DNS domain name for VM's."
  type        = string
  default     = "ghrunner.local"
}

variable "network_mode" {
  description = "Network mode of the virtual network. Possible values are: \"nat\" (default), \"bridge\"."
  type        = string
  default     = "nat"

  validation {
    condition     = contains(["bridge", "nat"], var.network_mode)
    error_message = "Invalid virtual network mode. Only \"nat\" and \"bridge\" are supported."
  }
}

variable "network_name" {
  description = "Name of the virtual network displayed in libvirt."
  type        = string
  default     = "ghrunner"

  validation {
    condition     = length(var.network_name) >= 1
    error_message = "The network_name must contain at least one character."
  }
}

#
# Libvirt: storage related variables
#
variable "storage_pool_name" {
  description = "Name of the storage pool for the runner VM disks"
  type        = string
  default     = "ghrunner"

  validation {
    condition     = length(var.storage_pool_name) >= 1
    error_message = "The storage_pool_name must contain at least one character."
  }
}

variable "storage_pool_path" {
  description = "Path of the storage pool for the runner VM disks"
  type        = string
  default     = "/var/lib/libvirt/pool/ghrunner"
}

#
# Libvirt: vm related variables
#
variable "runner_name_prefix" {
  description = "Name prefix of the VM's"
  type        = string
  default     = "ghrunner"

  validation {
    condition     = length(var.runner_name_prefix) >= 1
    error_message = "The runner_name_prefix must contain at least one character."
  }
}

variable "runner_count" {
  description = "Amount of runner VMs"
  type        = number
  default     = 5

  validation {
    condition     = var.runner_count >= 1
    error_message = "The runner_count must be greater than or equal to 1."
  }
}

variable "runner_disk_size" {
  description = "Size of the virtual hard disk"
  type        = string
  default     = "53687091200"

  validation {
    condition     = var.runner_disk_size >= 53687091200
    error_message = "The runner_disk_size must be greater than or equal to 53687091200 bytes."
  }
}

variable "runner_memory" {
  description = "Memory assigned to one VM"
  type        = number
  default     = 4096

  validation {
    condition     = var.runner_memory >= 1
    error_message = "The runner_memory must be greater than or equal to 1."
  }
}

variable "runner_vcpus" {
  description = "Amount of vCPU's for one VM"
  type        = number
  default     = 2

  validation {
    condition     = var.runner_vcpus >= 1
    error_message = "The runner_vcpus must be greater than or equal to 1."
  }
}

Steps to Reproduce Issue

terraform apply
...
libvirt_pool.ghrunner: Creating...
libvirt_network.gh_runner: Creating...
libvirt_pool.ghrunner: Creation complete after 5s [id=f020231e-c8a2-4747-b5f7-78a2c9b8c49b]
libvirt_cloudinit_disk.cloud_init: Creating...
libvirt_volume.rocky: Creating...
libvirt_network.gh_runner: Creation complete after 5s [id=7eacddf8-2ded-4197-9c2e-47adf7ab220c]
libvirt_cloudinit_disk.cloud_init: Creation complete after 1s [id=/var/lib/libvirt/pool/ghrunner/cloud_init.iso;ba7aabbf-3365-408b-90ea-608467394fc7]
╷
│ Error: error while determining image type for https://download.rockylinux.org/pub/rocky/8/images/Rocky-8-GenericCloud.latest.x86_64.qcow2: can't retrieve partial header of resource to determine file type: https://download.rockylinux.org/pub/rocky/8/images/Rocky-8-GenericCloud.latest.x86_64.qcow2 - 200 OK
│ 
│   with libvirt_volume.rocky,
│   on main.tf line 81, in resource "libvirt_volume" "rocky":
│   81: resource "libvirt_volume" "rocky" {
│ 
╵

Additional information:

Do you have SELinux or Apparmor/Firewall enabled? Some special configuration? Have you tried to reproduce the issue without them enabled?

SELinux: no
Apparmor/Firewall: no

pgonin commented 2 years ago

I'm facing the same issue. It seems to be related to the image. If I use the openSUSE Tumbleweed JeOS image, the following works fine:

terraform {
  required_providers {
    libvirt = {
      source = "dmacvicar/libvirt"
      version = "0.6.14"
    }
  }
}

# base image
resource "libvirt_volume" "tumbleweed" {
  name   = "tumbleweed"
  pool   = "default"
  source = "http://download.opensuse.org/tumbleweed/appliances/openSUSE-Tumbleweed-JeOS.x86_64-OpenStack-Cloud.qcow2"
  format = "qcow2"
}

If I replace it with the Leap 15.4 JeOS image:

terraform {
  required_providers {
    libvirt = {
      source = "dmacvicar/libvirt"
      version = "0.6.14"
    }
  }
}

# base image
resource "libvirt_volume" "tumbleweed" {
  name   = "tumbleweed"
  pool   = "default"
  source = "http://download.opensuse.org/distribution/leap/15.4/appliances/openSUSE-Leap-15.4-JeOS.x86_64-15.4-OpenStack-Cloud-Build6.195.qcow2"
  format = "qcow2"
}

then terraform apply fails with:

Error: Error while determining image type for http://download.opensuse.org/distribution/leap/15.4/appliances/openSUSE-Leap-15.4-JeOS.x86_64-15.4-OpenStack-Cloud-Build6.195.qcow2: Can't retrieve partial header of resource to determine file type: http://download.opensuse.org/distribution/leap/15.4/appliances/openSUSE-Leap-15.4-JeOS.x86_64-15.4-OpenStack-Cloud-Build6.195.qcow2 - 404 Not Found

pgonin commented 2 years ago

I just checked http://download.opensuse.org/distribution/leap/15.4/appliances/openSUSE-Leap-15.4-JeOS.x86_64-15.4-OpenStack-Cloud-Build6.195.qcow2 and got a 404. After refreshing the page with the image links, I saw that a new image was available. There is also a 'Current' link that avoids problems with changing build numbers: http://download.opensuse.org/distribution/leap/15.4/appliances/openSUSE-Leap-15.4-JeOS.x86_64-15.4-OpenStack-Cloud-Current.qcow2

scabala commented 2 months ago

Hi, I think this is an issue with the image hosting, and the provider can do little or nothing about unreliable hosting.

hansingt commented 1 month ago

I have the same issue with a Talos unified kernel image:

Error: error while determining image type for https://factory.talos.dev/image/20814cdbf3e618f7ecb37360b08449fcc16712e9ff794e9f3100b563ca808af9/v1.8.1/metal-amd64-secureboot-uki.efi: can't retrieve partial header of resource to determine file type: https://factory.talos.dev/image/20814cdbf3e618f7ecb37360b08449fcc16712e9ff794e9f3100b563ca808af9/v1.8.1/metal-amd64-secureboot-uki.efi - 200 OK

But I don't think it is related to unreliable hosting. Instead, it seems the provider tries to download only the first 8 bytes from the server by using the Range HTTP header (see https://github.com/dmacvicar/terraform-provider-libvirt/blob/main/libvirt/volume_image.go#L140 for details).

Not every host supports this:

A server that doesn't support range requests may ignore the Range header and return the whole resource with a 200 status code.

I'd suggest trying to download only the first bytes and, if that does not work, simply downloading the whole image. It needs to be downloaded later anyway, so maybe the provider could keep it in the meantime to avoid downloading it twice.

scabala commented 1 month ago

Hi @hansingt, thanks for the detailed investigation! You seem to be right. I might create a PR with a fix for it.

hansingt commented 1 month ago

Thank you @scabala! I'd create the PR myself, but sadly I'm not that good with Go 🫤

scabala commented 1 month ago

No worries @hansingt, I have gone through the code and I think the solution is quite simple. Could you check the code from PR #1120?