dmacvicar / terraform-provider-libvirt

Terraform provider to provision infrastructure with Linux's KVM using libvirt
Apache License 2.0

libvirt_network creation fails for existing bridge #364

Closed vjdhama closed 5 years ago

vjdhama commented 6 years ago

Version Reports:

Distro version of host:

Ubuntu 18.04

Terraform Version Report

Terraform v0.11.8
+ provider.libvirt (unversioned)

Libvirt version

Compiled against library: libvirt 4.0.0
Using library: libvirt 4.0.0
Using API: QEMU 4.0.0
Running hypervisor: QEMU 2.11.1

terraform-provider-libvirt plugin version (git-hash)

0.4.2

Description of Issue/Question

By default, virsh allows creating a libvirt network on an existing host bridge: https://libvirt.org/formatnetwork.html#examplesBridge

That works if you create the libvirt network with virsh.

With Terraform, it fails with:

* libvirt_network.default: Error crearing libvirt network: virError(Code=38, Domain=0, Message='error creating bridge interface virbr0: File exists')

But the new network is created anyway.

<network>
  <name>default</name>
  <uuid>89d03309-b412-43f5-81bc-82b46bf81ec8</uuid>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:0e:cd:ba'/>
</network>

Rerunning terraform apply also fails with:

2018-08-21T15:23:36.825+0700 [DEBUG] plugin.terraform-provider-libvirt: 2018/08/21 15:23:36 [ERR] plugin: stream copy 'stderr' error: session shutdown
* libvirt_network.default: Error defining libvirt network: virError(Code=9, Domain=19, Message='operation failed: network 'default' already exists with uuid 89d03309-b412-43f5-81bc-82b46bf81ec8') -   <network>
      <name>default</name>
      <bridge name="virbr0" stp="on"></bridge>
      <domain></domain>
  </network>

This happens because the original run fails before it records the network's metadata (its ID), so Terraform tries to define the network again.

Setup

(Please provide the full main.tf file for reproducing the issue. Be sure to remove sensitive info.)

provider "libvirt" {
  uri = "qemu+tcp://root@172.16.255.254/system"
}

resource "libvirt_network" "default" {
  name = "default"
  mode = "bridge"
  bridge = "virbr0"
}

Steps to Reproduce Issue

(Include debug logs if possible and relevant.)

terraform init
terraform plan

  bridge: "" => "virbr0"
  mode:   "" => "bridge"
  name:   "" => "default"
2018-08-21T15:07:42.603+0700 [DEBUG] plugin.terraform-provider-libvirt: 2018/08/21 15:07:42 [INFO] Creating libvirt network at qemu+tcp://root@172.16.255.254/system
2018-08-21T15:07:42.603+0700 [DEBUG] plugin.terraform-provider-libvirt: 2018/08/21 15:07:42 [DEBUG] Creating libvirt network at qemu+tcp://root@172.16.255.254/system:   <network>
2018-08-21T15:07:42.603+0700 [DEBUG] plugin.terraform-provider-libvirt:       <name>default</name>
2018-08-21T15:07:42.603+0700 [DEBUG] plugin.terraform-provider-libvirt:       <bridge name="virbr0" stp="on"></bridge>
2018-08-21T15:07:42.603+0700 [DEBUG] plugin.terraform-provider-libvirt:       <domain></domain>
2018-08-21T15:07:42.603+0700 [DEBUG] plugin.terraform-provider-libvirt:   </network>
2018/08/21 15:07:42 [ERROR] root: eval: *terraform.EvalApplyPost, err: 1 error(s) occurred:

* libvirt_network.default: Error crearing libvirt network: virError(Code=38, Domain=0, Message='error creating bridge interface virbr0: File exists')
2018/08/21 15:07:42 [ERROR] root: eval: *terraform.EvalSequence, err: 1 error(s) occurred:

Additional Infos:

Do you have SELinux or Apparmor/Firewall enabled? Some special configuration? Have you tried to reproduce the issue without them enabled?

vjdhama commented 6 years ago

On further investigation, I was able to reproduce the issue using virsh.

For the XML below:

<network>
  <name>default</name>
  <forward mode="bridge"/>
  <bridge name='virbr0'/>
</network>

virsh net-define net.xml

Network default defined from net.xml
root@svr02:~# virsh net-list --all

 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              inactive   no            yes

virsh net-create test-net.xml

error: Failed to create network from test-net.xml
error: operation failed: network 'default' already exists with uuid fcfcfe24-9884-4bec-9a98-005a0027964f

vjdhama commented 6 years ago

IMO, the flow should be

virsh net-define net.xml followed by virsh net-start default, instead of virsh net-define net.xml followed by virsh net-create net.xml.

Relevant code: https://github.com/dmacvicar/terraform-provider-libvirt/blob/e7b65f1425580d6dfa542c884cdbf1863706ea27/libvirt/resource_libvirt_network.go#L296
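The define-then-start flow proposed above can be sketched in shell as follows. This is only an illustration of the idea, not the provider's code: the `ensure_network` function and the `VIRSH` override variable are hypothetical names introduced here, and the check on `virsh net-info` output assumes its usual `Active: yes` line.

```shell
#!/bin/sh
# Sketch: define a persistent network once, then start it, rather than
# calling net-define followed by net-create (which defines it a second time).
# VIRSH is a hypothetical override so the flow can be dry-run with a stub.
VIRSH="${VIRSH:-virsh}"

ensure_network() {
  name="$1"   # network name, e.g. "default"
  xml="$2"    # path to the network XML, e.g. net.xml

  # Define only if libvirt does not already know a network with this name.
  if ! $VIRSH net-info "$name" >/dev/null 2>&1; then
    $VIRSH net-define "$xml" || return 1
  fi

  # Start only if the network is not already active.
  if ! $VIRSH net-info "$name" 2>/dev/null | grep -q '^Active:.*yes'; then
    $VIRSH net-start "$name" || return 1
  fi
}

# Usage (not executed here): ensure_network default net.xml
```

Because both steps are guarded, running the function a second time is a no-op, which is the behaviour one would want from a repeated terraform apply.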

MalloZup commented 6 years ago

Thanks @vjdhama for the issue. I will tag it as need_investigation; at the moment I'm busy with other issues. :+1:

MalloZup commented 6 years ago

@vjdhama I will look at it once I have free cycles. I think I know the problem. Thanks for your issue.

vjdhama commented 6 years ago

@MalloZup Thanks for looking into this.

MalloZup commented 6 years ago

@vjdhama just an update on your issue, see https://github.com/dmacvicar/terraform-provider-libvirt/issues/389.

In short, this happens because we don't update network resources.

So if your bridge already exists, you will always get that error.

As a workaround, you should create/destroy bridges.

To fix this we need to look at https://godoc.org/github.com/libvirt/libvirt-go#Network.Update

In particular, we should implement the update call for the bridge part of the network and update the resource correctly.

tommyknows commented 5 years ago

Is there a workaround to have a VM directly connected to a bridge (and thus to the LAN)? Or can someone elaborate on this:

As workaround you should create/destroy bridges.

further? Which bridge, the libvirt or my br0 on the host? I want my VMs to have IPs in my home network, to access them directly.

Thanks

MalloZup commented 5 years ago

@tommyknows as a workaround you can leave the bridge creation out of Terraform.

Basically, like this example: https://github.com/dmacvicar/terraform-provider-libvirt/blob/master/examples/ubuntu/ubuntu-example.tf#L41

There we don't create any network/bridge in Terraform; you just attach the domain to an existing one, and its creation is not part of the TF file.

The comment above means:

If you create a bridge via terraform-libvirt, you can currently create it only once. With the codebase as it is at the moment, running apply twice gives you the problem posted in this issue. So the best solution is not to create the network/bridge via terraform-libvirt.

You can still attach the domain to a network defined outside of Terraform, as described here: https://github.com/dmacvicar/terraform-provider-libvirt/blob/master/website/docs/r/domain.html.markdown#handling-network-interfaces

Hope it helps :+1: :white_flower:
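Since the workaround relies on a bridge that Terraform does not manage, it can help to confirm that the bridge exists on the host before running apply. A minimal sketch, assuming a Linux host where bridge devices expose a `bridge` directory in sysfs; the `bridge_exists` function and the `SYSFS_NET` override are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Sketch: pre-flight check that a host bridge exists before attaching
# domains to it. SYSFS_NET is a hypothetical override for testing.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

bridge_exists() {
  # A Linux bridge device has a "bridge" directory under its sysfs entry;
  # ordinary interfaces (eth0, lo, ...) do not.
  [ -d "$SYSFS_NET/$1/bridge" ]
}

# Usage (not executed here):
#   bridge_exists br0 && terraform apply
```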

tommyknows commented 5 years ago

Thanks for the reply. I'm just getting started with TF (and KVM kinda too). I want to set up a CoreOS VM on my ubuntu system. Currently, I have the following tf config:

provider "libvirt" {
    uri = "qemu:///system"
}

resource "libvirt_ignition" "k8s-ignition" {
  name = "kubernetes.ign"
  content = "/home/ramon/terraform/definitions/k8s.ign"
}

resource "libvirt_domain" "kubernetes" {
  name   = "kubernetes-terraform"
  memory = "1024"
  vcpu   = 2
  coreos_ignition = "${libvirt_ignition.k8s-ignition.id}"

  network_interface {
    bridge = "br0"
    addresses = ["192.168.1.187"]
  }

  boot_device {
    dev = [ "hd", "network"]
  }

  console {
    type        = "pty"
    target_port = "0"
    target_type = "serial"
  }

  console {
    type        = "pty"
    target_type = "virtio"
    target_port = "1"
  }

  disk {
    volume_id = "${libvirt_volume.coreos.id}"
  }

  graphics {
    type        = "spice"
    listen_type = "address"
    autoport    = true
  }

}

resource "libvirt_volume" "coreos" {
  name   = "coreos"
  pool   = "default"
  source = "/home/ramon/terraform/images/coreos_production_qemu_image.img"
  format = "qcow2"
}

However, I cannot access my host on .187. When using a NATed network, it works fine.

Now, if I cannot set an IP address for the host (because qemu-guest-agent is not installed on CoreOS?), how can I find the IP address? ip a doesn't show a new IP address either.

Setting the wait_for_lease just waits forever.

Thanks for your help.

MalloZup commented 5 years ago

Yes, in bridge mode you need qemu-guest-agent installed in the domain. AFAIK this is the only solution. And yes, wait_for_lease waits forever because we cannot get the IP without the guest agent.
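Once a guest agent is running inside the domain, the address can also be queried directly with virsh. A small sketch, assuming the tabular output of `virsh domifaddr`; the `guest_ipv4` function and the `VIRSH` override are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Sketch: ask the guest agent (not libvirt's DHCP leases) for a domain's
# IPv4 addresses. VIRSH is a hypothetical override for dry runs.
VIRSH="${VIRSH:-virsh}"

guest_ipv4() {
  # --source agent queries qemu-guest-agent inside the guest, which is why
  # this only works when the agent is installed and running.
  $VIRSH domifaddr "$1" --source agent 2>/dev/null |
    awk '$3 == "ipv4" && $1 != "lo" { sub(/\/.*/, "", $4); print $4 }'
}

# Usage (not executed here): guest_ipv4 kubernetes-terraform
```

The awk filter skips the loopback interface and strips the prefix length, so each remaining line is a plain IPv4 address.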

tommyknows commented 5 years ago

So there's no way to have CoreOS hosts in bridged mode then? :/

MalloZup commented 5 years ago

@tommyknows you could use cloud_init to install the qemu-guest-agent package on CoreOS and keep it in bridge mode: https://github.com/dmacvicar/terraform-provider-libvirt/tree/master/examples/ubuntu

tommyknows commented 5 years ago

but there's no networking, right? It's not that I just can't inspect it with KVM, the guest does not have any kind of connection (?). -> Means I'd need to copy a file onto the host by sharing a volume.

(And "installing" on CoreOS would mean running a docker container, although I'm going to try this.)

MalloZup commented 5 years ago

Ah yes, I think CoreOS doesn't have cloud-init in that case :smile: but I think you could be heading in a good direction that way :)

remoe commented 5 years ago

@tommyknows, I have the same issue: I want to install qemu-guest-agent on CoreOS. Have you found a solution?

tommyknows commented 5 years ago

I started working with RancherOS, as they provide a qemu-guest-agent docker image. I guess you should be able to get this working on CoreOS too. Basically, what I did:

1) Download the docker image on your host:

docker pull docker.io/rancher/os-qemuguestagent:v1.4.0-rc1

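2) Save the docker image to a .tar file (docker save works on images and produces the archive that docker load expects; docker export works on containers):

docker save -o qemu-guest-agent.tar rancher/os-qemuguestagent:v1.4.0-rc1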

3) Add the folder as a mount to your Terraform VMs (inside the libvirt_domain resource):

filesystem {
  source = "/media/terraform/images"
  target = "qemu_docker_image"
  readonly = true
}

Then mount the folder inside the VM via cloudinit, ignition, or by just running a command. Here it is shown as a mounts entry in cloudinit:

- - qemu_docker_image
  - /media/images
  - 9p
  - trans=virtio,version=9p2000.L,rw

4) Import the docker image on the host:

docker load -i /media/images/qemu-guest-agent.tar

5) Now that you've got the image on the host, you need to start a container with it. I am using the RancherOS Service template, so I did not have to figure out the options for the container. The RancherOS Service template looks like this:

qemu-guest-agent:
  image: rancher/os-qemuguestagent:v1.4.0-rc1
  command: ["/usr/bin/qemu-ga"]
  privileged: true
  restart: always
  labels:
    io.rancher.os.scope: system
  pid: host
  ipc: host
  net: host
  uts: host
  volumes_from:
  - command-volumes
  volumes:
  - /dev:/host/dev

If needed, I could provide the output of a docker inspect on the running container.
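`docker load` expects an image archive as written by `docker save`, so it can be worth checking the .tar file before sharing it with the VM. A rough sketch: it assumes that image archives carry a top-level `manifest.json`, which a plain container filesystem export does not; the `is_image_archive` function name is made up for this example.

```shell
#!/bin/sh
# Sketch: tell an image archive (loadable with "docker load") apart from
# a container filesystem export by looking for a top-level manifest.json.
is_image_archive() {
  tar -tf "$1" 2>/dev/null | grep -qx 'manifest.json'
}

# Usage (not executed here):
#   is_image_archive qemu-guest-agent.tar || echo "not an image archive"
```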

remoe commented 5 years ago

Awesome hint! Thanks for this. Really interesting workflow!

remoe commented 5 years ago

Hey ... here is part of my CoreOS setup, similar to what @tommyknows did:

data "ignition_systemd_unit" "dockerimages-mount" {
  name = "images.mount"
  content = "${file("${path.module}/../ignition/mount-images.cli")}"
}

data "ignition_systemd_unit" "qemuagent" {
  name = "qemuagent.service"
  content = "${file("${path.module}/../ignition/qemuagent.cli")}"
}

data "ignition_config" "master" {
  systemd = [
      "${data.ignition_systemd_unit.dockerimages-mount.id}",
      "${data.ignition_systemd_unit.qemuagent.id}",
  ]
}

mount-images.cli

[Unit]
Before=local-fs.target
[Mount]
What=qemu_docker_image
Where=/images
Options=ro,trans=virtio,version=9p2000.L
Type=9p
[Install]
WantedBy=local-fs.target

The mount works, but running the qemu-agent container does not. I'm trying to solve it ...

UPDATE: I tried with this:

[Unit]
Description=QEMU Agent
After=docker.service 
[Service]
ExecStartPre=/usr/bin/docker load -i /images/qemu-guest-agent.tar
ExecStart=/usr/bin/docker run \
  --privileged=true \
  --cap-add=ALL \
  --net=host \
  -e container=1 \
  -e HOST=/host \
  -e TERM=xterm \
  -v /dev/virtio-ports:/dev/virtio-ports \
  -v /etc/os-release:/etc/os-release:ro \
  -v /dev:/dev \
  -v /proc:/hostproc \
  -v /run/systemd:/run/systemd \
  -v /var/log/qemu-ga:/var/log/qemu-ga:rw \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  rancher/os-qemuguestagent:v1.4.0-rc1
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target

But this doesn't run correctly.

UPDATE: it fails because of:

/usr/bin/ros: no such file or directory": unknown.

So, this container runs only on RancherOS?! :)

tommyknows commented 5 years ago

Hmm, I'm not sure. I think it could just be the entrypoint of the container. What happens if you change the entrypoint (the command that the container is started with) to /usr/bin/qemu-ga? Or am I missing something?
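The entrypoint override suggested above could be tried along these lines. This is a sketch only: the reduced set of docker options is an assumption and has not been verified on CoreOS, and the `run_qemu_ga` function and `DOCKER` override are hypothetical names. The `/usr/bin/qemu-ga` path comes from the RancherOS service template earlier in the thread.

```shell
#!/bin/sh
# Sketch: start the guest agent image with qemu-ga as the entrypoint,
# bypassing the image's default (RancherOS-specific) entrypoint.
# DOCKER is a hypothetical override so the command can be dry-run.
DOCKER="${DOCKER:-docker}"

run_qemu_ga() {
  # /dev is passed through so qemu-ga can reach the virtio serial port
  # that libvirt exposes for the guest agent channel.
  $DOCKER run --rm --privileged --net=host \
    -v /dev:/dev \
    --entrypoint /usr/bin/qemu-ga \
    "$1"
}

# Usage (not executed here): run_qemu_ga rancher/os-qemuguestagent:v1.4.0-rc1
```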

MalloZup commented 5 years ago

This issue got mixed up with another topic. @tommyknows feel free to open a separate issue; I will fix the bridge network problem reported here in the first post.

MalloZup commented 5 years ago

This was fixed by https://github.com/dmacvicar/terraform-provider-libvirt/pull/531, which will be merged into master this week.

Thanks for all the info and comments. :sun_behind_large_cloud: