dmacvicar / terraform-provider-libvirt

Terraform provider to provision infrastructure with Linux's KVM using libvirt
Apache License 2.0
1.54k stars 457 forks source link

Fix `networkRange` race condition and global state corruption #945

Closed omertuc closed 1 year ago

omertuc commented 2 years ago

While the networkRange function calculates lastIP, it uses net.IPv4zero or net.IPv6zero as a starting value from which it builds the final last IP. The issue is that these are global variables defined in the standard library, and modifying them affects all future users of those variables across the program.

This leads to a not-so-rare race condition when creating multiple libvirt networks. These networks are being created concurrently and they both race modifying that global variable. Their lastIP DHCP configuration gets mixed up as result. The manner in which the configuration gets mixed up is unpredictable, but it could lead to a corrupt configuration that cannot be applied.

The solution is to copy the zero IP addresses to a new variable rather than use them directly. Instead for simplicity I just instantiate an empty slice with the same length as the net IP, which would be filled with zeroes anyway.

This commit also solves another unrelated bug in getNetworkIPConfig where the ^ operator was used as if it's a power operator, while in reality it's a bitwise-xor that leads to slightly incorrect results. This makes the validation only slightly wrong and is overly strict (e.g. it will not allow you to create /28 IPv4 network even though such network has far more available addresses than the condition blocks)

A similar race condition can be simply reproduced and visualized with this short go code:

package main

import (
    "fmt"
    "net"
    "sync"
)

func getNetMaskWithMax16Bits(m net.IPMask) net.IPMask {
    ones, bits := m.Size()
    if bits-ones > 16 {
        if bits == 128 {
            return net.CIDRMask(128-16, 128)
        }
        return net.CIDRMask(32-16, 32)
    }
    return m
}

func networkRange(network *net.IPNet) (net.IP, net.IP) {
    netIP := network.IP.To4()
    lastIP := net.IPv4zero.To4()
    if netIP == nil {
        netIP = network.IP.To16()
        lastIP = net.IPv6zero.To16()
    }
    firstIP := netIP.Mask(network.Mask)
    intMask := getNetMaskWithMax16Bits(network.Mask)
    for i := 0; i < len(lastIP); i++ {
        lastIP[i] = netIP[i] | ^intMask[i]
    }
    return firstIP, lastIP
}

func update(wg *sync.WaitGroup, cidr string, id int) {
    address := cidr
    _, ipNet, _ := net.ParseCIDR(address)
    start, end := networkRange(ipNet)
    start[len(start)-1]++
    start[len(start)-1]++
    end[len(end)-1]--
    fmt.Printf("Start %d: %s\n", id, start.String())
    fmt.Printf("End %d: %s\n", id, end.String())
    wg.Done()
}

func main() {
    var wg sync.WaitGroup
    wg.Add(2)
    go update(&wg, "192.168.145.0/24", 0)
    go update(&wg, "192.168.127.0/24", 1)
    wg.Wait()
}

Then run:

watch -n0.1 -d go run main.go

To see it happen for short moments.

Alternatively, create this main.tf file:

terraform {
  required_providers {
    libvirt = {
      source = "dmacvicar/libvirt"
      version = "0.6.14"
    }
  }
}

provider "libvirt" {
  uri = "qemu:///system"
}

resource "libvirt_network" "net" {
  name = "omer-net"
  mode = "nat"
  addresses = ["192.168.127.0/24"]
  autostart = true
}

resource "libvirt_network" "secondary_net" {
  name = "omer-second-net"
  mode = "nat"
  addresses = ["192.168.145.0/24"]
  autostart = true
}

And run:

while true; do
    if ! (terraform apply -auto-approve && terraform destroy -auto-approve); then
        break
    fi
done

Until the bug reproduces with the following error:

│ Error: error defining libvirt network: internal error: range
192.168.127.2 - 192.168.145.254 is not entirely within network
    192.168.127.1/24 -   <network>
    │       <name>omer-net</name>
    │       <forward mode="nat"></forward>
    │       <bridge stp="on"></bridge>
    │       <dns enable="no"></dns>
    │       <ip address="192.168.127.1" family="ipv4" prefix="24">
    │           <dhcp>
    │               <range start="192.168.127.2"
    end="192.168.145.254"></range>
    │           </dhcp>
    │       </ip>
    │       <options
    xmlns="http://libvirt.org/schemas/network/dnsmasq/1.0"></options>
    │   </network>
    │
    │   with libvirt_network.net,
    │   on main.tf line 14, in resource "libvirt_network" "net":
    │   14: resource "libvirt_network" "net" {

Please make sure you read the contributor documentation before opening a Pull Request.

ericroy commented 2 years ago

Maintainers, can we merge this? This bug has been biting me semi-regularly for months. Thanks for tracking it down @omertuc !

osherdp commented 2 years ago

@dmacvicar

dmacvicar commented 1 year ago

Thanks for the patch. I will do a release once we get #950 in.