Dynamic dispatch of resources such as NICs can result in the destruction of those NICs when MAAS re-commissions a machine and drifts state such that the resource falls out of scope:
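The exact NIC block is not reproduced here; a minimal sketch of the kind of dynamic dispatch being described, modelled on the disk block further down, might look like the following. local.nic_map_1, local.subnet_id, and the data source and argument names are assumptions for illustration and should be checked against the provider docs, not taken as the original configuration. While the interface keeps the name it had at commissioning, the data lookup resolves and everything stays stable.

data "maas_network_interface_physical" "metal-1" {
  # one lookup per (machine, NIC name) pair; a key drops out of this map
  # as soon as MAAS stops reporting that NIC name
  for_each = { for entry in local.nic_map_1 : "${entry.machine}.${entry.nic}" => entry }
  machine  = each.value.machine
  name     = each.value.nic
}

resource "maas_network_interface_link" "metal-1" {
  # link resources are dispatched from the data source, so a vanished key
  # plans the corresponding link (and, per this report, the NIC) for destruction
  for_each          = data.maas_network_interface_physical.metal-1
  machine           = each.value.machine
  network_interface = each.value.id
  subnet            = local.subnet_id
}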
But if a firmware update just completed, or a kernel command-line option changes that NIC name, the resource gets destroyed. This creates a death spiral because the destruction happens not just in TF state, but in MAAS itself - the NIC is deleted from the machine, requiring a re-commission to regain awareness of it. This seems like terminally incorrect behavior, as TF cannot delete disks or NICs from a physical server or VM handed to MAAS at this level of interface (in our case, since we're using a TF OpenStack provider, it could trim them from the underlying instance, but the MAAS provider cannot). Other resources that depend on this resource then get deleted, and the provider loses state sync, causing this disk dispatch block, along with the network one above:
data "maas_block_device" "metal-1" {
for_each = { for entry in local.disk_map_1: "${entry.machine}.${entry.disk}" => entry }
name = each.value.disk
machine = each.value.machine
}
resource "maas_block_device" "metal-1-os" {
for_each = {
for k,v in data.maas_block_device.metal-1: k => v if v.size_gigabytes != 128 && contains(flatten([
for p in v.partitions: [
p.mount_point
]
]),"/")
}
machine = each.value.machine
name = each.value.name
id_path = each.value.id_path
size_gigabytes = each.value.size_gigabytes
is_boot_device = true
tags = [
"os_drive",
]
partitions {
size_gigabytes = 32
fs_type = "ext4"
label = "os"
mount_point = "/"
}
partitions {
size_gigabytes = 1
fs_type = "fat32"
mount_point = "/boot"
bootable = true
}
}
to correctly identify OS drives after a commissioning pass but lose track of other disks and crash the run:
OpenTofu used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
OpenTofu planned the following actions, but then encountered a problem:
# maas_block_device.metal-1-os["test-metal1-00.sdg"] will be created
+ resource "maas_block_device" "metal-1-os" {
+ block_size = 512
+ id = (known after apply)
+ id_path = "/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_f150670d-9df4-440e-855f-07eb00906be1"
+ is_boot_device = true
+ machine = "fwtq6w"
+ model = (known after apply)
+ name = "sdg"
+ path = (known after apply)
+ serial = (known after apply)
+ size_gigabytes = 64
+ tags = [
+ "os_drive",
]
+ uuid = (known after apply)
+ partitions {
+ fs_type = "ext4"
+ label = "os"
+ mount_point = "/"
+ path = (known after apply)
+ size_gigabytes = 32
}
+ partitions {
+ bootable = true
+ fs_type = "fat32"
+ mount_point = "/boot"
+ path = (known after apply)
+ size_gigabytes = 1
}
}
# maas_block_device.metal-1-os["test-metal1-01.sdg"] will be updated in-place
~ resource "maas_block_device" "metal-1-os" {
id = "20"
name = "sdg"
~ tags = [
- "1rpm",
- "rotary",
+ "os_drive",
]
# (8 unchanged attributes hidden)
~ partitions {
~ label = "root" -> "os"
~ size_gigabytes = 63 -> 32
tags = []
# (4 unchanged attributes hidden)
}
+ partitions {
+ bootable = true
+ fs_type = "fat32"
+ mount_point = "/boot"
+ size_gigabytes = 1
}
}
# maas_block_device.metal-1-os["test-metal1-02.sdg"] will be created
+ resource "maas_block_device" "metal-1-os" {
+ block_size = 512
+ id = (known after apply)
+ id_path = "/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_4a7e675b-d4e8-4f9e-9a4c-7c1b8902c3c8"
+ is_boot_device = true
+ machine = "kherdk"
+ model = (known after apply)
+ name = "sdg"
+ path = (known after apply)
+ serial = (known after apply)
+ size_gigabytes = 64
+ tags = [
+ "os_drive",
]
+ uuid = (known after apply)
+ partitions {
+ fs_type = "ext4"
+ label = "os"
+ mount_point = "/"
+ path = (known after apply)
+ size_gigabytes = 32
}
+ partitions {
+ bootable = true
+ fs_type = "fat32"
+ mount_point = "/boot"
+ path = (known after apply)
+ size_gigabytes = 1
}
}
# maas_block_device.metal-1-os["test-metal1-03.sdg"] will be created
+ resource "maas_block_device" "metal-1-os" {
+ block_size = 512
+ id = (known after apply)
+ id_path = "/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_d144595b-1881-416f-8797-63f102c731e4"
+ is_boot_device = true
+ machine = "yxsy7e"
+ model = (known after apply)
+ name = "sdg"
+ path = (known after apply)
+ serial = (known after apply)
+ size_gigabytes = 64
+ tags = [
+ "os_drive",
]
+ uuid = (known after apply)
+ partitions {
+ fs_type = "ext4"
+ label = "os"
+ mount_point = "/"
+ path = (known after apply)
+ size_gigabytes = 32
}
+ partitions {
+ bootable = true
+ fs_type = "fat32"
+ mount_point = "/boot"
+ path = (known after apply)
+ size_gigabytes = 1
}
}
Plan: 3 to add, 1 to change, 0 to destroy.
....
│ Error: ServerError: 404 Not Found (No BlockDevice matches the given query.)
│
│ with maas_block_device.metal-1-data["test-metal1-02.sde"],
│ on maas-blockdev.tf line 54, in resource "maas_block_device" "metal-1-data":
│ 54: resource "maas_block_device" "metal-1-data" {
│
╵
╷
│ Error: cannot find link (32) on the network interface (12) from machine (kherdk)
│
│ with maas_network_interface_link.metal-1["test-metal1-02"],
│ on maas-machines.tf line 16, in resource "maas_network_interface_link" "metal-1":
│ 16: resource "maas_network_interface_link" "metal-1" {
│
╵
...
Bottom line: physical components should not be destroyable - disks, NICs, BMCs. Logical constructs - partitions, VLANs, etc. - of course can be; but this logic bug breaks reference tracking and brings the house of cards down pretty hard, requiring state rm and other bad behaviors to remedy.
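As a sketch of a partial guard (nothing provider-specific, just the standard lifecycle meta-argument, and trimmed down from the block earlier - the for_each filter and partitions are omitted here for brevity): marking the physical resources prevent_destroy makes OpenTofu fail the plan instead of deleting the device from MAAS when a commissioning pass drifts the name, though it does nothing to fix the dispatch drift itself.

resource "maas_block_device" "metal-1-os" {
  for_each       = data.maas_block_device.metal-1
  machine        = each.value.machine
  name           = each.value.name
  id_path        = each.value.id_path
  size_gigabytes = each.value.size_gigabytes

  # refuse to ever plan a destroy for this device; the plan errors out
  # instead of MAAS losing the disk
  lifecycle {
    prevent_destroy = true
  }
}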