ClusterLabs / anvil

The Anvil! Intelligent Availability™ Platform, mark 3

[storage] anvil-manage-server-storage database gets corrupted in server delete / create #657

Open fabbione opened 5 months ago

fabbione commented 5 months ago

The reproducer is easy; it just takes time:

provision a server:

anvil-provision-server --ci-test --name an-test-deploy1 --os centos-stream9 --cpu 4 --ram 4G --storage-group "Storage group 1" --storage-size 30G --install-media CentOS-Stream-9-latest-x86_64-dvd1.iso --driver-disc deploy1.iso

(wait for server to be deployed)

anvil-manage-server-storage -vvv --log-secure --server an-test-deploy1
Working with the server: [an-test-deploy1], UUID: [a89d7fd5-1a1e-4c27-8148-056bd564ddea]

Disk Drives:
- Target: [vda], boot: [01], Replication Volume: [an-test-deploy1/0]
 |- Resource / LV / Metadata sizes: [30.00 GiB / 30.00 GiB / 2.85 MiB], free space: [170.00 GiB]

Optical Drives:
- Target: [sda], boot: [02], ISO: [/mnt/shared/files/CentOS-Stream-9-latest-x86_64-dvd1.iso]
- Target: [sdb], boot: [03], ISO: [/mnt/shared/files/deploy1.iso]

Subnodes:
 |- Name: [an-a01n01], UUID: [94b3bad6-2c4f-499e-b3e8-7cccb06793e5]
   |- Volume: [0], backing device: [/dev/anvil-test-vg/an-test-deploy1_0], DRBD minor: [0], size: [30.00 GiB]
   ^- In storage group: [Storage group 1], size: [200.00 GiB], free: [170.00 GiB]
 |- Name: [an-a01n02], UUID: [14d0bb17-6160-4f59-aa43-5a300653445b]
   |- Volume: [0], backing device: [/dev/anvil-test-vg/an-test-deploy1_0], DRBD minor: [0], size: [30.00 GiB]
   ^- In storage group: [Storage group 1], size: [200.00 GiB], free: [170.00 GiB]

Add a disk:

anvil-manage-server-storage -vvv --log-secure --add 20G --disk vdb --server an-test-deploy1 --storage-group "Storage group 1" --confirm

disk is visible in the VM:

ssh an-test-deploy1 cat /proc/partitions |grep vdb
252 16 20969564 vdb

 anvil-manage-server-storage -vvv --log-secure --server an-test-deploy1
Working with the server: [an-test-deploy1], UUID: [a89d7fd5-1a1e-4c27-8148-056bd564ddea]

Disk Drives:
- Target: [vda], boot: [01], Replication Volume: [an-test-deploy1/0]
 |- Resource / LV / Metadata sizes: [30.00 GiB / 30.00 GiB / 2.85 MiB], free space: [170.00 GiB]
Use of uninitialized value $drbd_volume in hash element at /usr/sbin/anvil-manage-server-storage line 2938.
Use of uninitialized value $lv_name in hash element at /usr/sbin/anvil-manage-server-storage line 2940.
Use of uninitialized value $resource_size in subtraction (-) at /usr/sbin/anvil-manage-server-storage line 2941.
Use of uninitialized value $lv_size in subtraction (-) at /usr/sbin/anvil-manage-server-storage line 2941.
Use of uninitialized value $drbd_volume in hash element at /usr/sbin/anvil-manage-server-storage line 3098.
Use of uninitialized value $drbd_volume in hash element at /usr/sbin/anvil-manage-server-storage line 3099.
Use of uninitialized value $drbd_volume in hash element at /usr/sbin/anvil-manage-server-storage line 3100.
Use of uninitialized value $backing_lv in hash element at /usr/sbin/anvil-manage-server-storage line 3101.
Use of uninitialized value $drbd_volume in hash element at /usr/sbin/anvil-manage-server-storage line 3098.
Use of uninitialized value $drbd_volume in hash element at /usr/sbin/anvil-manage-server-storage line 3099.
Use of uninitialized value $drbd_volume in hash element at /usr/sbin/anvil-manage-server-storage line 3100.
Use of uninitialized value $backing_lv in hash element at /usr/sbin/anvil-manage-server-storage line 3101.
- Target: [vdb], boot: [--], Replication Volume: [an-test-deploy1/]
 |- Resource / LV / Metadata sizes: [0 B / 0 B / 0 B], free space: [0 B]

Optical Drives:
- Target: [sda], boot: [02], ISO: [/mnt/shared/files/CentOS-Stream-9-latest-x86_64-dvd1.iso]
- Target: [sdb], boot: [03], ISO: [/mnt/shared/files/deploy1.iso]

Subnodes:
 |- Name: [an-a01n01], UUID: [94b3bad6-2c4f-499e-b3e8-7cccb06793e5]
   |- Volume: [0], backing device: [/dev/anvil-test-vg/an-test-deploy1_0], DRBD minor: [0], size: [30.00 GiB]
   ^- In storage group: [Storage group 1], size: [200.00 GiB], free: [170.00 GiB]
Use of uninitialized value $this_scan_lvm_lv_path in string eq at /usr/sbin/anvil-manage-server-storage line 3265.
   |- Volume: [1], backing device: [/dev/anvil-test-vg/an-test-deploy1_1], DRBD minor: [1], size: [20.00 GiB]
   ^- In storage group: [<unknown>], size: [0 B], free: [0 B]
Use of uninitialized value $this_scan_lvm_lv_path in string eq at /usr/sbin/anvil-manage-server-storage line 3265.
 |- Name: [an-a01n02], UUID: [14d0bb17-6160-4f59-aa43-5a300653445b]
   |- Volume: [0], backing device: [/dev/anvil-test-vg/an-test-deploy1_0], DRBD minor: [0], size: [30.00 GiB]
   ^- In storage group: [Storage group 1], size: [200.00 GiB], free: [170.00 GiB]
Use of uninitialized value $this_scan_lvm_lv_path in string eq at /usr/sbin/anvil-manage-server-storage line 3265.
   |- Volume: [1], backing device: [/dev/anvil-test-vg/an-test-deploy1_1], DRBD minor: [1], size: [20.00 GiB]
   ^- In storage group: [<unknown>], size: [0 B], free: [0 B]
Use of uninitialized value $this_scan_lvm_lv_path in string eq at /usr/sbin/anvil-manage-server-storage line 3265.

after some minutes, the output goes back to normal:

anvil-manage-server-storage -vvv --log-secure --server an-test-deploy1
Working with the server: [an-test-deploy1], UUID: [a89d7fd5-1a1e-4c27-8148-056bd564ddea]

Disk Drives:
- Target: [vda], boot: [01], Replication Volume: [an-test-deploy1/0]
 |- Resource / LV / Metadata sizes: [30.00 GiB / 30.00 GiB / 2.85 MiB], free space: [150.00 GiB]
- Target: [vdb], boot: [--], Replication Volume: [an-test-deploy1/1]
 |- Resource / LV / Metadata sizes: [20.00 GiB / 20.00 GiB / 1.91 MiB], free space: [150.00 GiB]

Optical Drives:
- Target: [sda], boot: [02], ISO: [/mnt/shared/files/CentOS-Stream-9-latest-x86_64-dvd1.iso]
- Target: [sdb], boot: [03], ISO: [/mnt/shared/files/deploy1.iso]

Subnodes:
 |- Name: [an-a01n01], UUID: [94b3bad6-2c4f-499e-b3e8-7cccb06793e5]
   |- Volume: [0], backing device: [/dev/anvil-test-vg/an-test-deploy1_0], DRBD minor: [0], size: [30.00 GiB]
   ^- In storage group: [Storage group 1], size: [200.00 GiB], free: [150.00 GiB]
   |- Volume: [1], backing device: [/dev/anvil-test-vg/an-test-deploy1_1], DRBD minor: [1], size: [20.00 GiB]
   ^- In storage group: [Storage group 1], size: [200.00 GiB], free: [150.00 GiB]
 |- Name: [an-a01n02], UUID: [14d0bb17-6160-4f59-aa43-5a300653445b]
   |- Volume: [0], backing device: [/dev/anvil-test-vg/an-test-deploy1_0], DRBD minor: [0], size: [30.00 GiB]
   ^- In storage group: [Storage group 1], size: [200.00 GiB], free: [150.00 GiB]
   |- Volume: [1], backing device: [/dev/anvil-test-vg/an-test-deploy1_1], DRBD minor: [1], size: [20.00 GiB]
   ^- In storage group: [Storage group 1], size: [200.00 GiB], free: [150.00 GiB]
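
The wait between the two outputs above could be scripted rather than done by hand. A minimal sketch, assuming the transient state is always marked by Perl "uninitialized value" warnings or an `<unknown>` storage group, as in the output shown earlier. The helper name `storage_output_settled` is hypothetical and not part of the Anvil! tools.

```shell
# Hypothetical helper, not shipped with Anvil!.
# Reads captured anvil-manage-server-storage output on stdin and succeeds
# only when it contains neither of the two markers of the transient state:
# a Perl "uninitialized value" warning or an "<unknown>" storage group.
storage_output_settled() {
    ! grep -q -F \
        -e 'Use of uninitialized value' \
        -e 'In storage group: [<unknown>]'
}

# Possible usage (the 30 second interval is an arbitrary assumption).
# Perl warnings go to stderr, hence the 2>&1:
#
#   until anvil-manage-server-storage -vv --log-secure \
#           --server an-test-deploy1 2>&1 | storage_output_settled; do
#       sleep 30
#   done
```

This only checks the text of the report; it says nothing about whether the underlying database records are correct.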

delete the server:

anvil-delete-server -vv --log-secure --force --server an-test-deploy1

make sure all server resources have been released:

pcs status |grep deploy1

drbdadm status

No currently configured DRBD found.

lvs

no mention of deploy1.
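
For a CI run, the three manual checks above could be folded into one test. A rough sketch, assuming that "released" simply means the server name no longer appears in any of the captured listings. The function name `server_fully_released` is made up for illustration.

```shell
# Hypothetical helper, not shipped with Anvil!.
# $1 is the server name (or a fragment of it); every further argument is
# the captured text of one listing. Succeeds only if no listing mentions
# the name.
server_fully_released() {
    name=$1
    shift
    for listing in "$@"; do
        if printf '%s\n' "$listing" | grep -q -F -- "$name"; then
            return 1
        fi
    done
    return 0
}

# Possible usage on a subnode:
#
#   server_fully_released deploy1 \
#       "$(pcs status)" "$(drbdadm status 2>&1)" "$(lvs)" \
#       && echo "all resources released"
```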

recreate the server:

anvil-provision-server --ci-test --name an-test-deploy1 --os centos-stream9 --cpu 4 --ram 4G --storage-group "Storage group 1" --storage-size 30G --install-media CentOS-Stream-9-latest-x86_64-dvd1.iso --driver-disc deploy1.iso

[root@an-a01n01 ~]# pcs status |grep deploy1
  * an-test-deploy1     (ocf:alteeve:server):    Started an-a01n01
[root@an-a01n01 ~]# anvil-manage-server-storage -vvv --log-secure --server an-test-deploy1
Working with the server: [an-test-deploy1], UUID: [a89d7fd5-1a1e-4c27-8148-056bd564ddea]
The server: [an-test-deploy1] has been deleted.

wait some time and observe the DB corruption. The VM, which is brand new and was created with only one disk, still expects an extra disk that does not exist:

 anvil-manage-server-storage -vvv --log-secure --server an-test-deploy1
Working with the server: [an-test-deploy1], UUID: [a89d7fd5-1a1e-4c27-8148-056bd564ddea]
The server: [an-test-deploy1] has been deleted.
[root@an-a01n01 ~]# anvil-manage-server-storage -vvv --log-secure --server an-test-deploy1
Working with the server: [an-test-deploy1], UUID: [84874f9a-8100-4a2f-b1d7-29a5972ecb2f]

Disk Drives:
- Target: [vda], boot: [01], Replication Volume: [an-test-deploy1/0]
 |- Resource / LV / Metadata sizes: [30.00 GiB / 30.00 GiB / 2.85 MiB], free space: [170.00 GiB]

Optical Drives:
- Target: [sda], boot: [02], ISO: [/mnt/shared/files/CentOS-Stream-9-latest-x86_64-dvd1.iso]
- Target: [sdb], boot: [03], ISO: [/mnt/shared/files/deploy1.iso]

Subnodes:
 |- Name: [an-a01n01], UUID: [94b3bad6-2c4f-499e-b3e8-7cccb06793e5]
   |- Volume: [0], backing device: [/dev/anvil-test-vg/an-test-deploy1_0], DRBD minor: [0], size: [30.00 GiB]
   ^- In storage group: [Storage group 1], size: [200.00 GiB], free: [170.00 GiB]
   |- Volume: [1], backing device: [], DRBD minor: [1], size: [20.00 GiB]
   ^- In storage group: [<unknown>], size: [0 B], free: [0 B]
Use of uninitialized value $backing_disk in string eq at /usr/sbin/anvil-manage-server-storage line 3265.
Use of uninitialized value $backing_disk in string eq at /usr/sbin/anvil-manage-server-storage line 3265.
 |- Name: [an-a01n02], UUID: [14d0bb17-6160-4f59-aa43-5a300653445b]
   |- Volume: [0], backing device: [/dev/anvil-test-vg/an-test-deploy1_0], DRBD minor: [0], size: [30.00 GiB]
   ^- In storage group: [Storage group 1], size: [200.00 GiB], free: [170.00 GiB]
   |- Volume: [1], backing device: [], DRBD minor: [1], size: [20.00 GiB]
   ^- In storage group: [<unknown>], size: [0 B], free: [0 B]
Use of uninitialized value $backing_disk in string eq at /usr/sbin/anvil-manage-server-storage line 3265.
Use of uninitialized value $backing_disk in string eq at /usr/sbin/anvil-manage-server-storage line 3265.
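
The stale entry in the output above is recognisable by its empty backing device (`backing device: []`). A small sketch that pulls such volume numbers out of a captured report, assuming the line format stays exactly as printed here. The name `stale_volumes` is hypothetical.

```shell
# Hypothetical helper, not shipped with Anvil!.
# Reads captured anvil-manage-server-storage output on stdin and prints
# each volume number that is listed with an empty backing device, once,
# regardless of how many subnodes report it.
stale_volumes() {
    sed -n 's/.*Volume: \[\([0-9][0-9]*\)], backing device: \[],.*/\1/p' |
        sort -u
}

# Possible usage:
#
#   anvil-manage-server-storage -vv --log-secure \
#       --server an-test-deploy1 2>&1 | stale_volumes
```

An empty result means no volume with a missing backing device was reported; it does not prove the database is otherwise consistent.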

digimer commented 2 weeks ago

I tried to reproduce this and the issue did not reappear. I tested on a dev system, not in CI, so it's possible it remains. Can this test be re-enabled to see if the issue remains?

Also, please use -vv. Using -vvv is discouraged unless debugging issues where the source of a problem is unknown, as -vvv generates absolutely massive amounts of logging.

digimer commented 1 week ago

I'm closing this as it should be fixed now. If the issue reappears, reopen this issue.