stereobutter opened 2 years ago
I had some success with the udev rules from WALinuxAgent:

```yaml
storage:
  files:
    - path: /etc/udev/rules.d/66-azure-storage.rules
      mode: 0750
      contents:
        source: https://raw.githubusercontent.com/Azure/WALinuxAgent/master/config/66-azure-storage.rules
    - path: /etc/udev/rules.d/99-azure-product-uuid.rules
      mode: 0750
      contents:
        source: https://raw.githubusercontent.com/Azure/WALinuxAgent/master/config/99-azure-product-uuid.rules
```
These create symlinks in `/dev/disk/by-path/` of the form `acpi-VMBUS:00-vmbus-<device-guid>-lun-N` (where `N` is the LUN of the disk). Using these I was able to move `/var` to a data disk using:
```yaml
disks:
  - device: /dev/disk/by-path/acpi-VMBUS:00-vmbus-f8b3781b1e824818a1c363d806ec15bb-lun-0
    wipe_table: false
    partitions:
      - size_mib: 0
        start_mib: 0
        label: user_data
filesystems:
  - path: /var
    device: /dev/disk/by-path/acpi-VMBUS:00-vmbus-f8b3781b1e824818a1c363d806ec15bb-lun-0
    format: xfs
    with_mount_unit: true
```
However, creating a new VM with the same butane file and a copy of the data disk attached fails to boot with:
```text
[   10.200850] ignition[1190]: INFO : mount: op(2): [started] mounting "/dev/disk/by-path/acpi-VMBUS:00-vmbus-f8b3781b1e824818a1c363d806ec15bb-lun-0" at "/sysroot/var" with type "xfs" and options ""
[   10.214311] ignition[1190]: DEBUG : mount: op(2): executing: "mount" "-o" "" "-t" "xfs" "/dev/disk/by-path/acpi-VMBUS:00-vmbus-f8b3781b1e824818a1c363d806ec15bb-lun-0" "/sysroot/var"
[   10.225680] XFS (sdc): Metadata CRC error detected at xfs_sb_read_verify+0x14d/0x170 [xfs], xfs_sb block 0x0
[   10.232931] XFS (sdc): Unmount and run xfs_repair
[   10.236625] XFS (sdc): First 128 bytes of corrupted metadata buffer:
[   10.241216] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 40 00 00  XFSB.........@..
[   10.246479] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   10.250978] 00000020: 5d 8c 5f 81 54 ef 41 53 ac 76 1a 0e 17 ba c9 76  ]._.T.AS.v.....v
[   10.256190] 00000030: 00 00 00 00 00 20 00 09 00 00 00 00 00 00 00 80  ..... ..........
[   10.262230] 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82  ................
[   10.268172] 00000050: 00 00 00 01 00 10 00 00 00 00 00 04 00 00 00 00  ................
[   10.279351] 00000060: 00 00 0a 00 bc b5 10 00 02 00 00 08 00 00 00 00  ................
[   10.284456] 00000070: 00 00 00 00 00 00 00 00 0c 0c 09 03 14 00 00 19  ................
[   10.290262] XFS (sdc): SB validate failed with error -74.
[   10.295419] ignition[1190]: CRITICAL : mount: op(2): [failed] mounting "/dev/disk/by-path/acpi-VMBUS:00-vmbus-f8b3781b1e824818a1c363d806ec15bb-lun-0" at "/sysroot/var" with type "xfs" and options "": exit status 32: Cmd: "mount" "-o" "" "-t" "xfs" "/dev/disk/by-path/acpi-VMBUS:00-vmbus-f8b3781b1e824818a1c363d806ec15bb-lun-0" "/sysroot/var" Stdout: "" Stderr: "mount: /sysroot/var: mount(2) system call failed: Structure needs cleaning.\n"
[FAILED] Failed to start Ignition (mount).
See 'systemctl status ignition-mount.service' for details.
```
full log: frodo.17de004c-a65a-43fa-94f2-e45882ac5195.serialconsole.txt
Disclaimer: please excuse me if the filesystem issue above isn't related to the Azure disk and should be its own issue.
Using `ext4` instead of `xfs` in the butane definition for the filesystem, a new VM that gets a data disk from a preexisting snapshot boots successfully without error. However, the disk apparently gets wiped in the process, which I thought `wipe_table` prevented? I validated that creating an azure managed disk from my snapshot works by mounting yet another copy of said snapshot into the VM and checking its contents.

Am I doing something wrong with placing `/var` on another disk and recreating the VM?
There's nothing in your config that wipes the disk. `wipe_table: false` is the default (which just controls the partition table itself) and you haven't set `wipe_filesystem: true` on the data filesystem. It might be worth checking Azure's handling of the data disk to verify that data is being correctly preserved. You can check whether Ignition is a factor here by removing the partition and filesystem declarations and performing the formatting/mounting by hand.
Is it possible that you created the snapshot before all of the new filesystem metadata was written back to the disk?
> It might be worth checking Azure's handling of the data disk to verify that data is being correctly preserved

I created a second disk from the same snapshot and mounted that into my VM after ignition ran, and the data is there, so I don't believe this is the issue.
> You can check whether Ignition is a factor here by removing the partition and filesystem declarations and performing the formatting/mounting by hand.

I'm not sure I follow. Removing the partition and filesystem declarations is easy enough, but what exactly do you mean by "formatting/mounting by hand"? I assume you mean during the ignition run? Can you give any directions on how to do this? Alternatively, are there any logs that ignition leaves behind that could help figure out whether ignition is the culprit?
> Is it possible that you created the snapshot before all of the new filesystem metadata was written back to the disk?

After carefully reading https://coreos.github.io/ignition/operator-notes/#filesystem-reuse-semantics, and especially the part about matching labels and uuid, I explicitly set both label and uuid in my config to rule out that the metadata is the issue.
The following is from a VM that was created with a preexisting filesystem with label `data` and uuid `e1d21e70-080b-4cd2-a51b-4f58496b90fc` that I put `/var` on (`/dev/sdc`). After ignition ran I attached another disk also created from the original snapshot (`/dev/sdd`), and both label and uuid match. (You might have to scroll the output to the right to see the UUID column.)
```text
$ sudo blkid -o list
device     fs_type  label              mount point    UUID
----------------------------------------------------------------------------------------------
/dev/sdd   ext4     data               (not mounted)  e1d21e70-080b-4cd2-a51b-4f58496b90fc
/dev/sdb1  ntfs     Temporary Storage  (not mounted)  DCF2DA34F2DA131E
/dev/sdc   ext4     data               /var           e1d21e70-080b-4cd2-a51b-4f58496b90fc
/dev/sda4  xfs      root               /sysroot       033e9584-3979-4ec8-a24b-fd0c98651172
/dev/sda2  vfat     EFI-SYSTEM         (not mounted)  D3AE-F344
/dev/sda3  ext4     boot               /boot          bbed36ea-ae69-43c7-862b-6d2fd1a273ec
/dev/sda1
```
> Removing the partition and filesystem declarations is easy enough but what exactly do you mean by "formatting/mounting by hand"?

Boot the node, SSH to it, create the partition table and filesystem, mount the filesystem, snapshot the disk, and boot another node.
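To sketch the "create the filesystem by hand" step (hedged: the label and UUID below are the ones used earlier in this thread; the rehearsal runs against a file-backed image so it needs no root or real disk — on the actual node you would point `mkfs.ext4` at the data disk's by-path device and mount it before snapshotting):

```shell
# Rehearsal of "format by hand" on a file-backed image (no root needed).
# On the real node, target the by-path device of the data disk instead.
truncate -s 64M disk.img   # stand-in for the data disk

# Format with the explicit label and UUID from the Butane config, so that
# Ignition's filesystem-reuse check can later match them:
mkfs.ext4 -q -F -L data -U e1d21e70-080b-4cd2-a51b-4f58496b90fc disk.img

# Verify the metadata that Ignition should find on the next boot:
tune2fs -l disk.img | grep -E 'volume name|Filesystem UUID'
```

On the node itself you would follow this with `mount <device> /var`, write a marker file, `sync`, and only then take the snapshot.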
> Alternatively are there any logs that ignition leaves behind that could help figure out whether ignition is the culprit?

Yes, you can use `journalctl -t ignition` to see the Ignition logs.
> > Is it possible that you created the snapshot before all of the new filesystem metadata was written back to the disk?
>
> After carefully reading https://coreos.github.io/ignition/operator-notes/#filesystem-reuse-semantics and especially the part about matching labels and uuid I explicitly set both label and uuid in my config to rule out that the metadata is the issue.

That doesn't rule it out, though. Did you snapshot the VM while it was still running, or after it was properly shut down?
> After ignition ran I attached another disk also created from the original snapshot (`/dev/sdd`) and both label and uuid match.

Are you still seeing the reuse failure in that case?
I repeated my experiment once again this morning, and still the data disk seems to get wiped and the filesystem recreated by ignition every time:

- `touch hello.txt` (in `/var/home/core`)
- `systemctl poweroff` and wait for the VM to be stopped
- `ls` not showing `hello.txt`
This time I pulled the logs for frodo and samwise using `journalctl -t ignition` as you suggested, and maybe I've found the issue. Contained in both logs is the line

```text
found filesystem at "/dev/disk/by-path/acpi-VMBUS:00-vmbus-f8b3781b1e824818a1c363d806ec15bb-lun-0" with uuid "" and label ""
```

which is probably expected for frodo, as that VM starts out with a fresh/empty disk. For samwise, however, I would have expected ignition to report finding a filesystem with the proper label (`data`) and uuid (`e1d21e70-080b-4cd2-a51b-4f58496b90fc`). Is this the reason ignition wipes the disk/recreates the filesystem when it runs on samwise?
When I create another disk `sanity_check` from the snapshot and attach that to samwise as LUN 1, mount it somewhere, and check the filesystem and label using `sudo blkid -o list`, it reports the correct metadata for that disk. Also, `hello.txt` is there as expected:

```text
device     fs_type  label  mount point           UUID
------------------------------------------------------------------------------------------
/dev/sdd   ext4     data   /var/home/core/data2  e1d21e70-080b-4cd2-a51b-4f58496b90fc
```
```yaml
variant: fcos
version: 1.4.0
passwd:
  users:
    - name: core
      password_hash: $1$KQSW9Uq/$yNAkRIbQvKGKPVdspcjEq0
      ssh_authorized_keys:
        - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDeMUtyZtnfbT6KxQCuC3wgLH06xxlHs1Tvd5o9epuTPA9soEEO0LfLdhv9eDDB0XZ47yrfHMwn3l8ZLWbXA6EQ6W2NQbeZWRC17Xez3fvS9jUG0JKCbonhrZxveKABisbpvnQf3BtMgGRwygVMLG4gTO/goA0Yjy6WJQeQiNATKQbdR1mAtQhDk6BkK1EBA8EYHNvK7JVlsOLtSO4fq8k84ijRIzENO13jBf8z6C3qcaJ/PT49DsrFIz8XBXqs3qSO+0N3wiXp3RRFh0GnpUkVTXKi9bryNoJ+mXcjUJUF6+CyJiqZ41mjxrDq167kDbjrxhwzeLReT+kikCR/6wT91PugXb7JjH1DXMgQADlla3HG7mpo6J5llQc1LZee7Sa0zTdVOMCxuAK/kJSfnlsnPx4tI7qyRYuO/KM+i2uSDWFwa5EAfvKZUnilKt3aW08hylvrN+BwRqiJZ6jVpUZK8oLfHPgU4M/N00edJgTx0L2oyaIb2woBQskFjktDXhMcdlzXoPCEdMsE2dCT1BXrCpBkUyWAxJg32VGQfSn2i2PQx5jM51B5Bl8xxtf5vsogwCcGqOGN5KNVUcxYMtGy99tsLIr/vZCgiqnA3WPXsGv5N5WSP02OtiJ81uLz9UDROSz13bBGJ1lZqhT3IO+1SNb5Ao8Z5777ouap1OWGcw== supersecret
storage:
  disks:
    - device: /dev/disk/by-path/acpi-VMBUS:00-vmbus-f8b3781b1e824818a1c363d806ec15bb-lun-0
      wipe_table: false
      partitions:
        - size_mib: 0
          start_mib: 0
          label: user_data
  filesystems:
    - path: /var
      device: /dev/disk/by-path/acpi-VMBUS:00-vmbus-f8b3781b1e824818a1c363d806ec15bb-lun-0
      format: ext4
      with_mount_unit: true
      label: data
      uuid: e1d21e70-080b-4cd2-a51b-4f58496b90fc
  files:
    - path: /etc/udev/rules.d/66-azure-storage.rules
      mode: 0750
      contents:
        source: https://raw.githubusercontent.com/Azure/WALinuxAgent/master/config/66-azure-storage.rules
    - path: /etc/udev/rules.d/99-azure-product-uuid.rules
      mode: 0750
      contents:
        source: https://raw.githubusercontent.com/Azure/WALinuxAgent/master/config/99-azure-product-uuid.rules
```
Ahh, I just noticed the problem. Your `disks` section thinks you're putting the data filesystem on partition 1, but your `filesystems` section thinks you're putting it directly on an unpartitioned disk. As a result, the partition table is overwriting the start of the filesystem and the filesystem is overwriting the partition table. The fix is to have the `filesystems` section refer to the partition created in the `disks` section.
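For example (a sketch based on the config above), the `filesystems` entry could refer to the partition by its partlabel:

```yaml
storage:
  disks:
    - device: /dev/disk/by-path/acpi-VMBUS:00-vmbus-f8b3781b1e824818a1c363d806ec15bb-lun-0
      wipe_table: false
      partitions:
        - size_mib: 0
          start_mib: 0
          label: user_data
  filesystems:
    - path: /var
      # refer to the partition created above, not the raw disk
      device: /dev/disk/by-partlabel/user_data
      format: ext4
      with_mount_unit: true
      label: data
      uuid: e1d21e70-080b-4cd2-a51b-4f58496b90fc
```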
Filed https://github.com/coreos/ignition/issues/1397 to automatically generate a warning in this case.
Thank you a lot for your patience 🙇

Using `/dev/disk/by-partlabel/user_data` for the filesystem worked as expected 🥳
If you can perhaps point me in the right direction, I'd be happy to try to contribute the WALA udev rules for azure disks, either in the docs or directly in the fcos images for azure. What do you think?
Contributions are welcome if you're able to help! Ideally the rules would be included in a Fedora package that we could ship. Failing that, we could consider shipping them directly in fedora-coreos-config. I don't think the docs are a good way to proceed, especially since any udev rules specified via Ignition config don't take effect until after Ignition runs.
Regarding udev rules, we already include the WALinuxAgent RPM in FCOS and have some glue to copy the udev rules into the initramfs.

One problem I see is that the RPM is a bit out of date with respect to the latest release, so maybe we just need to get the maintainer to bump things? → Opened a BZ requesting that a new release be built: https://bugzilla.redhat.com/show_bug.cgi?id=2040980
The nuclear workaround for this is documented for now at https://github.com/openshift/os/blob/master/docs/faq.md#q-how-do-i-configure-a-secondary-block-device-via-ignitionmc-if-the-name-varies-on-each-node.
Similar to https://github.com/coreos/fedora-coreos-tracker/issues/1122, symlinks for secondary disks on azure are not stable. From what I can gather from trial and error, there are `/dev/disk/azure/resource` and `/dev/disk/azure/root`, where it appears `/dev/disk/azure/root` always references the boot disk correctly; however, `/dev/disk/azure/resource` is randomly assigned to any one of the secondary disks (including the temporary disk that every VM gets). In my case `sda` is a secondary disk I attached to the VM (to place `/var` on it via ignition), `sdb` the boot disk, and `sdc` the temporary disk attached by azure; looking at the symlinks gives
What I'd actually like to do is place the user data on another disk via ignition so that I can back up that disk separately, and in case disaster hits I can create a new VM with a snapshot of that disk (using `wipe_table: false` for that device).