Open skinowski opened 2 years ago
Thanks for all the context and logs. Since the problem doesn't correct itself until a manual partx -u, the problem likely isn't a race in Ignition (where we're simply not waiting long enough), but instead, something in the sgdisk -> kernel -> udev chain is not properly handling the reread.
Things you could try:
Boot with the rd.udev.log_level=debug kernel argument, and see if there's anything interesting in the output.
<jakinney> bgilbert: quick update on the issue you helped me look at the
other day. This looks like it's related to boot/root being hosted on md raids
prior to Ignition beginning.
<jakinney> In this situation, the kernel won't pick up newly created
partitions without help from e.g. partx. Even though sgdisk calls ioctl()
with BLKRRPART.
<jakinney> Md must be interfering in some way.
<bgilbert> jakinney: yeah, that makes sense. the Ignition disks stage
assumes it has exclusive control of any disk being modified.
<bgilbert> jakinney: is the RAID left over from a previous provisioning run,
or are you shipping images that have RAIDs already created?
Even without a raid setup something seems missing/broken in sgdisk. When creating a new partition on the boot device, it won't be recognized until partprobe /dev/vda. Maybe we should add this command (or partx -u) to internal/sgdisk/sgdisk.go's Commit() function.
Bug
The udev race fix (https://github.com/coreos/ignition/pull/446) does not address the case where RAID devices depend on newly created partitions. Perhaps partx needs to be run after partition creation.
Operating System Version
Oracle Linux 8
Ignition Version
2.9.0
Environment
Bare metal hardware / Oracle Cloud Infrastructure
Expected Behavior
Successful completion of the RAID1 setup using the new partitions.
Actual Behavior
The RAID setup phase times out while waiting for the new partition /dev/ links.
Reproduction Steps
An Ignition storage config that creates 4 new partitions on 2 disks and a RAID1 mirror using these new partitions.
Other Information
Ignition fetches the config and creates the partitions (using sgdisk) with no issue, but later, when creating the RAID arrays, Ignition fails because the partitions do not appear in the kernel partition table:
sgdisk output at the time of failure:
device tree at time of failure:
/proc/partitions
partx is able to update the partition table: