IBT-FMI / gebuilder

Gentoo System and Image Builder
GNU General Public License v3.0

Unable to boot OpenStack images: “No configuration file found” #11

Closed TheChymera closed 6 years ago

TheChymera commented 6 years ago

Continuing here, since this does not seem to be related to btrfs. All examples here use ext4 unless otherwise indicated.

@Doeme continued from here:

I pasted the entire logs earlier, but, upon looking again, I see nothing suspicious in that (or any other sections):

bs1 ~ # cat /usr/share/gebuilder/roots/stemgentoo/logs/openstack_image/40-generate_bootchain.sh.log
Copying syslinux files
Installing extlinux
/usr/share/gebuilder/roots/stemgentoo/root/../mnt/boot/syslinux/ is device /dev/loop3p1
Warning: unable to obtain device geometry (defaulting to 64 heads, 32 sectors)
         (on hard disks, this is usually harmless.)
Writing bootloader, booting from UUID 87b4ee50-c18e-42e0-b832-17482d20d711
Writing fstab root-entry
Generating initramfs /usr/share/gebuilder/roots/stemgentoo/root/../mnt/boot/initramfs-4.9.76-gentoo
dracut: Executing: /usr/bin/dracut --no-kernel -m "base rootfs-block" /usr/share/gebuilder/roots/stemgentoo/root/../mnt/boot/initramfs-4.9.76-gentoo 4.9.76-gentoo
dracut: *** Including module: rootfs-block ***
dracut: *** Including module: udev-rules ***
dracut: Skipping udev rule: 40-redhat.rules
dracut: Skipping udev rule: 50-firmware.rules
dracut: Skipping udev rule: 50-udev.rules
dracut: Skipping udev rule: 91-permissions.rules
dracut: Skipping udev rule: 80-drivers-modprobe.rules
dracut: *** Including module: base ***
dracut: *** Including module: fs-lib ***
dracut: *** Including modules done ***
dracut: *** Resolving executable dependencies ***
dracut: *** Resolving executable dependencies done***
dracut: *** Stripping files ***
dracut: *** Stripping files done ***
dracut: *** Store current command line parameters ***
dracut: *** Creating image file '/usr/share/gebuilder/roots/stemgentoo/mnt/boot/initramfs-4.9.76-gentoo' ***
dracut: *** Creating initramfs image file '/usr/share/gebuilder/roots/stemgentoo/mnt/boot/initramfs-4.9.76-gentoo' done ***

I also tried sourcing the OPENSTACK_IMG_UUID here (no idea if this is right, but UUID was in any case always empty):

debug "Writing fstab root-entry"
cat <<-EOF >> ${OPENSTACK_IMG_MNT}/etc/fstab
UUID=$OPENSTACK_IMG_UUID              /               $OPENSTACK_FILESYSTEM            noatime         0 1
EOF

And still to no avail.

Doeme commented 6 years ago

So to bracket the problem:

  1. The System BIOS finds a partition marked bootable
  2. The BIOS jumps to the MBR of that partition containing the stage1 of syslinux
  3. Syslinux can in fact initialize. But can it also load further stages? The boot: prompt in the screenshot https://user-images.githubusercontent.com/950524/36382125-2757f1e4-1588-11e8-85cd-6cbfab3af1c4.png seems to imply that it can. If the bootloader can load additional stages, then 4a and 4b probably do not apply, since it loads those stages from the btrfs partition.
  4. The further stages cannot: a) find the relevant partition, b) read the btrfs filesystem, c) find the configuration file.

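
Case 4c can be checked directly on the mounted image. A minimal sketch: find_syslinux_cfg is a hypothetical helper, and only boot/syslinux/syslinux.cfg is taken from the build script in this thread; the other candidate paths are assumptions about where Syslinux/EXTLINUX may look.

```shell
#!/bin/bash
# find_syslinux_cfg MOUNTPOINT
# Prints the first candidate config file found below MOUNTPOINT.
# Returns 1 if none of the candidates exists.
# Only boot/syslinux/syslinux.cfg comes from the build script in this thread;
# the remaining candidates are assumptions.
find_syslinux_cfg() {
    local mnt=$1 candidate
    for candidate in \
        boot/syslinux/extlinux.conf \
        boot/syslinux/syslinux.cfg \
        syslinux/syslinux.cfg \
        syslinux.cfg
    do
        if [ -f "$mnt/$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: find_syslinux_cfg /mnt/debug || echo "no configuration file found"
```
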
TheChymera commented 6 years ago

The further stages cannot: a) find the relevant partition, b) read the btrfs filesystem, c) find the configuration file

This is all ext4 now, so I guess we can exclude b) ?

Doeme commented 6 years ago

Probably. Sometimes, the symbolic link /boot/boot -> . exists. Is this true for the image?

TheChymera commented 6 years ago

Apparently not:

bs1 ~ # ls /mnt/debug/boot/ -lah
total 12M
drwxr-xr-x  3 root root 4.0K Feb 20 01:30 .
drwxr-xr-x 21 root root 4.0K Feb 20 01:30 ..
-rw-r--r--  1 root root    0 Feb  6 21:51 .keep
-rw-r--r--  1 root root 2.3M Feb 20 01:30 System.map-4.9.76-gentoo-r1
-rw-r--r--  1 root root  71K Feb 20 01:30 config-4.9.76-gentoo-r1
lrwxrwxrwx  1 root root   23 Feb 20 01:30 initramfs -> initramfs-4.9.76-gentoo
-rw-------  1 root root 4.2M Feb 20 01:30 initramfs-4.9.76-gentoo
drwxr-xr-x  2 root root 4.0K Feb 20 01:30 syslinux
lrwxrwxrwx  1 root root   24 Feb 20 01:30 vmlinuz -> vmlinuz-4.9.76-gentoo-r1
-rw-r--r--  1 root root 4.8M Feb 20 01:30 vmlinuz-4.9.76-gentoo-r1

Doeme commented 6 years ago

Oh wow. Create it and try again.

IIRC, this symbolic link is part of the stage3 tarball. I have no idea how it got lost.

Can you check afterwards whether the link exists in roots/<id>/root/boot/?

Doeme commented 6 years ago

ln -s . /boot/boot should do the trick

TheChymera commented 6 years ago

uhm, I'm in the root home - you mean:

ln -s /mnt/debug/boot/ /mnt/debug/boot/boot

?

Doeme commented 6 years ago

Nope. ln is quite stupid: it doesn't alter the path you give it first, but writes it directly into the inode, i.e.

cd /path/that/is/completely/irrelevant
ln -s . /path1/link
realpath /path1/link   # -> /path1
mv /path1/link /path2/link
realpath /path2/link   # -> /path2

If you want ln -s to behave as one might expect, ln -s -r is the way to go, i.e.

cd /path/that/is/relevant
ln -s -r . /path1/link
realpath /path1/link   # -> /path/that/is/relevant

But you can, of course, cd into /boot if it makes you feel more comfortable
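
The behaviour described above can be reproduced safely in a scratch directory. A minimal sketch, assuming GNU coreutils (for realpath and ln -r); path1 and path2 are placeholder directory names:

```shell
#!/bin/bash
# Scratch area; path1 and path2 are placeholder names for this demo.
tmp=$(mktemp -d)
mkdir "$tmp/path1" "$tmp/path2"

# Plain -s stores the target text "." verbatim, whatever the current directory is.
ln -s . "$tmp/path1/link"
readlink "$tmp/path1/link"     # shows the stored target text
realpath "$tmp/path1/link"     # resolved relative to the link's own directory

# Moving the link changes what the stored "." resolves to.
mv "$tmp/path1/link" "$tmp/path2/link"
realpath "$tmp/path2/link"

# With -r, ln resolves the target from the current directory and stores a
# path relative to the link's location instead.
( cd "$tmp/path1" && ln -s -r . "$tmp/path2/relative" )
realpath "$tmp/path2/relative"
```
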

TheChymera commented 6 years ago

Dear god :-/

so then:

cd /mnt/debug/boot
ln -s . /boot/boot

?

Doeme commented 6 years ago

Rather

cd /mnt/debug/boot
ln -s . boot

PSA: I'm going to bed now. Godspeed to you.

TheChymera commented 6 years ago

ok, cool stuff, this worked.

What's a bit puzzling is that the current OpenStack system which I am running (as well as systems based on images preceding your project) lack this symlink. Would you recommend we stop digging and just integrate the symlink creation into the build process? If so, where?

Doeme commented 6 years ago

If anywhere, then in 40-generate_bootchain.sh. But I'd rather find out why the symlink disappears; I suspect the culprit is a call to cp or rsync that doesn't preserve symlinks.

As I said, could you check whether the symlink exists in roots/<id>/root/boot?
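
To test the suspicion that a copy step drops symlinks, the two trees can be compared. A rough sketch; missing_symlinks is a hypothetical helper, not something gebuilder provides:

```shell
#!/bin/bash
# missing_symlinks SRC DST
# Lists every symlink below SRC (as a path relative to SRC) that is not
# also a symlink at the same relative path below DST.
missing_symlinks() {
    local src=$1 dst=$2 link
    # find does not follow symlinks by default, so a self-referencing
    # link such as boot/boot -> . cannot cause a loop here.
    ( cd "$src" && find . -type l ) | while IFS= read -r link; do
        [ -L "$dst/$link" ] || printf '%s\n' "$link"
    done
}

# Example (paths are placeholders):
#   missing_symlinks /path/to/unpacked/stage3 /usr/share/gebuilder/roots/stemgentoo/root
```

An empty result means every symlink in the source tree survived the copy.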

TheChymera commented 6 years ago

looks empty:

bs1 /usr/share/gebuilder/roots/stemgentoo # ls root/boot/ -lah
total 8.0K
drwxr-xr-x  2 root root 4.0K Feb  6 21:51 .
drwxr-xr-x 20 root root 4.0K Feb 19 22:14 ..
-rw-r--r--  1 root root    0 Feb  6 21:51 .keep

TheChymera commented 6 years ago

Also, you say that the symlink disappears, but I cannot see where it's supposed to be created?

bs1 /usr/share/gebuilder # ag boot config/ scripts/ utils/
scripts/openstack_image/default/35-setup_openstack.sh.chroot
14:rc-update add dhcpcd boot
34:pushd /boot/

scripts/openstack_image/stemgentoo/35-setup_openstack.sh.chroot
14:rc-update add dhcpcd boot
34:pushd /boot/

scripts/openstack_image/default/40-generate_bootchain.sh
8:mkdir ${OPENSTACK_IMG_MNT}/boot/syslinux
9:cp /usr/share/syslinux/{menu.c32,memdisk,libcom32.c32,libutil.c32} "${OPENSTACK_IMG_MNT}/boot/syslinux/"
12:extlinux --device="${OPENSTACK_IMG_LODEV}p1" --install "${OPENSTACK_IMG_MNT}/boot/syslinux/"
14:debug "Writing bootloader, booting from UUID $OPENSTACK_IMG_UUID"
15:cat <<-EOF > ${OPENSTACK_IMG_MNT}/boot/syslinux/syslinux.cfg
18:      LINUX /boot/vmlinuz root=UUID=$OPENSTACK_IMG_UUID rootfstype=ext4 console=ttyS0,115200n8
19:      INITRD /boot/initramfs
27:INITRAMFS="${OPENSTACK_IMG_MNT}/boot/initramfs-$KERNELVERSION"
30:ln -s "initramfs-$KERNELVERSION" "${OPENSTACK_IMG_MNT}/boot/initramfs"

scripts/openstack_image/stemgentoo/40-generate_bootchain.sh
8:mkdir ${OPENSTACK_IMG_MNT}/boot/syslinux
9:cp /usr/share/syslinux/{menu.c32,memdisk,libcom32.c32,libutil.c32} "${OPENSTACK_IMG_MNT}/boot/syslinux/"
12:extlinux --device="${OPENSTACK_IMG_LODEV}p1" --install "${OPENSTACK_IMG_MNT}/boot/syslinux/"
14:debug "Writing bootloader, booting from UUID $OPENSTACK_IMG_UUID"
15:cat <<-EOF > ${OPENSTACK_IMG_MNT}/boot/syslinux/syslinux.cfg
18:      LINUX /boot/vmlinuz root=UUID=$OPENSTACK_IMG_UUID rootfstype=$OPENSTACK_FILESYSTEM console=ttyS0,115200n8
19:      INITRD /boot/initramfs
27:INITRAMFS="${OPENSTACK_IMG_MNT}/boot/initramfs-$KERNELVERSION"
30:ln -s "initramfs-$KERNELVERSION" "${OPENSTACK_IMG_MNT}/boot/initramfs"

utils/openstack_kernel_nodocker.config
418:CONFIG_NO_BOOTMEM=y
450:# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
495:# CONFIG_HAVE_BOOTMEM_INFO_NODE is not set
555:# CONFIG_BOOTPARAM_HOTPLUG_CPU0 is not set
1019:# CONFIG_ISCSI_BOOT_SYSFS is not set
2293:CONFIG_X86_VERBOSE_BOOTUP=y

utils/openstack_kernel.config
141:# CONFIG_RCU_EXPEDITE_BOOT is not set
416:CONFIG_NO_BOOTMEM=y
447:# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
492:# CONFIG_HAVE_BOOTMEM_INFO_NODE is not set
553:# CONFIG_BOOTPARAM_HOTPLUG_CPU0 is not set
1262:# CONFIG_ISCSI_BOOT_SYSFS is not set
2521:CONFIG_X86_VERBOSE_BOOTUP=y

I checked, and no other system of mine has this symlink.

TheChymera commented 6 years ago

@Doeme highly interesting developments: the existence of boot/boot is not what solves the issue, and its absence is not what causes it. It seems there's an issue with how the script is run:

The script (as executed in the build system) generates an unusable image:

bs1 /usr/share/gebuilder/roots # cat stemgentoo/hooks/openstack_image/post/60-upload_image.sh
#!/bin/bash

OS_USER="???"
OS_PW="???"
OS_TENANT="???"
OS_IMGNAME="stemgentoo"

function gl(){
glance --os-username "$OS_USER" \
  --os-password "$OS_PW" \
  --os-tenant-name "$OS_TENANT" \
  --os-auth-url https://cloud.s3it.uzh.ch:5000/v2.0 \
  --os-image-api-version 2 "$@"
}

if [ -f "${ROOT}/../registry/openstack_image" ]
then
        UUID="$(sed -n  's/|[[:blank:]]\+id[[:blank:]]\+|[[:blank:]]\+\([a-z0-9\-]\+\)[[:blank:]]\+|/\1/p' "${ROOT}/../registry/openstack_image")"
        debug "Deleting old image with uuid $UUID"
        gl image-delete "$UUID"
else
        ensure_dir "${ROOT}/../registry/"
fi
debug "Uploading new image with name $OS_IMGNAME"
gl image-create --disk-format raw --container-format bare --name "$OS_IMGNAME" --file "$OPENSTACK_IMAGE" >"${ROOT}/../registry/openstack_image"

The script (hacked and executed standalone) generates a usable image:

bs1 ~ # cat 60-upload_image.sh
#!/bin/bash

OS_USER="???"
OS_PW="???"
OS_TENANT="???"
OS_IMGNAME="stemgentoo"

function gl(){
glance --os-username "$OS_USER" \
  --os-password "$OS_PW" \
  --os-tenant-name "$OS_TENANT" \
  --os-auth-url https://cloud.s3it.uzh.ch:5000/v2.0 \
  --os-image-api-version 2 "$@"
}

if [ -f "${ROOT}/../registry/openstack_image" ]
then
        UUID="$(sed -n  's/|[[:blank:]]\+id[[:blank:]]\+|[[:blank:]]\+\([a-z0-9\-]\+\)[[:blank:]]\+|/\1/p' "${ROOT}/../registry/openstack_image")"
        #debug "Deleting old image with uuid $UUID"
        gl image-delete "$UUID"
else
        echo "l2l"
        #ensure_dir "${ROOT}/../registry/"
fi

#debug "Uploading new image with name $OS_IMGNAME"
echo "Myecho: $OPENSTACK_IMAGE"
gl image-create --disk-format raw --container-format bare --name "$OS_IMGNAME" --file "$OPENSTACK_IMAGE" >"${ROOT}/../registry/openstack_image"
bs1 ~ # OPENSTACK_IMAGE=/usr/share/gebuilder/roots/stemgentoo/root/../openstack_images//image_20180220 ROOT=/usr/share/gebuilder/roots/stemgentoo/root ./60-upload_image.sh

The glance command (executed directly from the command line) generates a usable image:

bs1 /usr/share/gebuilder/roots # glance --os-username "???" --os-password "???" --os-tenant-name "???" --os-auth-url https://cloud.s3it.uzh.ch:5000/v2.0 --os-image-api-version 2 image-create --disk-format raw --container-format bare --name "sg_test2" --file /usr/share/gebuilder/roots/stemgentoo/root/../openstack_images//image_20180220 > stemgentoo/registry/openstack_image

I'm continuing to debug this, but input would be appreciated, since trial and error is quite slow.

Doeme commented 6 years ago

Since the script did not change at all (except for some echos), I guess it's related to the command-line variables. Maybe echo the relevant variables at the beginning of the script and look into the logs.
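
A small helper along these lines could go at the top of the hook. This is only a sketch: dump_vars is a hypothetical name, and the variables listed in the example call are the ones the upload script in this thread reads.

```shell
#!/bin/bash
# dump_vars NAME...
# Prints each named variable, distinguishing "unset" from "set but empty".
dump_vars() {
    local name
    for name in "$@"; do
        if [ -n "${!name+set}" ]; then
            # Brackets make empty values and stray whitespace visible.
            printf '%s=[%s]\n' "$name" "${!name}"
        else
            printf '%s=<unset>\n' "$name"
        fi
    done
}

# Example call for the upload hook:
dump_vars ROOT OPENSTACK_IMAGE OS_IMGNAME
```
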

TheChymera commented 6 years ago

It seems something happens to the image in the build process after it is uploaded, something which makes it work again. @Doeme any ideas? Possibly the unmounts? 0.o

[...]
Executing openstack_image/stemgentoo/50-restore_root.sh
Ensuring /usr/share/gebuilder/roots/stemgentoo/root/../logs/openstack_image/ is a directory
executing scripts /usr/share/gebuilder/roots/stemgentoo/root/../hooks/openstack_image/post/60-upload_image.sh
Executing /usr/share/gebuilder/roots/stemgentoo/root/../hooks/openstack_image/post/60-upload_image.sh
Ensuring /usr/share/gebuilder/roots/stemgentoo/root/../logs/openstack_image/ is a directory
No image with an ID of 'b2fed0b6-f71b-4ede-a0de-01855599904d' exists.
Image is: /usr/share/gebuilder/roots/stemgentoo/root/../openstack_images//image_20180220
MD5SUM is : e7f18cc6483ced1368459e3aa95f6532  /usr/share/gebuilder/roots/stemgentoo/root/../openstack_images//image_20180220
Finished succesfully
Cleaning up
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/tmp"
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/var/tmp/portage"
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/sys"
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/proc"
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/dev/pts"
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/dev"
executing umount /usr/share/gebuilder/roots/stemgentoo/root/../mnt
executing losetup -d /dev/loop2
roots/stemgentoo/hooks/openstack_image/chain
bs1 ~ # md5sum /usr/share/gebuilder/roots/stemgentoo/root/../openstack_images//image_20180220
7273a170e5c6f79fce4d3faddda706be  /usr/share/gebuilder/roots/stemgentoo/root/../openstack_images//image_20180220
bs1 ~ # cat /usr/share/gebuilder/roots/stemgentoo/hooks/openstack_image/post/60-upload_image.sh
#!/bin/bash

OS_USER="???"
OS_PW="???"
OS_TENANT="???"
OS_IMGNAME="stemgentoo"

function gl(){
glance --os-username "$OS_USER" \
  --os-password "$OS_PW" \
  --os-tenant-name "$OS_TENANT" \
  --os-auth-url https://cloud.s3it.uzh.ch:5000/v2.0 \
  --os-image-api-version 2 "$@"
}

if [ -f "${ROOT}/../registry/openstack_image" ]
then
        UUID="$(sed -n  's/|[[:blank:]]\+id[[:blank:]]\+|[[:blank:]]\+\([a-z0-9\-]\+\)[[:blank:]]\+|/\1/p' "${ROOT}/../registry/openstack_image")"
        #debug "Deleting old image with uuid $UUID"
        gl image-delete "$UUID" || true
else
        echo "lala"
        #ensure_dir "${ROOT}/../registry/"
fi
#debug "Uploading new image with name $OS_IMGNAME"
echo "Image is: ${OPENSTACK_IMAGE}"
MD5SUM=$(md5sum $OPENSTACK_IMAGE)
echo "MD5SUM is : ${MD5SUM}"
gl image-create --disk-format raw --container-format bare --name "$OS_IMGNAME" --file "$OPENSTACK_IMAGE" >"${ROOT}/../registry/openstack_image"

TheChymera commented 6 years ago

@Doeme ok, so it was the unmounting, or possibly executing losetup -d /dev/loop2. What fixed it is calling cleanup before the OpenStack image upload:

bs1 ~ # cat /usr/share/gebuilder/roots/stemgentoo/hooks/openstack_image/post/60-upload_image.sh
#!/bin/bash

OS_USER="???"
OS_PW="???"
OS_TENANT="???"
OS_IMGNAME="stemgentoo"

function gl(){
glance --os-username "$OS_USER" \
  --os-password "$OS_PW" \
  --os-tenant-name "$OS_TENANT" \
  --os-auth-url https://cloud.s3it.uzh.ch:5000/v2.0 \
  --os-image-api-version 2 "$@"
}

if [ -f "${ROOT}/../registry/openstack_image" ]
then
        UUID="$(sed -n  's/|[[:blank:]]\+id[[:blank:]]\+|[[:blank:]]\+\([a-z0-9\-]\+\)[[:blank:]]\+|/\1/p' "${ROOT}/../registry/openstack_image")"
        #debug "Deleting old image with uuid $UUID"
        gl image-delete "$UUID" || true
else
        echo "lala"
        #ensure_dir "${ROOT}/../registry/"
fi
#debug "Uploading new image with name $OS_IMGNAME"
cleanup
echo "Image is: ${OPENSTACK_IMAGE}"
MD5SUM=$(md5sum $OPENSTACK_IMAGE)
echo "MD5SUM is : ${MD5SUM}"
gl image-create --disk-format raw --container-format bare --name "$OS_IMGNAME" --file "$OPENSTACK_IMAGE" >"${ROOT}/../registry/openstack_image"

The checksum inconsistency also - quite predictably - disappears:

[...]
Executing openstack_image/stemgentoo/50-restore_root.sh
Ensuring /usr/share/gebuilder/roots/stemgentoo/root/../logs/openstack_image/ is a directory
executing scripts /usr/share/gebuilder/roots/stemgentoo/root/../hooks/openstack_image/post/60-upload_image.sh
Executing /usr/share/gebuilder/roots/stemgentoo/root/../hooks/openstack_image/post/60-upload_image.sh
Ensuring /usr/share/gebuilder/roots/stemgentoo/root/../logs/openstack_image/ is a directory
No image with an ID of '3e33a5fa-8633-4356-8219-9b40ddd3489e' exists.
Cleaning up
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/tmp"
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/var/tmp/portage"
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/sys"
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/proc"
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/dev/pts"
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/dev"
executing umount /usr/share/gebuilder/roots/stemgentoo/root/../mnt
executing losetup -d /dev/loop2
Image is: /usr/share/gebuilder/roots/stemgentoo/root/../openstack_images//image_20180220
MD5SUM is : 9c6703fbcf59504e9e8baf6d6593115a  /usr/share/gebuilder/roots/stemgentoo/root/../openstack_images//image_20180220
Finished succesfully
Cleaning up
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/tmp"
umount: /usr/share/gebuilder/roots/stemgentoo/root/../mnt/tmp: not found
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/var/tmp/portage"
umount: /usr/share/gebuilder/roots/stemgentoo/root/../mnt/var/tmp/portage: not found
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/sys"
umount: /usr/share/gebuilder/roots/stemgentoo/root/../mnt/sys: not found
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/proc"
umount: /usr/share/gebuilder/roots/stemgentoo/root/../mnt/proc: not found
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/dev/pts"
umount: /usr/share/gebuilder/roots/stemgentoo/root/../mnt/dev/pts: not found
executing umount -R "/usr/share/gebuilder/roots/stemgentoo/root/../mnt/dev"
umount: /usr/share/gebuilder/roots/stemgentoo/root/../mnt/dev: not found
executing umount /usr/share/gebuilder/roots/stemgentoo/root/../mnt
umount: /usr/share/gebuilder/roots/stemgentoo/root/../mnt: not mounted.
executing losetup -d /dev/loop2
losetup: /dev/loop2: detach failed: No such device or address
roots/stemgentoo/hooks/openstack_image/chain
bs1 ~ # md5sum /usr/share/gebuilder/roots/stemgentoo/root/../openstack_images//image_20180220
9c6703fbcf59504e9e8baf6d6593115a  /usr/share/gebuilder/roots/stemgentoo/root/../openstack_images//image_20180220

This hack leads to some inelegant error messages when the builtin cleanup is called, so maybe there's a better way to solve this. Particularly because clearly, at some point in the past, this was not an issue.

I am thinking maybe as part of the Docker kernel update, something subtly changed how loopbacks are managed - though this theory places the needle in a rather deep haystack. Maybe it's something a lot more banal?
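
The manual checksum comparison used above can be wrapped into a small check. A minimal sketch; image_md5 and image_changed_since are hypothetical helper names, not part of gebuilder. A plausible explanation for the differing sums is that a filesystem mounted through a loop device can still hold unwritten data until it is unmounted, but that is an assumption rather than something the logs confirm.

```shell
#!/bin/bash
# image_md5 FILE
# Prints only the checksum column of md5sum's output.
image_md5() {
    md5sum "$1" | cut -d' ' -f1
}

# image_changed_since FILE RECORDED_SUM
# Succeeds (returns 0) if FILE no longer matches RECORDED_SUM.
image_changed_since() {
    [ "$(image_md5 "$1")" != "$2" ]
}

# Example: record the sum when uploading, compare again after cleanup.
#   sum_at_upload=$(image_md5 "$OPENSTACK_IMAGE")
#   ... upload, unmount, detach the loop device ...
#   if image_changed_since "$OPENSTACK_IMAGE" "$sum_at_upload"; then
#       echo "image was modified after it was uploaded" >&2
#   fi
```
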

Doeme commented 6 years ago

Ah, I see. This seems to be a shortcoming of the cleanup routine. A hacky fix would be to introduce a new command openstack_image_upload that gets chained after openstack_image. But I think a much more elegant method would be stack-saving for the cleanup stack, i.e. a call to cleanup_stack_save() marks a position in the stack, and a call to cleanup_stack_restore() executes all cleanup tasks added to the stack after cleanup_stack_save() was called. Hence, we could call cleanup_stack_save() before https://github.com/IBT-FMI/gebuilder/blob/master/gebuilder/scripts/openstack_image/default/15-mount_image.sh#L9 and cleanup_stack_restore() before uploading the image.
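
The save/restore idea can be sketched in plain bash as follows. This is not gebuilder's actual implementation: the function names cleanup_stack_save and cleanup_stack_restore come from the comment above, while on_cleanup and the two arrays are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Hypothetical cleanup stack; on_cleanup, CLEANUP_STACK and CLEANUP_MARKS
# are illustrative names, not gebuilder's API.
CLEANUP_STACK=()   # cleanup commands, oldest first
CLEANUP_MARKS=()   # saved stack heights

# Register a command to be run at cleanup time.
on_cleanup() {
    CLEANUP_STACK+=("$*")
}

# Remember the current height of the stack.
cleanup_stack_save() {
    CLEANUP_MARKS+=("${#CLEANUP_STACK[@]}")
}

# Run (newest first) and drop every task registered since the last save.
# Without a saved mark, the whole stack is unwound.
cleanup_stack_restore() {
    local mark=0 i
    local nmarks=${#CLEANUP_MARKS[@]}
    if (( nmarks > 0 )); then
        mark=${CLEANUP_MARKS[$((nmarks - 1))]}
        unset "CLEANUP_MARKS[$((nmarks - 1))]"
    fi
    for (( i = ${#CLEANUP_STACK[@]} - 1; i >= mark; i-- )); do
        eval "${CLEANUP_STACK[$i]}" || true   # keep unwinding on failure
        unset "CLEANUP_STACK[$i]"
    done
}

# Example flow: the echo commands stand in for the real umount/losetup calls.
on_cleanup 'echo "unmount chroot helpers"'
cleanup_stack_save
on_cleanup 'echo "detach loop device"'
on_cleanup 'echo "unmount image"'
cleanup_stack_restore   # runs only the two image-related tasks
# ... the image could be uploaded here ...
cleanup_stack_restore   # runs whatever is left
```

Because each task is removed as it runs, the final cleanup does not repeat the unmounts, which would avoid the "not found" messages seen with the manual cleanup call.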

Doeme commented 6 years ago

Whoops, this was actually intended for a non-master branch, but it seems I forgot to switch before committing. So master contains bigger, untested changes to openstack_image.

TheChymera commented 6 years ago

seems to work.