drakkar-lig / debootstick

Generate a bootable live image from any Debian/Ubuntu filesystem tree.

the lvm/udev timeouts affect the installer type too #20

Closed dmuhamedagic closed 5 years ago

dmuhamedagic commented 5 years ago

Unfortunately, the installer type, at the time of moving the image to the local disk, is also affected by the infamous lvm/udev timeouts. They happen repeatedly for /dev/sda1 and /dev/sda3.

eduble commented 5 years ago

That's very annoying. When this happens, there is no udev running. These operations are performed by a custom init script debootstick runs when the system is first booted. So I am not sure how we can fix it. I will check.

eduble commented 5 years ago

There is a simple workaround:

  1. drop the --system-type installer option when building the image.
  2. run the migration script manually when the OS is booted: /opt/debootstick/live/init/migrate-to-disk.sh

dmuhamedagic commented 5 years ago

Yes, really annoying. Thanks for the workaround! I'd suggest keeping this issue open in case somebody else runs into the problem.

eduble commented 5 years ago

Yes. I think the plan is to make --system-type installer obsolete and update the man page to let people know how to migrate. This is not the first time an LVM update has caused issues with this first-boot mechanism.

dmuhamedagic commented 5 years ago

I tried it out, but with the 32 GB USB stick it made things even slower. This LVM issue will be the end of us and the world ;-) What happens is that first the VG gets expanded on the USB stick (takes time) and then all of that gets copied to the local disk (takes just as much time). What do you think, would it be possible to keep the installer type, but have the migrate script run after all the other services have started?

eduble commented 5 years ago

> What do you think, would it be possible to keep the installer type, but have the migrate script run after all the other services started?

That is not as simple as it seems, because at the beginning of the migration script we have this interaction with the user, who can cancel the process by pressing a key. I think it's important to keep that. So that means:

So that is probably possible (with systemd), but highly dependent on the init system. Because of that, I suspect it could cause a lot of maintenance work in the future, and might prevent supporting other operating system distributions in the chroot. An option would be to implement it anyway, and return an error if the user tries to use the installer option in an environment we do not support.

About your specific scenario, do you know that you can let the migration script run, and continue using the system while it is running? (Because it is just happening at the LVM level, so at the filesystem level and above (applications) it is completely invisible.) Does it mitigate the issue?

dmuhamedagic commented 5 years ago

OK, yes, I see your points. If you think that it is important to have the interaction with the user, we can just keep that part as is. I mean, why should it matter if there's a boot happening between the question and the actual migrate script invocation? Otherwise, I wouldn't involve systemd at all either, but just have the script run from cron or similar and have it wait for udev to start. What do you say?
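A minimal sketch of the "wait for udev, then migrate" idea, assuming a POSIX shell. The helper names `wait_until` and `udev_running` are hypothetical, and treating a running `systemd-udevd` process as "udev has started" is an assumption, not something debootstick does:

```shell
# Hypothetical helper: poll a command once per second until it succeeds,
# or give up after <timeout_seconds>.
# Usage: wait_until <timeout_seconds> <command> [args...]
wait_until() {
    timeout="$1"; shift
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1            # condition never became true
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Assumption: udev counts as started once a systemd-udevd process exists.
udev_running() {
    pgrep -x systemd-udevd >/dev/null 2>&1
}

# Example call from a boot-time job (not executed here):
#   wait_until 60 udev_running && /opt/debootstick/live/init/migrate-to-disk.sh
```

The timeout keeps the job from hanging forever on a system where the assumed process name never appears.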

As for my use case, using the system during the migration is not important. What counts is to have the boot/migration done in as little time as possible.

eduble commented 5 years ago

OK. So, what about:

At this time, you, as a user, can easily add a cronjob at boot, in the chroot, to perform the migration.

And / or I may restore the --system-type installer mode, which would automate this, and print a big warning at build time about what the image will do to your internal disk.

eduble commented 5 years ago

I think I have another idea that might allow restoring the previous behaviour and the --system-type option.

Currently, debootstick replaces /sbin/init by its own script in order to be called on boot, and then, when it completes the bootup procedure (e.g. extend filesystem or migrate), it restores and calls the usual /sbin/init program.
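The replace-then-restore pattern described above can be sketched roughly as follows. This is an illustration only, not debootstick's actual code: the function names and the `init.orig` backup name are hypothetical, and the directory is a parameter so the functions can be tried on a scratch directory rather than a real /sbin:

```shell
# Hypothetical: swap the real init for a hook script, keeping a backup.
# Usage: install_hook <sbin_dir> <hook_script>
install_hook() {
    sbin="$1"; hook="$2"
    mv -f "$sbin/init" "$sbin/init.orig"   # keep the real init (file or symlink)
    cp "$hook" "$sbin/init"
    chmod +x "$sbin/init"
}

# Hypothetical: put the real init back, overwriting the hook.
# Usage: restore_init <sbin_dir>
restore_init() {
    sbin="$1"
    mv -f "$sbin/init.orig" "$sbin/init"
}

# Outline of what such a hook script would do on first boot (not executed here):
#   1. run the first-boot tasks (extend the filesystem or migrate)
#   2. restore_init /sbin
#   3. exec /sbin/init "$@"    # hand over to the usual init program
```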

If the hook were built on /sbin/getty instead of /sbin/init, these scripts would be called much later in the bootup procedure, and probably with no issues regarding LVM. I think /sbin/getty is the right place, because this is where the user starts to interact. So, the user confirmation about the migration would be at the right place.

This hook would still need to adapt a little to the operating system. On my Ubuntu system at least, systemd does not call getty, but agetty. But there is still a /sbin/getty command, which is a symlink to agetty, so debootstick can detect this kind of setup and install the hook on agetty instead.
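A minimal sketch of that detection, assuming `readlink -f` is available (it is in GNU coreutils and BusyBox). The function name `resolve_getty` is hypothetical:

```shell
# Hypothetical helper: print the program that a getty path really points to.
# If the path is a symlink (e.g. getty -> agetty), print the resolved target;
# otherwise print the path unchanged.
resolve_getty() {
    getty_path="$1"
    if [ -L "$getty_path" ]; then
        readlink -f "$getty_path"
    else
        printf '%s\n' "$getty_path"
    fi
}

# Example (not executed here):
#   hook_target=$(resolve_getty /sbin/getty)
```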

I will try it soon.

dmuhamedagic commented 5 years ago

Cool idea. BTW, you can just grep inittab to see which tty control program will run.

dmuhamedagic commented 5 years ago

Just FYI, I have now timed my 2 GB installation, prepared by FAI: it took 8 minutes from USB boot to the prompt, during which there were around 40 lvm/udev timeouts (sda1 and sda3). Without them, and excluding the 10-second user confirmation pause, it would take less than a minute.

eduble commented 5 years ago

> Cool idea. BTW, you can just grep inittab to see which tty control program will run.

Apparently systemd does not use inittab, so by default the file is not present.

dmuhamedagic commented 5 years ago

> > Cool idea. BTW, you can just grep inittab to see which tty control program will run.

> Apparently systemd does not use inittab, so by default the file is not present.

Whatever was wrong with inittab. Oh, well. Perhaps something like this:

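# List the systemd service unit file(s) whose contents mention tty1
f=$(grep -lrw tty1 /usr/lib/systemd /etc/systemd/ | grep 'service$')
# Take the ExecStart value, drop its arguments and any leading "-", leaving the program path
getty_prog=$(awk -F= '/^ExecStart/{sub(" .*","",$2); sub("^-","",$2); print $2}' $f)

eduble commented 5 years ago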

I tested the "hook on getty" idea, and it seems to work well. :) Could you test it, in case you find issues with your setup?

It's on branch hook-on-getty. The commit is here: https://github.com/drakkar-lig/debootstick/commit/31affeda00c99b81b073b77ed6a98f58814cc701

The tricky part was that systemd spawns several getty processes concurrently, one for each tty console. So I had to implement a kind of locking to allow only one of them to run the init procedure. If you are interested:
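The locking code from the commit is not reproduced here. As a generic illustration of how concurrent processes can agree on a single winner, here is a minimal sketch using an atomic `mkdir`; the function names and the lock path are hypothetical, and this is not necessarily the technique used on the hook-on-getty branch:

```shell
# Hypothetical: try to take the lock. mkdir either creates the directory
# or fails because it already exists, so only one caller can succeed.
try_acquire_lock() {
    lock_dir="$1"
    mkdir "$lock_dir" 2>/dev/null
}

# Hypothetical: run a command only if this caller won the lock.
# Usage: run_once <lock_dir> <command> [args...]
run_once() {
    lock_dir="$1"; shift
    if try_acquire_lock "$lock_dir"; then
        "$@"
    else
        return 0    # another instance already owns the init procedure
    fi
}

# Example (not executed here), with an assumed lock path:
#   run_once /run/first-boot.lock /path/to/first-boot-procedure
```

The lock directory is never removed in this sketch, so the command runs at most once per lifetime of the directory that holds the lock.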

dmuhamedagic commented 5 years ago

Excellent! A wonderful job! Migration was done in a bit more than 2 minutes, wall clock including the boot from the USB stick. partx failed, but I'll open a separate issue for that.

eduble commented 5 years ago

Hi Dejan, to conclude on this issue, here is what I propose:

Unfortunately, the change is not so trivial, but I hope we can still have it included for buster. Could you please write a Debian bug report about it? I would reference it (and it would show that users of debootstick need this update). We could consider including fixes for partx-related issues too. If you want that, I will also need Debian bug reports for those. Thanks!

dmuhamedagic commented 5 years ago

Hi Etienne, sure, I'll file bugs for this and for other issues too.

dmuhamedagic commented 5 years ago

Oh, please do keep the "installer" type, that's a killer feature! :-)

dmuhamedagic commented 5 years ago

Just reported this to bugs.debian.org.