desultory / ugrd

A minimalistic initramfs generator, designed for FDE
GNU General Public License v2.0

[help] hibernation support #82

Open julie-de-ville opened 1 month ago

julie-de-ville commented 1 month ago

Is hibernation resume support included automatically? I don't see a module for it, and my system supports hibernation, but it will not resume after suspending to disk. I am using Gentoo with Linux 6.6.52.

desultory commented 1 month ago

It does not currently have hibernation support, but this shouldn't be very hard to add.

Have you tried using the "resume=" kernel command line arg? I have not tested this but assumed it could be used to resume off of unencrypted swap.

julie-de-ville commented 1 month ago

That would be great, I found it much better than dracut. I have the resume parameter in grub.

desultory commented 1 month ago

Do you have encrypted swap? I think it shouldn't be too hard to add support for simple resuming, I'm just not sure why the builtin kernel parameter doesn't work alone. Maybe it ignores that option if an initrd is used?

desultory commented 1 month ago

https://wiki.gentoo.org/wiki/Custom_Initramfs/Hibernation

I've hesitated to add support in the initrd as there are many things which can go wrong. I think passing the supplied info to /sys/power/resume could be enough.
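
As a rough illustration of "passing the supplied info to /sys/power/resume": the kernel's documented input for that file is the swap device's `major:minor` pair. The sketch below is not ugrd code; the function names are made up for this example, and it assumes the resume device already exists as a block device node.

```python
import os
import stat


def format_resume_target(major: int, minor: int) -> str:
    """Format device numbers the way /sys/power/resume expects them."""
    return f"{major}:{minor}"


def resume_target_for(device_path: str) -> str:
    """Look up the major:minor pair of a block device node."""
    st = os.stat(device_path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(f"{device_path} is not a block device")
    return format_resume_target(os.major(st.st_rdev), os.minor(st.st_rdev))


def request_resume(device_path: str, sysfs_path: str = "/sys/power/resume") -> None:
    """Ask the kernel to resume from device_path.

    If a valid hibernation image is present, this write does not return,
    because the kernel switches to the saved state. If it does return,
    no image was restored.
    """
    with open(sysfs_path, "w") as handle:
        handle.write(resume_target_for(device_path))
```

Only `request_resume` touches the system; the other two helpers can be exercised safely on a running machine.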

desultory commented 1 month ago

https://github.com/desultory/ugrd/commit/0175133404709c34686c811012d900aca4107077 I'm not sure about this, but I think it may be a reasonable start? resume= expects a device path, I'm not sure if it makes sense to resolve a UUID.
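
To make the path-versus-UUID question concrete, here is a small sketch of how a `resume=` value could be split into a tag form and a plain device path. It is an illustration only, not the code in the linked commit; the function names are hypothetical. The only external call it relies on is `blkid -t TAG=value -o device`.

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches values such as "PARTUUID=1234-01" or "UUID=abcd"; anything else is treated as a path.
_TAG_RE = re.compile(r"^(PARTUUID|UUID)=(.+)$")


def split_resume_value(value: str) -> Tuple[Optional[str], str]:
    """Return (tag, identifier) for tagged values, or (None, value) for a plain path."""
    match = _TAG_RE.match(value)
    if match:
        return match.group(1), match.group(2)
    return None, value


def resolve_resume_device(value: str) -> str:
    """Resolve a resume= value to a device path, calling blkid only for tagged values."""
    tag, _ = split_resume_value(value)
    if tag is None:
        return value  # already a device path, use it unchanged
    result = subprocess.run(
        ["blkid", "-t", value, "-o", "device"],
        capture_output=True,
        text=True,
        check=True,  # blkid exits non-zero when nothing matches
    )
    return result.stdout.strip()
```

A plain path such as `/dev/sda3` passes through untouched, so accepting tags does not change behaviour for setups that already use device paths.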

desultory commented 1 month ago

Right now, it enters a fail state if a resume partition is passed and it fails to resume, which means it won't boot normally. I'm not sure how much weight to give those warnings about data loss. If it hibernated, and you reboot without considering the saved state, there could potentially be serious data loss, similar to a hard shutdown. I think most systems just attempt to resume from swap if possible, but continue if not. I'm going to check/test a bit more.

desultory commented 1 month ago

> That would be great, I found it much better than dracut. I have the resume parameter in grub.

Did you manually add the parameter? I think for the sake of safety, I will be forcing resume attempts if resume= is set. It's potentially very dangerous to start a system fresh if it expects to return from a hibernation state.

julie-de-ville commented 1 month ago

Yes, I manually added it to grub. I booted twice in that manner, but I haven't attempted resuming again since I have narrowed it down to initramfs. That is a good idea.

desultory commented 1 month ago

> Yes, I manually added it to grub. I booted twice in that manner, but I haven't attempted resuming again since I have narrowed it down to initramfs. That is a good idea.

yeah it's probably not safe to resume right now, you could have a bit of data loss each time as it expects to later resume from the current ram state.

As far as I know, there is no way to know if a system should resume at boot time, other than the passed kernel cmdline parameters. It's safest to prevent booting if that was passed but cannot be performed.


I've gotten some help looking into this, and I think it is probably safe to boot if it can't resume, and the device is found. If the resume source device cannot be found, something is wrong and booting will stop (in the current form the resume module takes)

julie-de-ville commented 1 month ago

As for path/uuid, I have my resume parameter set to the mapped, decrypted luks partition, at /dev/mapper/gentoo-root. Btw, I am using btrfs on an encrypted luks partition, and for S4 (hibernation) I suspend to a swapfile on the root subvolume, if that helps. Also, I had to set the resume_offset as a parameter as well, since I am using a swapfile.

desultory commented 1 month ago

> As for path/uuid, I have my resume parameter set to the mapped, decrypted luks partition, at /dev/mapper/gentoo-root. Btw, I am using btrfs on an encrypted luks partition, and for S4 (hibernation) I suspend to a swapfile on the root subvolume, if that helps. Also, I had to set the resume_offset as a parameter as well, since I am using a swapfile.

resume should be set to your swap partition. The support I just added only handles plain swap; I may add support for encrypted swap too.

As it is, it will boot normally if it can't resume using the provided resume path (using a partuuid is best). If it cannot find the source device, it will enter a fail state.
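
The policy described here can be written out as a small decision function. This is a sketch of the behaviour as stated in this thread, not ugrd's actual implementation; the names are invented for illustration.

```python
from enum import Enum
from typing import Optional


class BootAction(Enum):
    BOOT_NORMALLY = "boot normally"
    ATTEMPT_RESUME = "attempt resume"
    FAIL = "enter fail state"


def choose_boot_action(
    resume_value: Optional[str], noresume: bool, device_found: bool
) -> BootAction:
    """Pick what the initramfs should do with the resume= setting.

    - no resume= value, or noresume given: nothing to do, boot normally
    - resume= given but the device is missing: something is wrong, stop
    - otherwise: attempt the resume; if that attempt returns, boot continues
    """
    if noresume or not resume_value:
        return BootAction.BOOT_NORMALLY
    if not device_found:
        return BootAction.FAIL
    return BootAction.ATTEMPT_RESUME
```

The asymmetry is deliberate: a missing device suggests a misconfiguration or absent disk, while a device that is present but holds no image is the ordinary fresh-boot case.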

julie-de-ville commented 1 month ago

Oh I see, my swap is encrypted so I don't think I will be able to test it safely, though I would really like to get it working, so if there is anything I can do to help lmk. I tried to get it working with dracut by adding the crypt and resume modules, and including /etc/crypttab, but it hung on a black screen with a spinning wheel.

desultory commented 1 month ago

> Oh I see, my swap is encrypted so I don't think I will be able to test it safely, though I would really like to get it working, so if there is anything I can do to help lmk. I tried to get it working with dracut by adding the crypt and resume modules, and including /etc/crypttab, but it hung on a black screen with a spinning wheel.

Using encrypted swap is somewhat complex. I'd have to add a new method specifically for opening that, which can run first. The real tricky part is that would likely need to attempt to run on every boot. When you're booting fresh, that will just be a waste of time because it will not be able to resume.

You mentioned /dev/mapper/gentoo-root as being your resume device, is that your root partition? Do you have a separate partition that is luks encrypted, or are you using lvm? Resuming from swap files is especially difficult because the file offset on the disk must be set.

> As for path/uuid, I have my resume parameter set to the mapped, decrypted luks partition, at /dev/mapper/gentoo-root. Btw, I am using btrfs on an encrypted luks partition, and for S4 (hibernation) I suspend to a swapfile on the root subvolume, if that helps. Also, I had to set the resume_offset as a parameter as well, since I am using a swapfile.

Are you sure the resume_offset you found is correct? That is my only guess as to why dracut may fail, unless it doesn't properly support hibernation from luks devices. Did it ever ask for the key for your root device? I really hesitate to even open luks devices because I'm not sure if opening them has any chance of writing anything. If you touch storage devices at all between hibernation and resuming, that may be harmful.

desultory commented 1 month ago

This isn't really the best UX, but maybe you could take advantage of the fact that it fails, and then choose whether you want to manually run "crypt_init", which will run the cryptsetup unlock procedure, or tell it to ignore resuming. Then you can exit the recovery shell, and on the second pass it will see the resume source and use that.

julie-de-ville commented 1 month ago

Thank you, how would I go about doing that? If you could point me in the right direction, that'd be great. The resume offset is correct; if the offset is not calculated correctly, it will not hibernate at all, and now it hibernates but doesn't resume.

desultory commented 1 month ago

> Thank you, how would I go about doing that? If you could point me in the right direction, that'd be great. The resume offset is correct; if the offset is not calculated correctly, it will not hibernate at all, and now it hibernates but doesn't resume.

If you add the recovery kernel cmdline arg, when it fails it should open a bash shell. From there you can try to do things manually and see what works. crypt_init is a function in ugrd which should run the luks procedures for your device. If you can do that, it may work. handle_resume should do the whole procedure to resume based on your kernel command line, but you could also try to manually echo something to /sys/power/resume.
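
For reference, the kernel command line arguments mentioned in this thread (`resume=`, `resume_offset=`, `noresume`, `recovery`) can be inspected with a few lines of Python. This is a simplified sketch that splits on whitespace and ignores quoting; it is not how ugrd reads them, and the function name is hypothetical.

```python
from typing import Dict, Union


def parse_cmdline(cmdline: str) -> Dict[str, Union[str, bool]]:
    """Parse a kernel command line into a dict.

    key=value tokens map to their value, split at the first '=' only,
    so resume=PARTUUID=abcd keeps 'PARTUUID=abcd' intact.
    Bare flags such as 'noresume' map to True.
    """
    params: Dict[str, Union[str, bool]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def read_running_cmdline(path: str = "/proc/cmdline") -> Dict[str, Union[str, bool]]:
    """Parse the command line of the currently running kernel."""
    with open(path) as handle:
        return parse_cmdline(handle.read())
```

Checking `read_running_cmdline()` on the booted system is a quick way to confirm that the bootloader actually passed the resume arguments you expect.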

julie-de-ville commented 1 week ago

Hi, I didn't see your last reply. Would it be possible to get an initramfs to resume from encrypted hibernation using those cmdline args, or would I have to do it manually each time?

desultory commented 1 week ago

> Hi, I didn't see your last reply. Would it be possible to get an initramfs to resume from encrypted hibernation using those cmdline args, or would I have to do it manually each time?

A user has made a patch which seems to make it work. I'm considering adding it but am still unsure about safety, see: https://www.kernel.org/doc/html/latest/power/swsusp.html

I don't think it should be an issue since the mount is made after the resume stage, but I'm not sure how this will work with other storage which is mounted, such as a device for keyfiles or headers.

diff --git a/src/ugrd/fs/resume.py b/src/ugrd/fs/resume.py
index 38f5ddd..9255d9c 100644
--- a/src/ugrd/fs/resume.py
+++ b/src/ugrd/fs/resume.py
@@ -1,7 +1,8 @@
 __version__ = "0.4.0"

+from zenlib.util import contains, unset

-def handle_resume(self) -> None:
+def _resume(self) -> None:
     """Returns a bash script handling resume from hibernation.
     Checks that /sys/power/resume is writable, resume= is set, and noresume is not set, if so,
     checks if PARTUUID= is in the resume var, and tries to use blkid to find the resume device.
@@ -17,7 +18,7 @@ def handle_resume(self) -> None:
     return [
         "resumeval=$(readvar resume)",  # read the cmdline resume var
         'if ! check_var noresume && [ -n "$resumeval" ] && [ -w /sys/power/resume ]; then',
-        '    if echo "$resumeval" | grep -q "PARTUUID="; then',  # resolve partuuid to device
+        '    if echo "$resumeval" | grep -q -E "^PARTUUID=|^UUID="; then',  # resolve partuuid or uuid to device
         '        resume=$(blkid -t "$resumeval" -o device)',
         "    else",
         "        resume=$resumeval",
@@ -35,3 +36,11 @@ def handle_resume(self) -> None:
         "    fi",
         "fi",
     ]
+
+@unset('late_resume')
+def handle_resume(self) -> None:
+    return _resume(self)
+
+@contains('late_resume', "Using late resume.")
+def late_resume(self) -> None:
+    return _resume(self)
\ No newline at end of file
diff --git a/src/ugrd/fs/resume.toml b/src/ugrd/fs/resume.toml
index c4732dd..fa74744 100644
--- a/src/ugrd/fs/resume.toml
+++ b/src/ugrd/fs/resume.toml
@@ -2,3 +2,9 @@ cmdline_strings = [ "resume" ]

 [imports.init_early] 
 "ugrd.fs.resume" = [ "handle_resume" ]
+
+[imports.init_premount]
+"ugrd.fs.resume" = [ "late_resume" ]
+
+[custom_parameters]
+late_resume = "bool"
\ No newline at end of file
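
The patch registers the same resume logic at two stages and uses a `late_resume` flag so that only one of them runs. The sketch below shows that dispatch pattern with stand-in decorators. These are NOT zenlib's `contains` and `unset`, whose exact semantics are not shown in this thread; `run_if_set` and `run_if_unset` are hypothetical names, and the config is modelled as a plain dict.

```python
from functools import wraps
from typing import Callable, List, Optional


def run_if_set(key: str) -> Callable:
    """Stand-in decorator: run the function only when config[key] is truthy."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(config: dict) -> Optional[List[str]]:
            return func(config) if config.get(key) else None
        return wrapper
    return decorator


def run_if_unset(key: str) -> Callable:
    """Stand-in decorator: run the function only when config[key] is missing or falsy."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(config: dict) -> Optional[List[str]]:
            return None if config.get(key) else func(config)
        return wrapper
    return decorator


def _resume(config: dict) -> List[str]:
    """Placeholder for the shared logic that returns the resume script lines."""
    return ["resumeval=$(readvar resume)"]


@run_if_unset("late_resume")
def handle_resume(config: dict) -> List[str]:
    """Early stage: emits the script unless late resume was requested."""
    return _resume(config)


@run_if_set("late_resume")
def late_resume(config: dict) -> List[str]:
    """Later stage: emits the script only when late resume was requested."""
    return _resume(config)
```

With an empty config only `handle_resume` produces script lines; with `{"late_resume": True}` only `late_resume` does, so the resume attempt is never emitted twice.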