gissf1 / zram-hibernate

Allows dynamic swap changes to activate disk-based storage as swap for hibernation support when a system typically uses only zram swap during normal operation.
Apache License 2.0

Help with setting up `zram-hibernate` #1

Open JustSimplyKyle opened 1 year ago

JustSimplyKyle commented 1 year ago

It's probably a me problem, tbh. When I run doas /usr/lib/systemd/system-sleep/zram-hibernate, it spits out this:

[90315] creating tmpfile...
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
[90315] lock acquired.
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
getResumeInfo() Not a valid block device: KERNEL_RESUME_DEVICE=
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
ensureDiskSwap(): Initial swap status...
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
[19:24:08] Used=5426M MemFree=22G SwapUsed=0K/31G Overcommit=0K
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
egrep: warning: egrep is obsolescent; using grep -E
        zram0:  0.0% (0K/31G) [100]
egrep: warning: egrep is obsolescent; using grep -E
fgrep: warning: fgrep is obsolescent; using grep -F
egrep: warning: egrep is obsolescent; using grep -E
FATAL: no valid swaps found. cannot suspend to disk.

I thought this script was supposed to make hibernating possible without swap, using only zram?

gissf1 commented 1 year ago

Hello,

That is the goal of the project. Using zram for normal operation (without using a swap device), but enabling a swap device when needed for hibernation. Hibernation always saves to a block device, so it does require some storage dedicated to it.

If you're looking for something that does not save to disk at all, perhaps you're looking for "standby" or "suspend to ram" functionality, which would require your system to provide necessary hardware to support that. Most x86 systems can do this, even desktops, but they require power to be maintained, either by being plugged in or with some type of battery.

One issue I ran across in my own testing was that some desktop environments remove the hibernate option when only zram is active. This makes it more difficult to trigger hibernation, requiring a command line call to systemctl or something like that.

Let me ask a few questions about your setup there:

Specifically, those egrep errors are unusual, but they look rather harmless. I see that the egrep message is part of the latest system update, and tries to promote usage of grep -E instead of egrep. While I haven't yet analyzed the code in detail to be sure it has no ill effects, I can confirm that egrep does still call grep -E after printing that warning. I'll release a code update to address this.
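For reference, the warning comes from GNU grep 3.8 and later, where egrep and fgrep are wrappers that print this message before calling grep -E or grep -F. A minimal sketch of the drop-in replacement follows; the function name and the pattern are made up for illustration and are not taken from zram-hibernate.

```sh
# Hypothetical helper: keep only the swap-related lines of meminfo-style
# input, the way an `egrep` call would, but without the warning.
filter_swap_lines() {
    # grep -E is the documented replacement for egrep (grep -F for fgrep).
    grep -E '^(SwapTotal|SwapFree):'
}

# Example (not run here):
# filter_swap_lines < /proc/meminfo
```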

I think the real issue you're encountering is the KERNEL_RESUME_DEVICE not being detected. Do you have any partitions set up as type "Linux Swap"? Having a swap partition is a kernel requirement for a successful resume. If so, could you explain your configuration a bit: partition type (MBR vs GPT), partition locations, types and sizes, etc.

Let me know on that info and I will try to help you further diagnose the cause and resolve it.

EDIT: corrected some typos and added comments regarding standby, egrep and partition descriptions.

JustSimplyKyle commented 1 year ago

System information: (image) The only "swap" device I have is the zram0 block device (set up by zramd). (image) My main system is on nvme1n1p2, using btrfs and GPT; /var, /home and / are all separate subvolumes.

gissf1 commented 1 year ago

Just to be sure on a few other details, could you also get me the output of the following: cat /proc/cmdline and lsblk -f

I'm looking to see what type each of your partitions is, and what the resume= kernel parameter is set to, as those will affect how the script detects KERNEL_RESUME_DEVICE.

You will need at least 1 disk-based swap device to store the hibernation data while your system is powered down. This is typically an MBR partition of type 82, or a GPT partition of type 0657FD6D-A4AB-43C4-84E5-0933C84B4F4F.
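One way to check for such a partition is sketched below. It assumes lsblk from util-linux with its PARTTYPE column; the helper name is hypothetical and this is not how zram-hibernate itself detects swap.

```sh
# Hypothetical helper: succeed if a partition type code denotes Linux swap.
# Accepts the MBR form (82 or 0x82) or the GPT type GUID in any letter case.
is_swap_parttype() {
    case "$(printf '%s' "$1" | tr 'A-F' 'a-f')" in
        82|0x82|0657fd6d-a4ab-43c4-84e5-0933c84b4f4f) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not run here): list partitions whose type marks them as swap.
# lsblk -rno NAME,PARTTYPE | while read -r name type; do
#     is_swap_parttype "$type" && echo "/dev/$name"
# done
```

An empty result from the example loop would mean no partition carries the swap type, which matches the situation in the first log above.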

You will also need the resume= kernel argument to specify where the kernel should resume from after hibernation. This should probably be a dedicated swap partition, if possible. I would also suggest this swap partition being on a spinning disk, and not on Flash/SSD/NVME.
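To see what the kernel was actually booted with, the parameter can be pulled out of the command line. This is only a sketch with a hypothetical helper name; the script's own getResumeInfo() may work quite differently.

```sh
# Hypothetical helper: print the value of one kernel parameter.
# $1 = parameter name, $2 = command line text (defaults to /proc/cmdline).
# The body is a subshell, so `exit` only ends the function call.
cmdline_param() (
    set -f    # no glob expansion while the line is split into words
    for word in ${2-$(cat /proc/cmdline)}; do
        case "$word" in
            "$1"=*) printf '%s\n' "${word#*=}"; exit 0 ;;
        esac
    done
    exit 1    # parameter not present
)

# Example (not run here):
# cmdline_param resume || echo "no resume= parameter set"
```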

I'm also taking your suggestion to expand on documentation. I'll be making an update soon with more documentation on how to use zram-hibernate.

JustSimplyKyle commented 1 year ago

Ohh, I get why this isn't working! So, if my understanding is correct, you still need to have a swap partition, correct? I just don't "mount" the device?

JustSimplyKyle commented 1 year ago

I've got it to "hibernate with zram", but in a lot of quotation marks: I need to enable both zram and a swapfile. (image) It should functionally be fine(?), since the zram priority is much higher than the swapfile's, although I got new errors with this setup. My new setup: zram (set up by zramd) and swapfiles.

cat /proc/cmdline

intel_pstate=passive initrd=\amd-ucode.img initrd=\initramfs-linux519-tkg-pds.img root="LABEL=Arch" rw amd_iommu=on iommu=pt iommu=1 vfio-pci.ids=1002:67ef,1002:aae0 amdgpu.ppfeaturemask=0xffffffff zswap.enabled=0 rootflags=subvol=@ amd_pstate.shared_mem=1 rootfstype=btrfs resume=UUID=75fb96c6-d43b-4329-9275-68614bb6c9f0 resume_offset=177739008

lsblk -f

NAME        FSTYPE FSVER LABEL    UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda                                                                                   
└─sda1      btrfs                 f0ffcee3-c4e0-4103-83ea-e5ecfccb7921      2T    26% /run/media/kyle/f0ffcee3-c4e0-4103-83ea-e5ecfccb7921
sdb                                                                                   
├─sdb1                                                                                
├─sdb2                                                                                
└─sdb3      ntfs         遊戲安裝 F624F91C24F8E113                                    
sdc                                                                                   
├─sdc1                                                                                
└─sdc2      ntfs         本機磁碟 D8DA8767DA8740AC                                    
zram0                                                                                 [SWAP]
nvme0n1                                                                               
├─nvme0n1p1                                                                           
└─nvme0n1p2 ntfs                  B0101ECB101E9884                                    
nvme1n1     btrfs                 f41f9790-1e85-4837-bc2e-e2f5b4f75741                
├─nvme1n1p1 vfat   FAT16          C8EB-0675                             165.7M    45% /boot
└─nvme1n1p2 btrfs        Arch     75fb96c6-d43b-4329-9275-68614bb6c9f0                /var/lib/docker/btrfs
                                                                                      /var
                                                                                      /swap
                                                                                      /home
                                                                                      /
gissf1 commented 1 year ago

Correct, the way it works is to have the swap partition present on the system, but not "mounting" it as you called it. I use the term "active", since swap isn't actually "mounted" the same way as a typical filesystem, but I think you got what I meant.

Your new configuration with swap active should functionally work, but you probably want zram to be lower priority since the kernel adds to the lowest priority swap device first. This means it would currently need to fill your swap subvolume (priority -2) before touching zram (priority 100). Therefore you probably want to adjust those priorities.

As a point of caution, I have never used btrfs subvolumes with swap before, and I don't know the safety of the resume_offset parameter in that use case. Btrfs may decide to move the data in a way that causes data corruption on your btrfs volume, even possibly affecting other subvolumes on that partition.

What you're using now seems to be a swap file (swapfile) on a btrfs volume (mounted at /swap/?), which is in turn on a partition (/dev/nvme1n1p2) of an NVME device (/dev/nvme1n1). I was suggesting that you use a swap partition, which is a slightly different setup. Swap files are slightly slower and less safe than a dedicated swap partition, and especially so with btrfs. I believe kernel developers just recently corrected more deadlock and corruption situations with swap on btrfs. That said, feel free to try it and report back!

Generally swap partitions are created with fdisk (similar partitioning tools like gparted might be easier to work with if you have a GUI) to set the partition type to "Linux Swap", then mkswap /dev/partition to format the partition instead of mkfs.btrfs. The newly formatted partition can be manually activated with swapon -p 999 /dev/partition to test it if you choose. The -p 999 sets the priority to 999 until swapoff. Be aware that calling mkswap will wipe all data on the specified partition/device.

Thanks for pointing out the SYSTEMD_BYPASS_HIBERNATION_MEMORY_CHECK=1 option! I'll check to see if this fixes the issues where hibernation wasn't listed in the desktop environments, but systemctl hibernate has always worked for me without having that set.

JustSimplyKyle commented 1 year ago

The thing is, resizing partitions is a pain in btrfs, so I use the swapfile instead for convenience. The btrfs concerns that you raised are mostly solved in kernel versions above 5.0, but you still have to use a special way of calculating the offset (mentioned in the Arch wiki). Also, that priority thing is weird: so the lower the number is, the sooner it gets used?
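As a rough illustration of that calculation: the value passed as resume_offset is the physical byte offset of the swap file's first extent divided by the page size. The helper below is hypothetical and only does the arithmetic; obtaining the physical offset is left out, since it needs a btrfs-specific tool (newer btrfs-progs releases provide btrfs inspect-internal map-swapfile for this, which is an assumption about the installed version).

```sh
# Hypothetical helper: convert a physical byte offset into a resume_offset.
# $1 = physical offset in bytes, $2 = page size (defaults to the system's).
resume_offset_from_physical() {
    page_size=${2:-$(getconf PAGESIZE)}
    echo $(( $1 / $page_size ))
}

# Example: a physical offset of 728018976768 bytes with 4096-byte pages
# gives 177739008, the resume_offset value in the command line above.
# resume_offset_from_physical 728018976768 4096
```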

JustSimplyKyle commented 1 year ago

This is with the new script

❯ doas /usr/lib/systemd/system-sleep/zram-hibernate     
[38040] creating tmpfile...
[38040] lock acquired.
ensureDiskSwap(): Initial swap status...
[19:24:52] Used=4933M MemFree=21G SwapUsed=0K/31G Overcommit=0K
    nvme1n1p2:  0.0% (0K/931G) []
        zram0:  0.0% (0K/31G) [10]
     swapfile:  0.0% (0K/931G) []
Unknown device type: /dev/nvme1n1p2: 259:5 blkext
Unknown device type: /swap/swapfile: 40
Unknown device type: /dev/nvme1n1p2: 259:5 blkext
SWAP 0: /dev/nvme1n1p2: is not a disk
FATAL: no valid swaps found. cannot suspend to disk.
gissf1 commented 1 year ago

Traditionally, partitioning is done on the raw block device itself, not within a filesystem, but I know that btrfs, just like zfs, has its own volume management functionality, so that whole concept is rather blurred now.

I added support for NVME devices per your output above, which will help solve part of your issue. Thus far, I've only used this tool with spinning disks and dedicated swap partitions. I'll add a disclaimer that hibernating to an NVME volume could also shorten the life of your NVME device. Just so you're aware of the risk and take full responsibility for that.

That said, it seems you're rather dedicated to making this work, so I doubt that will discourage you.

Regarding priority order, I just tested it to confirm, and it seems I remembered that incorrectly. The highest number is the first one to be filled, and therefore is the higher priority. You can test this by having 2 swap files/devices at different priorities and running some program that allocates large amounts of memory and see which swap gets data first in /proc/swaps.
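A quick way to see the order in which the kernel will use the active swaps is to sort /proc/swaps by its priority column, highest first. This is a sketch with a hypothetical helper name, assuming the usual five-column /proc/swaps layout (Filename, Type, Size, Used, Priority).

```sh
# Hypothetical helper: print "priority name" for each active swap,
# highest priority (used first) at the top.
# $1 = file in /proc/swaps format (defaults to /proc/swaps).
swaps_by_fill_order() {
    # Skip the header line, then sort numerically in descending order.
    awk 'NR > 1 { print $5, $1 }' "${1:-/proc/swaps}" | sort -k1,1nr
}

# Example (not run here):
# swaps_by_fill_order
```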

I should also mention that the current code prevents manual hibernation without editing it to set TEST=0 instead of TEST=1 near the bottom of the file. This was a safety I had in place for testing on my own system.

Upon further testing and code analysis, the code is not currently able to handle swap files. The parsing logic is already rather complicated to auto-detect existing swap options (mounted and not) and correlate them with the kernel's resume= parameter, but correlating a resume= device against a swap filename and being able to map that safely is far from trivial. I'm trying to think of a way to offer you a short-term workaround, but I'm not sure how to best do that yet.

In short, I don't think swap file support is something I can add in a short timeframe. I would accept a pull request if you would like to help develop this feature, of course. And if you want to do this, I can try to explain a bit of the code if needed.

The commit I just made has the following changes:

Let me know if that helps your situation at all.

JustSimplyKyle commented 1 year ago
❯ doas /usr/lib/systemd/system-sleep/zram-hibernate          
[803856] creating tmpfile...
[803856] lock acquired.
ensureDiskSwap(): Initial swap status...
[22:10:43] Used=8375M MemFree=16G SwapUsed=0K/82G Overcommit=0K
    nvme1n1p2:  0.0% (0K/931G) []
     swapfile:  0.0% (0K/50G) [1]
        zram0:  0.0% (0K/31G) [10]
Unknown device type: /swap/swapfile: 40
ensureDiskSwap(): selecting SWAP 0 /dev/nvme1n1p2 [] 931G
ensureDiskSwap(): Storing current SWAP state...
ensureDiskSwap(): Activating SWAP 0: /dev/nvme1n1p2 [11]
ensureDiskSwap(): executing: swapon -p '11' '/dev/nvme1n1p2'
swapon: /dev/nvme1n1p2: read swap header failed

This is with the newest script and my setup with both zram and the swapfile active. It seems to be getting pretty close to working, but it doesn't figure out where the swapfile is (as you've stated); the correct swapon command should be swapon -p '11' '/swap/swapfile'. This, however, is what happens if I only have zram active.

❯ doas /usr/lib/systemd/system-sleep/zram-hibernate 
[919174] creating tmpfile...
[919174] lock acquired.
ensureDiskSwap(): Initial swap status...
[22:15:48] Used=8348M MemFree=16G SwapUsed=0K/31G Overcommit=0K
    nvme1n1p2:  0.0% (0K/931G) []
        zram0:  0.0% (0K/31G) [10]
ensureDiskSwap(): Warning: SWAP 0 /dev/nvme1n1p2 has zero length OPTIONS value
FATAL: no valid swaps found. cannot suspend to disk.

I hope there could be some kind of override for SWAP_DEVICE (or KERNEL_SWAP_DEVICE), because it's not easy to work out where the swapfile is just by looking at /proc/cmdline.

gissf1 commented 1 year ago

Right, and that's exactly the complication: When there is only zram loaded (the typical use case), the code needs to know what swap to mount for hibernation. Currently it uses whatever is in the resume= kernel command line to know what swap to activate for hibernation, but I have no idea how to use a resume device + offset to look up a filename/path to the swap file that it would use to save hibernation data.

I could add a command line argument for manual hibernation, but the goal was to seamlessly integrate into a normal system configuration, which would mean the configuration has to be automatically detected or stored in a fixed configuration file somehow.

After that explanation, I had an epiphany and found a quick solution! Basically I'm adding support for an optional configuration file that overrides the kernel command line's resume= for the hibernation target. If the user doesn't set up their kernel correctly though, it would fail to resume.
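The override logic described here could look roughly like the sketch below. It is not the project's actual code: the helper name is hypothetical, the config path is passed in because its real location is not stated in this thread, and only the KERNEL_SWAP_DEVICE key (which appears later in the thread) is taken from the source.

```sh
# Hypothetical helper: print the hibernation target, preferring a
# KERNEL_SWAP_DEVICE=... line in a config file over resume= on the
# kernel command line.
# $1 = config file path, $2 = command line file (defaults to /proc/cmdline).
hibernate_target() {
    conf=$1
    cmdline=${2:-/proc/cmdline}
    val=
    if [ -r "$conf" ]; then
        # Parse the file rather than sourcing it; the last assignment wins.
        val=$(sed -n 's/^[[:space:]]*KERNEL_SWAP_DEVICE=//p' "$conf" | tail -n 1)
    fi
    if [ -z "$val" ]; then
        # Fall back to the kernel's resume= parameter.
        val=$(tr ' ' '\n' < "$cmdline" | sed -n 's/^resume=//p' | head -n 1)
    fi
    [ -n "$val" ] && printf '%s\n' "$val"
}
```

Note that the override only changes which swap gets activated before hibernating; the kernel still resumes from whatever resume= and resume_offset= say, so the two have to describe the same storage.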

Try the latest code with the configuration file and let me know what you think. Also, since I've been adding a ton of documentation, you should probably check out the README.md and let me know if you notice something lacking in the explanations there as well.

I also added some basic sanity checks for swap files, just to be sure that the user is warned about some unusual cases that could cause issues. As we haven't successfully tested btrfs yet, and I still have some reservations as to its reliability, it is "allowed" but with a warning and a safety delay for now. If our testing comes back without issue, I can remove the delay in the next update.

JustSimplyKyle commented 1 year ago

uh....

[506684] creating tmpfile...
[506684] lock acquired.
egrep: warning: egrep is obsolescent; using grep -E
ensureDiskSwap(): Initial swap status...
[21:09:35] Used=7718M MemFree=1535M SwapUsed=30M/31G Overcommit=0K
awk: cmd. line:1: fatal: division by zero attempted
     swapfile: % (0K/0K) []
        zram0:  0.1% (30M/31G) [-2]
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2[/@swap]
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device type: /dev/nvme1n1p2[/@swap]
stat: cannot statx '/dev/nvme1n1p2[/@swap]': No such file or directory
Unknown device type: /dev/nvme1n1p2[/@swap]:

the config is just KERNEL_SWAP_DEVICE=/swap/swapfile

JustSimplyKyle commented 1 year ago

btw, I don't think you can access /dev/nvme1n1p2[/@swap] like this. I think you need to mount /dev/nvme1n1p2 with the option subvol=@swap and then get the stat from that.

gissf1 commented 1 year ago

Firstly, I'm surprised I didn't see that egrep warning before, but I fixed that now.

The division by zero error is only for the display code, so it shouldn't have any effect other than the swap file output not completely working as expected. I added a workaround for this now.
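A guard of the following kind avoids the awk failure when the total is zero. This is only a sketch of the general technique with a hypothetical helper name, not the workaround that was actually committed.

```sh
# Hypothetical helper: print used/total as a percentage with one decimal,
# or "n/a" when the total is zero or missing, instead of dividing by zero.
# $1 = used, $2 = total (same unit).
swap_percent() {
    awk -v used="${1:-0}" -v total="${2:-0}" 'BEGIN {
        if (total + 0 == 0) { print "n/a"; exit }
        printf "%.1f%%\n", 100 * used / total
    }'
}
```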

I don't have any systems running btrfs at the moment, so I couldn't completely test all cases of the code. It seems the utility I used for getting the underlying device name from the swapfile also (unexpectedly) included the subvol as part of that device string in brackets like that. So you're right in that I can't use that syntax to verify the device, but it wasn't actually something I intended for it to do either.

I've now added code to strip off the subvol info if it is present in the device string, so it should work better.
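Stripping the bracketed part can be done with plain shell parameter expansion, as in this sketch. The helper names are hypothetical; which utility produces the DEVICE[/SUBVOL] string in the script is not stated here, though tools such as findmnt print btrfs sources in this form.

```sh
# Hypothetical helper: print the device part of "DEVICE[/SUBVOL]".
strip_subvol() {
    # Remove everything from the first "[" onward; plain devices pass through.
    printf '%s\n' "${1%%\[*}"
}

# Hypothetical helper: print the subvolume name, without the leading "/".
# Prints nothing when no "[...]" suffix is present.
subvol_of() {
    case "$1" in
        *\[*\]) s=${1#*\[}; s=${s%\]}; printf '%s\n' "${s#/}" ;;
    esac
}
```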

JustSimplyKyle commented 1 year ago

why is the swapfile still size 0K?

❯ doas /usr/lib/systemd/system-sleep/zram-hibernate   
[379403] creating tmpfile...
[379403] lock acquired.
ensureDiskSwap(): Initial swap status...
[18:51:23] Used=5283M MemFree=9098M SwapUsed=3072K/31G Overcommit=0K
     swapfile:  0.0% (0K/0K) []
        zram0:  0.0% (3072K/31G) [10]
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
/usr/lib/systemd/system-sleep/zram-hibernate: line 882: [: : integer expression expected
SWAP 0: /swap/swapfile: is inactive and too small
ensureDiskSwap(): Warning: SWAP 0 /swap/swapfile has zero length OPTIONS value
FATAL: no valid swaps found. cannot suspend to disk.
gissf1 commented 1 year ago

I was able to reproduce this. I tested the previous code with the swap file already active, but when it is not active, it comes up with 0K size as you saw. The preexisting code which gathers the partition size when a swap is not mounted needed to be updated to handle swap files.

The initial error is basically what caused the secondary "integer expression expected" error, because the comparison received an empty string instead of the number it was expecting.

Both should be fixed now.
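The "integer expression expected" message is what [ prints when one side of a numeric comparison is empty. A defensive pattern is to normalize the value first, as in this sketch; the helper name is hypothetical and this is not the fix that was committed.

```sh
# Hypothetical helper: succeed if size ($1, possibly empty or non-numeric)
# is at least needed ($2). Anything that is not a plain non-negative
# integer is treated as 0, so the comparison never sees an empty string.
is_big_enough() {
    size=$1
    needed=$2
    case "$size" in
        ''|*[!0-9]*) size=0 ;;
    esac
    [ "$size" -ge "$needed" ]
}
```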

gissf1 commented 1 year ago

So did those changes get you up and running?

JustSimplyKyle commented 1 year ago

Nope, I was away from my computer yesterday.

❯ doas /usr/lib/systemd/system-sleep/zram-hibernate -v
ensureDiskSwap(): Initial swap status...
[07:54:15] Used=10155M MemFree=19G SwapUsed=2553M/31G Overcommit=0K
     swapfile:  0.0% (0K/51G) []
        zram0:  8.0% (2553M/31G) [10]
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
ensureDiskSwap(): Warning: SWAP 0 /swap/swapfile has zero length OPTIONS value
FATAL: no valid swaps found. cannot suspend to disk.

Also this is how I created my swapfile

touch /swap/swapfile
chattr +C /swap/swapfile  ## Needed to disable Copy On Write on the file.
fallocate --length 50GiB /swap/swapfile 
chmod 600 /swap/swapfile 
mkswap /swap/swapfile 
gissf1 commented 1 year ago

Oh, an OPTIONS failure. Create an entry in your /etc/fstab of type swap, referencing your swap file path. You can use the noauto mount option to prevent the system from automatically mounting it at bootup. This should probably be after the parent mount points for your filesystem containing the swap file.

It should look something like this:

/swap/swapfile  none   swap  noauto,defaults  0  0
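To confirm that the entry is present and see which options it carries, something like the following sketch can be used. The helper name is hypothetical, and how zram-hibernate itself reads the OPTIONS value is not shown in this thread.

```sh
# Hypothetical helper: print the options field of the swap entry for a
# given path, skipping comment lines. Fails if there is no such entry.
# $1 = swap path, $2 = fstab-format file (defaults to /etc/fstab).
fstab_swap_options() {
    awk -v path="$1" '
        $1 !~ /^#/ && $1 == path && $3 == "swap" { print $4; found = 1; exit }
        END { exit !found }
    ' "${2:-/etc/fstab}"
}

# Example (not run here):
# fstab_swap_options /swap/swapfile || echo "no fstab swap entry"
```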

Regarding swapfile creation, I've typically used dd myself, but fallocate should be fine as long as it's reserving preallocated space (so it's not creating a sparse file or one with holes). It looks like it was created correctly. You can verify this by running swapon /swap/swapfile and then verifying that cat /proc/swaps lists it. zram-hibernate doesn't require any special creation of the swap space; it only requires that the swap is usable by the kernel.

Good call on disabling copy-on-write and setting the file mode 0600. Both of those are essential for swap files to work on btrfs.

For initial testing, you may want to run it with the -t flag as well to prevent actual hibernation (physical poweroff via systemctl hibernate) in case something goes wrong when activating/deactivating swaps the first time through. It will still do all the swap manipulation for real, but without the risk of failing to resume and missing the error messages.

JustSimplyKyle commented 1 year ago

I'm so close to getting it to work. The issue is... it doesn't hibernate?

❯ doas /usr/lib/systemd/system-sleep/zram-hibernate 
[6952] creating tmpfile...
[6952] lock acquired.
ensureDiskSwap(): Initial swap status...
[22:13:45] Used=6773M MemFree=12G SwapUsed=0K/31G Overcommit=0K
     swapfile:  0.0% (0K/51G) []
        zram0:  0.0% (0K/31G) [10]
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
ensureDiskSwap(): selecting SWAP 0 /swap/swapfile [] 51G
ensureDiskSwap(): Storing current SWAP state...
ensureDiskSwap(): Activating SWAP 0: /swap/swapfile [11]
ensureDiskSwap(): executing: swapon -p '11' '/swap/swapfile'
ensureDiskSwap(): Preparing to remove SWAP entries...
ensureDiskSwap(): executing: doSwapRemoval  /dev/zram0
[22:13:55] Used=6773M MemFree=12G SwapUsed=0K/31G Overcommit=0K
     swapfile:  0.0% (0K/51G) []
        zram0:  0.0% (0K/31G) [10]
doSwapRemoval(): Attempting to add/remove swaps...
doSwapRemoval(): swapoff /dev/zram0 completed [0]
[22:13:56] Used=4512M MemFree=15G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [11]
ensureDiskSwap(): final status:
[22:13:56] Used=4519M MemFree=15G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [11]
doHibernate(): executing: /bin/systemctl hibernate
doHibernate(): back from: /bin/systemctl hibernate (exit code=0)
restoreSwapState(): Initial swap status...
[22:13:56] Used=4499M MemFree=15G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [11]
restoreSwapState(): Reusing existing SWAP state description...
restoreSwapState(): Processing SWAP state description...
restoreSwapState(): Activating original SWAP devices...
restoreSwapState(): Removing hibernation SWAP entries...
restoreSwapState(): executing: doSwapRemoval /swap/swapfile
[22:13:56] Used=4499M MemFree=15G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [11]
doSwapRemoval(): Attempting to add/remove swaps...
FATAL: Unable to determine SwapTotal
doSwapRemoval(): swapoff /swap/swapfile completed [0]
FATAL: Unable to determine SwapTotal
[22:13:57] Used=4459M MemFree=15G SwapUsed=0K/0K Overcommit=0K
     swapfile:  0.0% (0K/51G) []
FATAL: Unable to determine SwapTotal
restoreSwapState(): final status:
[22:13:57] Used=4463M MemFree=15G SwapUsed=0K/0K Overcommit=0K
     swapfile:  0.0% (0K/51G) []
gissf1 commented 1 year ago

I'm honestly a bit concerned about those FATAL: errors post-resume, but you're saying it's not even actually hibernating?

To get more data on the FATAL errors, try adding a -vvv for 3x verbose output details that might provide some more insight. Also, verify that you can cat /proc/meminfo and cat /proc/swaps at the prompt, as failing to access those information sources would be the main cause of the reported errors. It could also be a failure to access the tempfile/lockfile at /tmp/zram-hibernate.tmp, so maybe check that it is writable and accessible. I'm not sure why those files would fail only after systemctl attempts to hibernate though, so perhaps the issues are related.
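A quick manual check of the SwapTotal value could look like this sketch. The helper name is hypothetical and the script's own parsing may differ; the point is only to see whether the field can be read at all.

```sh
# Hypothetical helper: print SwapTotal in kB, failing if the field is
# missing. $1 = meminfo-format file (defaults to /proc/meminfo).
swap_total_kb() {
    awk '$1 == "SwapTotal:" { print $2; found = 1; exit }
         END { exit !found }' "${1:-/proc/meminfo}"
}

# Example (not run here):
# swap_total_kb || echo "SwapTotal not readable"
```

A result of 0 is still a valid reading: it simply means no swap is active at that moment.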

As for systemctl hibernate failing to actually hibernate your system, there are a few system logs we should check:

My guess is that one of the safety checks in the kernel is "rejecting" the hibernation request, but I'm not sure how that would cause the FATAL errors.

JustSimplyKyle commented 1 year ago

Yep, it's not hibernating. The actual command

❯ doas /usr/lib/systemd/system-sleep/zram-hibernate  -vvv
[20268] removing stale lock...
[20268] creating tmpfile...
[20268] lock acquired.
ensureDiskSwap(): Initial swap status...
[06:52:54] Used=5812M MemFree=17G SwapUsed=0K/31G Overcommit=0K
     swapfile:  0.0% (0K/51G) []
        zram0:  0.0% (0K/31G) [10]
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
ensureDiskSwap(): REMOVE_SWAP_SIZE=0
ensureDiskSwap(): FULL SIZE_NEEDED=5951564
ensureDiskSwap(): HALF SIZE_NEEDED=2975782
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
ensureDiskSwap(): selecting SWAP 0 /swap/swapfile [] 51G
ADD_SWAP_IDX_LIST= 0
ensureDiskSwap(): Storing current SWAP state...
RESTORE_SWAPS="10:/dev/zram0"
ensureDiskSwap(): Activating SWAP 0: /swap/swapfile [11]
ensureDiskSwap(): executing: swapon -p '11' '/swap/swapfile'
ensureDiskSwap(): Preparing to remove SWAP entries...
ensureDiskSwap(): executing: doSwapRemoval  /dev/zram0
[06:53:04] Used=5812M MemFree=17G SwapUsed=0K/31G Overcommit=0K
     swapfile:  0.0% (0K/51G) []
        zram0:  0.0% (0K/31G) [10]
doSwapRemoval(): Attempting to add/remove swaps...
doSwapRemoval(): swapoff /dev/zram0 completed [0]
[06:53:05] Used=5831M MemFree=17G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [11]
ensureDiskSwap(): final status:
[06:53:05] Used=5811M MemFree=17G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [11]
doHibernate(): executing: /bin/systemctl hibernate
doHibernate(): back from: /bin/systemctl hibernate (exit code=0)
restoreSwapState(): Initial swap status...
[06:53:05] Used=5816M MemFree=17G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [11]
restoreSwapState(): Reusing existing SWAP state description...
RESTORE_SWAPS=10:/dev/zram0
restoreSwapState(): Processing SWAP state description...
restoreSwapState(): parsed PAIR: /dev/zram0 [10]
ADD_SWAPS=
REMOVE_SWAPS=/swap/swapfile
restoreSwapState(): Activating original SWAP devices...
restoreSwapState(): Removing hibernation SWAP entries...
restoreSwapState(): executing: doSwapRemoval /swap/swapfile
[06:53:05] Used=5816M MemFree=17G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [11]
doSwapRemoval(): Attempting to add/remove swaps...
FATAL: Unable to determine SwapTotal
doSwapRemoval(): swapoff /swap/swapfile completed [0]
FATAL: Unable to determine SwapTotal
[06:53:05] Used=5772M MemFree=17G SwapUsed=0K/0K Overcommit=0K
     swapfile:  0.0% (0K/51G) []
FATAL: Unable to determine SwapTotal
restoreSwapState(): final status:
[06:53:05] Used=5755M MemFree=17G SwapUsed=0K/0K Overcommit=0K
     swapfile:  0.0% (0K/51G) []

The journal log from when I ran the command (journalctl -f):

Oct 10 06:52:54 arch doas[20268]: kyle ran command /usr/lib/systemd/system-sleep/zram-hibernate -vvv as root from /home/kyle
Oct 10 06:52:54 arch kernel: audit: type=1105 audit(1665355974.181:339): pid=20265 uid=1000 auid=1000 ses=2 msg='op=PAM:session_open grantors=pam_systemd_home,pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/doas" hostname=arch addr=? terminal=pts/1 res=success'
Oct 10 06:53:04 arch kernel: Adding 53477372k swap on /swap/swapfile.  Priority:11 extents:4 across:107216896k FS
Oct 10 06:53:04 arch systemd[1]: dev-zram0.swap: Deactivated successfully.
Oct 10 06:53:05 arch systemd[1]: swap-swapfile.swap: Deactivated successfully.
Oct 10 06:53:05 arch doas[20265]: pam_unix(doas:session): session closed for user root
Oct 10 06:53:05 arch audit[20265]: USER_END pid=20265 uid=1000 auid=1000 ses=2 msg='op=PAM:session_close grantors=pam_systemd_home,pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/doas" hostname=arch addr=? terminal=pts/1 res=success'
Oct 10 06:53:05 arch audit[20265]: CRED_DISP pid=20265 uid=1000 auid=1000 ses=2 msg='op=PAM:setcred grantors=pam_faillock,pam_permit,pam_env,pam_faillock acct="root" exe="/usr/bin/doas" hostname=arch addr=? terminal=pts/1 res=success'
Oct 10 06:53:05 arch kernel: audit: type=1106 audit(1665355985.748:340): pid=20265 uid=1000 auid=1000 ses=2 msg='op=PAM:session_close grantors=pam_systemd_home,pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/doas" hostname=arch addr=? terminal=pts/1 res=success'
Oct 10 06:53:05 arch kernel: audit: type=1104 audit(1665355985.748:341): pid=20265 uid=1000 auid=1000 ses=2 msg='op=PAM:setcred grantors=pam_faillock,pam_permit,pam_env,pam_faillock acct="root" exe="/usr/bin/doas" hostname=arch addr=? terminal=pts/1 res=success'

The dmesg log (dmesg -W):

❯ doas dmesg -W
[30421.797918] audit: type=1101 audit(1665356290.555:377): pid=152568 uid=1000 auid=1000 ses=2 msg='op=PAM:accounting grantors=pam_unix,pam_permit,pam_time acct="kyle" exe="/usr/bin/doas" hostname=arch addr=? terminal=pts/1 res=success'
[30421.797934] audit: type=1110 audit(1665356290.555:378): pid=152568 uid=1000 auid=1000 ses=2 msg='op=PAM:setcred grantors=pam_faillock,pam_permit,pam_env,pam_faillock acct="root" exe="/usr/bin/doas" hostname=arch addr=? terminal=pts/1 res=success'
[30421.798660] audit: type=1105 audit(1665356290.556:379): pid=152568 uid=1000 auid=1000 ses=2 msg='op=PAM:session_open grantors=pam_systemd_home,pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/doas" hostname=arch addr=? terminal=pts/1 res=success'
[30432.310618] Adding 53477372k swap on /swap/swapfile.  Priority:11 extents:4 across:107216896k FS
[30433.345137] audit: type=1106 audit(1665356302.101:380): pid=152568 uid=1000 auid=1000 ses=2 msg='op=PAM:session_close grantors=pam_systemd_home,pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/doas" hostname=arch addr=? terminal=pts/1 res=success'
[30433.345183] audit: type=1104 audit(1665356302.101:381): pid=152568 uid=1000 auid=1000 ses=2 msg='op=PAM:setcred grantors=pam_faillock,pam_permit,pam_env,pam_faillock acct="root" exe="/usr/bin/doas" hostname=arch addr=? terminal=pts/1 res=success'

I can access /proc/meminfo and /proc/swaps correctly. The /tmp/zram-hibernate.tmp file is here: https://0x0.st/ot2y.tmp. The FATAL errors seem to be because it failed to reactivate zram? I'm not sure how this script works, though.

JustSimplyKyle commented 1 year ago

Although, if I run these commands, it actually works

swapoff /dev/zram0
swapon /swap/swapfile -p 11
systemctl hibernate
swapoff /swap/swapfile 
swapon /dev/zram0
gissf1 commented 1 year ago

OK, one more thing to try. You're running it manually after it's been installed into systemd's hook location. I wonder if it's being called recursively amid the internal systemctl call, and aborting because of a failure to acquire the lock at that point. That could explain the FATAL errors and/or the lack of actual hibernation.

Try moving the file from /usr/lib/systemd/system-sleep/zram-hibernate to some other location. (maybe /home/kyle somewhere). After moving it out of the way, try running it from the new location just like you did above with -vvv.

If that works, then move it back where it was and try just doing systemctl hibernate with zram as your only active swap, as this is how the script is supposed to be used in a "normal" scenario.
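
To illustrate the recursion theory: if the script holds an exclusive lock and a second copy is started while the first is still inside its own systemctl call (for example because systemd runs it as a sleep hook), the second copy cannot take the lock. A minimal sketch of that behaviour, assuming a noclobber-style lock file. The names acquire_lock and release_lock and the lock path are hypothetical and not taken from the zram-hibernate source:

```shell
# Hypothetical helpers; the real script's locking may work differently.

# Try to create LOCKFILE atomically. Returns non-zero if it already exists.
acquire_lock() {
    local lockfile="$1"
    # With noclobber set, ">" refuses to overwrite an existing file,
    # so only one process can succeed in creating the lock.
    ( set -o noclobber; echo "$$" > "$lockfile" ) 2>/dev/null
}

# Remove the lock so a later run can acquire it again.
release_lock() {
    rm -f -- "$1"
}

# Example (not executed here):
#   acquire_lock /tmp/example.lock || { echo "already running" >&2; exit 1; }
#   ... do work ...
#   release_lock /tmp/example.lock
```

With this pattern, a nested invocation fails at acquire_lock and exits early, which would match "aborting because of a failure to acquire the lock" if that is what is happening.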

JustSimplyKyle commented 1 year ago

Well, I've tried that but it still doesn't hibernate

JustSimplyKyle commented 1 year ago
❯ ~/things/zram-hibernate -vvv  
[sudo] password for kyle: 
[1311835] creating tmpfile...
[1311835] lock acquired.
ensureDiskSwap(): Initial swap status...
[17:08:06] Used=6479M MemFree=7950M SwapUsed=0K/31G Overcommit=0K
     swapfile:  0.0% (0K/51G) []
        zram0:  0.0% (0K/31G) [10]
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
ensureDiskSwap(): REMOVE_SWAP_SIZE=0
ensureDiskSwap(): FULL SIZE_NEEDED=6635016
ensureDiskSwap(): HALF SIZE_NEEDED=3317508
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
ensureDiskSwap(): selecting SWAP 0 /swap/swapfile [] 51G
ADD_SWAP_IDX_LIST= 0
ensureDiskSwap(): Storing current SWAP state...
RESTORE_SWAPS="10:/dev/zram0"
ensureDiskSwap(): Activating SWAP 0: /swap/swapfile [11]
ensureDiskSwap(): executing: swapon -p '11' '/swap/swapfile'
ensureDiskSwap(): Preparing to remove SWAP entries...
ensureDiskSwap(): executing: doSwapRemoval  /dev/zram0
[17:08:16] Used=6479M MemFree=7950M SwapUsed=0K/31G Overcommit=0K
     swapfile:  0.0% (0K/51G) []
        zram0:  0.0% (0K/31G) [10]
doSwapRemoval(): Attempting to add/remove swaps...
doSwapRemoval(): swapoff /dev/zram0 completed [0]
[17:08:17] Used=6445M MemFree=7983M SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [11]
ensureDiskSwap(): final status:
[17:08:17] Used=6448M MemFree=7980M SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [11]
doHibernate(): executing: /usr/bin/systemctl hibernate
doHibernate(): back from: /usr/bin/systemctl hibernate (exit code=0)
restoreSwapState(): Initial swap status...
[17:08:17] Used=6450M MemFree=7977M SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [11]
restoreSwapState(): Reusing existing SWAP state description...
RESTORE_SWAPS=10:/dev/zram0
restoreSwapState(): Processing SWAP state description...
restoreSwapState(): parsed PAIR: /dev/zram0 [10]
ADD_SWAPS=
REMOVE_SWAPS=/swap/swapfile
restoreSwapState(): Activating original SWAP devices...
restoreSwapState(): Removing hibernation SWAP entries...
restoreSwapState(): executing: doSwapRemoval /swap/swapfile
[17:08:17] Used=6450M MemFree=7977M SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [11]
doSwapRemoval(): Attempting to add/remove swaps...
FATAL: Unable to determine SwapTotal
doSwapRemoval(): swapoff /swap/swapfile completed [0]
FATAL: Unable to determine SwapTotal
[17:08:17] Used=6410M MemFree=8017M SwapUsed=0K/0K Overcommit=0K
     swapfile:  0.0% (0K/51G) []
FATAL: Unable to determine SwapTotal
restoreSwapState(): final status:
[17:08:17] Used=6414M MemFree=8008M SwapUsed=0K/0K Overcommit=0K
     swapfile:  0.0% (0K/51G) []
JustSimplyKyle commented 1 year ago

Also btw, after running this script, my zram0 got disabled

JustSimplyKyle commented 1 year ago

[image] Also, if this line is uncommented (in doHibernate), would it actually run systemctl hibernate?

gissf1 commented 1 year ago

Oh, yeah, that commented line would probably be the problem! Sorry. Try uncommenting it and I bet it hibernates now!

That still doesn't explain zram0 being disabled nor the "FATAL" issues. Somehow the script is not able to retrieve data that should be there, and it basically aborts a lot of things, including restoring the zram. I'll look into this issue more tomorrow.

JustSimplyKyle commented 1 year ago

sadly, it still somehow doesn't work

[3290609] removing stale lock...
[3290609] creating tmpfile...
[3290609] lock acquired.
ensureDiskSwap(): Initial swap status...
[21:16:47] Used=7944M MemFree=19G SwapUsed=0K/31G Overcommit=0K
     swapfile:  0.0% (0K/51G) []
        zram0:  0.0% (0K/31G) [1]
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
ensureDiskSwap(): REMOVE_SWAP_SIZE=0
ensureDiskSwap(): FULL SIZE_NEEDED=8134744
ensureDiskSwap(): HALF SIZE_NEEDED=4067372
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
ensureDiskSwap(): selecting SWAP 0 /swap/swapfile [] 51G
ADD_SWAP_IDX_LIST= 0
ensureDiskSwap(): Storing current SWAP state...
RESTORE_SWAPS="1:/dev/zram0"
ensureDiskSwap(): Activating SWAP 0: /swap/swapfile [2]
ensureDiskSwap(): executing: swapon -p '2' '/swap/swapfile'
ensureDiskSwap(): Preparing to remove SWAP entries...
ensureDiskSwap(): executing: doSwapRemoval  /dev/zram0
[21:16:58] Used=7944M MemFree=19G SwapUsed=0K/31G Overcommit=0K
     swapfile:  0.0% (0K/51G) []
        zram0:  0.0% (0K/31G) [1]
doSwapRemoval(): Attempting to add/remove swaps...
doSwapRemoval(): swapoff /dev/zram0 completed [0]
[21:16:58] Used=7911M MemFree=19G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [2]
ensureDiskSwap(): final status:
[21:16:58] Used=7915M MemFree=19G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [2]

doHibernate(): executing: /usr/bin/systemctl hibernate
Filename                Type        Size        Used        Priority
/swap/swapfile                          file        53477372    0       2
doHibernate(): back from: /usr/bin/systemctl hibernate (exit code=0)
restoreSwapState(): Initial swap status...
[21:17:01] Used=7971M MemFree=19G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [2]
restoreSwapState(): Reusing existing SWAP state description...
RESTORE_SWAPS=1:/dev/zram0
restoreSwapState(): Processing SWAP state description...
restoreSwapState(): parsed PAIR: /dev/zram0 [1]
ADD_SWAPS=
REMOVE_SWAPS=/swap/swapfile
restoreSwapState(): Activating original SWAP devices...
restoreSwapState(): Removing hibernation SWAP entries...
restoreSwapState(): executing: doSwapRemoval /swap/swapfile
[21:17:01] Used=7971M MemFree=19G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [2]
doSwapRemoval(): Attempting to add/remove swaps...
FATAL: Unable to determine SwapTotal
doSwapRemoval(): swapoff /swap/swapfile completed [0]
FATAL: Unable to determine SwapTotal
[21:17:01] Used=7911M MemFree=19G SwapUsed=0K/0K Overcommit=0K
     swapfile:  0.0% (0K/51G) []
FATAL: Unable to determine SwapTotal
restoreSwapState(): final status:
[21:17:02] Used=7911M MemFree=19G SwapUsed=0K/0K Overcommit=0K
     swapfile:  0.0% (0K/51G) []

I even added a cat /proc/swaps before the hibernate call.

gissf1 commented 1 year ago

I think I fixed that FATAL error with the latest push.

Also, in reading some documentation on systemctl hibernate, I may have figured out why your hibernation is not working. Evidently that call is asynchronous, so it returns before the system actually hibernates. That likely creates a race condition in which the disk swap is deactivated and zram restored before the system is able to freeze processes and hibernate. This would cause hibernation to abort long after the script finishes running. I'm working on a test and possible fix for this, but it probably won't be done for a few days.

JustSimplyKyle commented 1 year ago
❯ /usr/lib/systemd/system-sleep/zram-hibernate -vvv        
[891378] creating tmpfile...
[891378] lock acquired.
ensureDiskSwap(): Initial swap status...
[21:39:30] Used=7521M MemFree=13G SwapUsed=0K/31G Overcommit=0K
     swapfile:  0.0% (0K/51G) []
        zram0:  0.0% (0K/31G) [1]
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
ensureDiskSwap(): REMOVE_SWAP_SIZE=0
ensureDiskSwap(): FULL SIZE_NEEDED=7702384
ensureDiskSwap(): HALF SIZE_NEEDED=3851192
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
ensureDiskSwap(): selecting SWAP 0 /swap/swapfile [] 51G
ADD_SWAP_IDX_LIST= 0
ensureDiskSwap(): Storing current SWAP state...
RESTORE_SWAPS="1:/dev/zram0"
ensureDiskSwap(): Activating SWAP 0: /swap/swapfile [2]
ensureDiskSwap(): executing: swapon -p '2' '/swap/swapfile'
ensureDiskSwap(): Preparing to remove SWAP entries...
ensureDiskSwap(): executing: doSwapRemoval  /dev/zram0
[21:39:40] Used=7521M MemFree=13G SwapUsed=0K/31G Overcommit=0K
     swapfile:  0.0% (0K/51G) []
        zram0:  0.0% (0K/31G) [1]
doSwapRemoval(): Attempting to add/remove swaps...
doSwapRemoval(): swapoff /dev/zram0 completed [0]
[21:39:40] Used=7506M MemFree=13G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [2]
ensureDiskSwap(): final status:
[21:39:41] Used=7514M MemFree=13G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [2]
doHibernate(): executing: /usr/bin/systemctl hibernate
doHibernate(): back from: /usr/bin/systemctl hibernate (exit code=0)
restoreSwapState(): Initial swap status...
[21:39:41] Used=7463M MemFree=13G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [2]
restoreSwapState(): Reusing existing SWAP state description...
RESTORE_SWAPS=1:/dev/zram0
restoreSwapState(): Processing SWAP state description...
restoreSwapState(): parsed PAIR: /dev/zram0 [1]
ADD_SWAPS=
REMOVE_SWAPS=/swap/swapfile
restoreSwapState(): Activating original SWAP devices...
restoreSwapState(): Removing hibernation SWAP entries...
restoreSwapState(): executing: doSwapRemoval /swap/swapfile
[21:39:41] Used=7463M MemFree=13G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [2]
doSwapRemoval(): Attempting to add/remove swaps...
doSwapRemoval(): swapoff /swap/swapfile completed [0]
[21:39:41] Used=7433M MemFree=13G SwapUsed=0K/0K Overcommit=0K
     swapfile:  0.0% (0K/51G) []
restoreSwapState(): final status:
[21:39:41] Used=7439M MemFree=13G SwapUsed=0K/0K Overcommit=0K
     swapfile:  0.0% (0K/51G) []

Yeah, the FATAL errors are fixed. But still, it doesn't re-enable /dev/zram0.

JustSimplyKyle commented 1 year ago

Yep, I can confirm that it's a race condition! Currently just throwing a sleep 10 after the systemctl hibernate call does in fact make it work

JustSimplyKyle commented 1 year ago

I found a VERY interesting solution. [image] The watch command essentially pauses execution until the output of systemctl is-active systemd-hibernate changes. The output changes when the unit actually becomes active (the initial output is inactive).
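
The screenshot itself is not preserved here, so the following is only a sketch of the same idea: block until the output of a probe command differs from its first sample, with a timeout so the loop cannot hang forever. The function name wait_for_change and its parameters are hypothetical, and the unit name in the usage comment is an assumption.

```shell
# wait_for_change TIMEOUT INTERVAL CMD [ARGS...]
# Samples CMD once, then polls every INTERVAL seconds until its output
# differs from that first sample. Returns 0 on change, 1 after roughly
# TIMEOUT seconds without a change.
wait_for_change() {
    local timeout="$1" interval="$2"
    shift 2
    local initial current elapsed=0
    # The probe may exit non-zero (is-active does so for "inactive"),
    # so its exit status is deliberately ignored.
    initial=$("$@" 2>/dev/null) || true
    while [ "$elapsed" -lt "$timeout" ]; do
        current=$("$@" 2>/dev/null) || true
        if [ "$current" != "$initial" ]; then
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}

# Example (not executed here; unit name is an assumption):
#   systemctl hibernate
#   wait_for_change 60 1 systemctl is-active systemd-hibernate.service
```

Compared with a fixed sleep 10, polling returns as soon as the state changes and still gives up after the timeout, which avoids both waiting too little and waiting forever.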

JustSimplyKyle commented 1 year ago

Still, zram0 isn't started back for some reason

gissf1 commented 1 year ago

Thanks for the confirmation on the race condition, and for offering a solution. I implemented a check based on your suggested solution above, but with logic to (hopefully) ensure that code execution doesn't proceed until hibernation is complete (and resume is likely started).

I didn't fix the "zram missing on resume" issue yet, but I'll hopefully get to investigating a bit more this week. ADD_SWAPS should show /dev/zram0 and that would cause it to restore it, but instead it's showing as blank in your output. I'm thinking that somehow the zram device isn't being detected as a swap device after resume, so it ignores that entry and proceeds. That would be a logic error in the resume code if that is the case, but I'm limited on what I can diagnose right now.
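
As a rough illustration of the restore step being described, here is a sketch that splits a saved "PRIORITY:PATH" pair (the format printed in the RESTORE_SWAPS log lines) and checks whether a path is currently listed as active swap. The function names are hypothetical and this is not the script's actual resume code.

```shell
# parse_swap_pair "10:/dev/zram0"  ->  prints "10 /dev/zram0"
# Returns non-zero if the pair is malformed.
parse_swap_pair() {
    local pair="$1" prio path
    case "$pair" in
        *:*) ;;
        *) return 1 ;;
    esac
    prio="${pair%%:*}"
    path="${pair#*:}"
    # Loose validation: priority may contain only digits and '-'.
    case "$prio" in
        ''|*[!0-9-]*) return 1 ;;
    esac
    [ -n "$path" ] || return 1
    printf '%s %s\n' "$prio" "$path"
}

# is_active_swap PATH [TABLE]
# Succeeds if PATH appears in the first column of a /proc/swaps-style table.
is_active_swap() {
    local path="$1" table="${2:-/proc/swaps}"
    awk -v p="$path" 'NR > 1 && $1 == p { found = 1 } END { exit !found }' "$table"
}

# Example (not executed here):
#   set -- $(parse_swap_pair "10:/dev/zram0")
#   is_active_swap "$2" || swapon -p "$1" "$2"
```

If the device node no longer exists after resume, a swapon like the one in the usage comment would fail, which is one way the saved entry could end up being skipped.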

JustSimplyKyle commented 1 year ago

With the newest commit, it seems like it's... stuck?? very weird

❯ doas /usr/lib/systemd/system-sleep/zram-hibernate   -vvv
[291102] creating tmpfile...
[291102] lock acquired.
ensureDiskSwap(): Initial swap status...
[07:13:24] Used=7079M MemFree=13G SwapUsed=0K/31G Overcommit=0K
     swapfile:  0.0% (0K/51G) []
        zram0:  0.0% (0K/31G) [10]
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
ensureDiskSwap(): REMOVE_SWAP_SIZE=0
ensureDiskSwap(): FULL SIZE_NEEDED=7249396
ensureDiskSwap(): HALF SIZE_NEEDED=3624698
Detected btrfs with subvol: /dev/nvme1n1p2[/@swap]
Block device subvol: @swap
Block device: /dev/nvme1n1p2
WARNING: swap file on btrfs filesystem is not well tested and could be dangerous: /swap/swapfile: on /dev/nvme1n1p2
         Sleeping for 5 seconds... Press Ctrl+C to abort.
Testing underlying block device for: /swap/swapfile: /dev/nvme1n1p2
ensureDiskSwap(): selecting SWAP 0 /swap/swapfile [] 51G
ADD_SWAP_IDX_LIST= 0
ensureDiskSwap(): Storing current SWAP state...
RESTORE_SWAPS="10:/dev/zram0"
ensureDiskSwap(): Activating SWAP 0: /swap/swapfile [11]
ensureDiskSwap(): executing: swapon -p '11' '/swap/swapfile'
ensureDiskSwap(): Preparing to remove SWAP entries...
ensureDiskSwap(): executing: doSwapRemoval  /dev/zram0
[07:13:34] Used=7079M MemFree=13G SwapUsed=0K/31G Overcommit=0K
     swapfile:  0.0% (0K/51G) []
        zram0:  0.0% (0K/31G) [10]
doSwapRemoval(): Attempting to add/remove swaps...
doSwapRemoval(): swapoff /dev/zram0 completed [0]
[07:13:34] Used=7126M MemFree=13G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [11]
ensureDiskSwap(): final status:
[07:13:34] Used=7130M MemFree=13G SwapUsed=0K/50G Overcommit=0K
     swapfile:  0.0% (0K/50G) [11]
doHibernate(): executing: /bin/systemctl hibernate
doHibernate(): back from: /bin/systemctl hibernate (exit code=0)
doHibernate(): sleeping until hibernation activates...
gissf1 commented 1 year ago

I noticed that the "inactive" test didn't quite work as expected, but I hadn't been able to get back to it until today. I finally got it set up in a VM, so I can test it more thoroughly.

I fixed that infinite loop, tested it with a swap device, and pushed the changes. I'm a bit more confident in it now.

In looking at my setup, I found that I have zram-generator and systemd-swap packages installed also. Perhaps that's why my system has /dev/zram0 upon resume, since systemd would create it automatically with those packages. I didn't have any /dev/zram0 in my VM until I installed and configured those packages. I may add a dependency on those packages to the PKGBUILD if installing them solves your issue.

Let me know if you need help in configuring zram-generator in particular, but here is the project page for more info: https://github.com/systemd/zram-generator
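
For reference, a minimal zram-generator setup might look like the sketch below. The size, compression algorithm, and priority are example values only (assumptions, not settings taken from this thread), and the helper name write_zram_generator_conf is hypothetical.

```shell
# Write an example zram-generator config to DEST
# (default: /etc/systemd/zram-generator.conf).
write_zram_generator_conf() {
    local dest="${1:-/etc/systemd/zram-generator.conf}"
    cat > "$dest" <<'EOF'
[zram0]
# Size expression in megabytes; "ram" is the total system memory.
zram-size = ram / 2
compression-algorithm = zstd
swap-priority = 100
EOF
}

# Example (not executed here, needs root):
#   write_zram_generator_conf
#   systemctl daemon-reload
#   systemctl start systemd-zram-setup@zram0.service
```

Because the device is created by a systemd generator, it should be set up again by systemd rather than by a separate daemon.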

Please let me know if that helps solve your issue.

JustSimplyKyle commented 1 year ago

I'll try using zram-generator. Previously I was using zramd (because of its ease of use).

JustSimplyKyle commented 1 year ago

zramd is indeed the issue! After switching to zram-generator, everything is fixed! It restored fine, didn't get stuck in the loop, and zram0 got respawned!

JustSimplyKyle commented 1 year ago

Well, mostly.... [image] Isn't it possible to do this if zram-hibernate is in /usr/lib/systemd/system-sleep?

JustSimplyKyle commented 1 year ago

Ohhh, you have to rename it to hibernate! I believe you can remove the btrfs warning since everything is working correctly!

JustSimplyKyle commented 1 year ago

Suddenly, it stopped working again... [image]

gissf1 commented 1 year ago

Hmm, when you get that error, I'd want to see the output of cat /proc/swaps, and probably also the output from zramctl zram0 and free. That error should only be happening if you have a bunch of stuff in memory/swap and it can't compress it all enough to fit in the available swap for hibernation.

There's also a kernel parameter to adjust the compression level. If there is actually enough space, or you want to push the compression harder, tweaking that parameter may be the next step. Look where it talks about /sys/power/image_size at https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#About_swap_partition/file_size for more details.
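
A quick way to sanity-check the numbers involved is sketched below, under the assumption (from the kernel documentation) that /sys/power/image_size holds a target image size in bytes and that /proc/swaps reports Size and Used in KiB. The function names are hypothetical. Since image_size is a target rather than a hard limit, a passing check does not guarantee that hibernation will succeed.

```shell
# Sum the free space (Size - Used, in KiB) over all entries of a
# /proc/swaps-style table.
swap_free_kib() {
    local table="${1:-/proc/swaps}"
    awk 'NR > 1 { free += $3 - $4 } END { printf "%.0f\n", free }' "$table"
}

# Succeeds if the target image size would fit into the free swap.
# Returns 2 if the image_size file cannot be read.
image_fits_in_swap() {
    local image_size_file="${1:-/sys/power/image_size}" table="${2:-/proc/swaps}"
    local image_bytes free_kib
    image_bytes=$(cat "$image_size_file") || return 2
    free_kib=$(swap_free_kib "$table")
    [ $((image_bytes / 1024)) -le "$free_kib" ]
}

# Example (not executed here):
#   image_fits_in_swap && echo "target image size fits in free swap"
```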

I'm surprised that the script name mattered, because I think systemd executes all the scripts in that folder no matter what sleep/hibernate mode was attempted. It only changes the arguments to the script, so the script can act accordingly.
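
For context, systemd-sleep passes two arguments to each executable in the system-sleep directory: "pre" or "post", followed by the sleep mode (for example "suspend" or "hibernate"). A sketch of how a hook could branch on them follows. The function name and the action labels it prints are hypothetical, and which modes should trigger a swap switch is an assumption.

```shell
# sleep_hook_action PHASE MODE
# Prints which (hypothetical) action a hook would take for the given
# systemd-sleep arguments.
sleep_hook_action() {
    local phase="$1" mode="$2"
    case "$phase/$mode" in
        pre/hibernate|pre/hybrid-sleep)
            echo "enable-disk-swap" ;;
        post/hibernate|post/hybrid-sleep)
            echo "restore-zram-swap" ;;
        *)
            # Plain suspend and unknown modes need no swap changes here.
            echo "nothing" ;;
    esac
}

# In a real hook the call would be: sleep_hook_action "$1" "$2"
```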

FWIW, I also created an AUR package which installs it into the systemd system-sleep hook. Its page is here: https://aur.archlinux.org/packages/zram-hibernate-git

You can try using the AUR package to see if that works any better for you since it sets permissions and such as part of the install. And you can report feedback on it here or on the project page above since they're related and I am the maintainer for both.

I'm glad that zram-generator worked for you! I will update documentation to mention those packages, and I'll probably update the AUR package to depend on them too.

By the way, thanks for helping me diagnose and debug this! There's never enough time in a day, so it's greatly appreciated that someone's taking the time to help push the project forward when I can't!

JustSimplyKyle commented 1 year ago

[image] The thing is, I am using the AUR package. And yeah, the name doesn't seem to matter.

> By the way, thanks for helping me diagnose and debug this! There's never enough time in a day, so it's greatly appreciated that someone's taking the time to help push the project forward when I can't!

It's my pleasure!

gissf1 commented 1 year ago

I was able to reproduce a similar issue in my VM which offered me the same error, and I was able to implement a partial solution.

According to this systemd ticket ( https://github.com/systemd/systemd/issues/15354 ), the issue is caused by systemd's hibernation checks. So from there, I found ( https://forum.manjaro.org/t/howto-enable-and-configure-hibernation-with-btrfs/51253 ) which shows that there is a way to make the change in a separate file, rather than editing core system files.

Based on that bit of information, I modified this package to include a template version of that file, and I modified the PKGBUILD to install them appropriately.
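
The thread does not show the file the package installs, so the following is only a sketch of the kind of drop-in the linked guide describes, as far as I understand it: setting SYSTEMD_BYPASS_HIBERNATION_MEMORY_CHECK=1 for systemd-logind and systemd-hibernate. The helper name, the override.conf file name, and the choice of units are assumptions.

```shell
# Write drop-in overrides under ROOT (default: /etc/systemd/system) that
# ask systemd to skip its "is there enough swap?" check before hibernating.
write_bypass_dropins() {
    local root="${1:-/etc/systemd/system}"
    local unit
    for unit in systemd-logind.service systemd-hibernate.service; do
        mkdir -p "$root/$unit.d" || return 1
        cat > "$root/$unit.d/override.conf" <<'EOF'
[Service]
Environment=SYSTEMD_BYPASS_HIBERNATION_MEMORY_CHECK=1
EOF
    done
}

# Example (not executed here, needs root; a reboot or a restart of the
# affected services is needed before the change takes effect):
#   write_bypass_dropins
#   systemctl daemon-reload
```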

From there, I had to reboot to see any changes. After the reboot, when I attempt to systemctl hibernate I no longer see the error, but I also don't see the system hibernating or anything else happening either.

I'm trying to debug a bit more from here. I may try writing to a virtual serial port to see the script status and output. Any ideas or suggestions are welcome.

JustSimplyKyle commented 1 year ago

[Screenshot_20221122-201017] Yeah, I've tried that and it still doesn't work. I guess it's yet another race condition? Since we have disabled the hibernation check, maybe the hibernation process simply doesn't wait for the pre-hook to complete.

gustavtemp commented 1 year ago

I tried to run the script in bash (version 5.2.15) on Fedora 37. I guess I might be doing some basic thing wrong. :)

ensureDiskSwap(): Initial swap status...
[08:32:59] Used=2675M MemFree=10G SwapUsed=0K/27G Overcommit=0K
./zram-hibernate: line 658: 0.00683594: syntax error: invalid arithmetic operator (error token is ".00683594")

Edit: Moved to #3
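
For readers hitting the same message: shell arithmetic expansion handles integers only, so a fractional value such as 0.00683594 reaching $(( )) produces this kind of error. What feeds that value into line 658 is not shown here. One common workaround is to do fractional math in awk, sketched below with a hypothetical helper name:

```shell
# percent_used USED TOTAL
# Prints USED/TOTAL as a percentage with one decimal place, using awk
# because the shell's $(( )) cannot handle fractional values.
percent_used() {
    local used="$1" total="$2"
    LC_ALL=C awk -v u="$used" -v t="$total" 'BEGIN {
        if (t > 0) printf "%.1f\n", 100 * u / t
        else       print "0.0"
    }'
}

# Example (not executed here):
#   percent_used 0 53477372
```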

gissf1 commented 1 year ago

> [Screenshot_20221122-201017] Yeah, I've tried that and it still doesn't work. I guess it's yet another race condition? Since we have disabled the hibernation check, maybe the hibernation process simply doesn't wait for the pre-hook to complete.

It has to wait for the pre-hook to complete unless something is forking a subprocess. The whole point of that hook is to perform actions before the kernel's low-level hibernation process begins.

It's been a while, so correct me if I'm wrong, but your system was failing to find the swap file for hibernation, right? Is there any chance you could post an (edited) video showing what happens when you try to hibernate and resume? I have a feeling it's something small that we're overlooking.

JustSimplyKyle commented 1 year ago

https://asciinema.org/a/IhcTK6yXBl3pjeMAYv46AnHbz