Nitrokey / heads

A minimal Linux that runs as a coreboot or LinuxBoot ROM payload to provide a secure, flexible boot environment for laptops and servers.
http://osresearch.net/
GNU General Public License v2.0

ns50 v2.3 suspend causes freeze on resume #29

Closed commandline-be closed 10 months ago

commandline-be commented 11 months ago

Please identify some basic details to help process the report

After upgrading to Heads v2.3, the machine no longer resumes from suspend. It appears comatose and never wakes up.

A. Provide Hardware Details

1. What board are you using (see list of boards here)?

2. Does your computer have a dGPU or is it iGPU-only?

3. Who installed Heads on this computer?

4. What PGP key is being used?

5. Are you using the PGP key to provide HOTP verification?

B. Identify how the board was flashed

1. Is this problem related to updating heads or flashing it for the first time?

2. If the problem is related to an update, how did you attempt to apply the update?

3. How was Heads initially flashed

4. Was the board flashed with a maximized or non-maximized/legacy rom?

5. If Heads was externally flashed, was IFD unlocked?

C. Identify the rom related to this bug report

1. Did you download or build the rom at issue in this bug report?

2. If you downloaded your rom, where did you get it from?

Please provide the release number or otherwise identify the rom downloaded

3. If you built your rom, which repository:branch did you use?

4. What version of coreboot did you use in building?

5. In building the rom where did you get the blobs?

Please describe the problem

Describe the bug

Machine remains in deep sleep regardless of HID input or pressing the power button

To Reproduce Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior

the machine resumes from (deep) sleep and/or hibernation

Screenshots If applicable, add screenshots to help explain your problem.

Additional context As documented here, this should work well. Tests with pm-hibernate and pm-suspend suggest as much. https://docs.dasharo.com/variants/novacustom_ns5x_adl/test-matrix/#module-dasharo-security

The workaround being tried out is modifying /etc/default/acpi-support so that SUSPEND_METHODS="pm-utils", since pm-utils is reported to be working and the previous config pointed to dbus first.

commandline-be commented 11 months ago

changes to SUSPEND_METHODS in /etc/default/acpi-support made no difference

daringer commented 11 months ago

are you sure this happens only starting with 2.3? Generally the issue originates in the different sleep/suspend modes and the respective OS support:

the Dasharo test results suggest that suspend should work all the way, but I suppose this is also based on setting either S0 (for Ubuntu) and/or S3 (for QubesOS), which is weird because to my knowledge this cannot work...

can you share the contents of /sys/power/mem_sleep and the output of sudo dmesg | grep ACPI | grep supports?
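For reference, a minimal sketch of how the first of those two values can be read programmatically. It assumes the usual sysfs format, where /sys/power/mem_sleep lists the available modes on one line and marks the currently selected one in square brackets (s2idle is suspend-to-idle, deep is suspend-to-RAM, i.e. S3). The function names are illustrative only and are not part of Heads or Dasharo.

```python
from pathlib import Path


def parse_mem_sleep(text):
    """Split the contents of /sys/power/mem_sleep into (available, selected).

    Example input: "s2idle [deep]"; the bracketed entry is the mode the
    kernel uses when "mem" is written to /sys/power/state.
    """
    available = []
    selected = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        available.append(token)
    return available, selected


def read_mem_sleep(path="/sys/power/mem_sleep"):
    """Read and parse the sysfs file; return ([], None) if it is missing."""
    p = Path(path)
    if not p.exists():
        return [], None
    return parse_mem_sleep(p.read_text())


if __name__ == "__main__":
    modes, active = read_mem_sleep()
    print("available:", modes, "selected:", active)
```

If only s2idle shows up in the list, the firmware does not advertise S3 to the kernel, which is the kind of detail the question above is trying to establish.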

We'll investigate and discuss this with Dasharo, but currently this leaves me with the impression that we might need two different coreboot versions for NV41 and NS50 to maximize the number of combinations in which suspend works...

currently for me the working suspend looks like this (dasharo v1.6 release, qubes 4.1.2, Nitropad release >=2.2)

OS        NV41           NS50
QubesOS   S3             none
Ubuntu    S3 / (S0 ?)    S0 (?)

This might have changed with the most recent Dasharo release 1.7, we'll have to check that

commandline-be commented 11 months ago

of that i'm sure yes. v2.2 had issues with deep sleep, the fan kept spinning even in suspend/hibernate.

for v2.3 the only real change is the command-line boot parameter intel_iommu=on instead of igfx_off
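To check which value is actually in effect on a running system, the kernel command line can be read from /proc/cmdline. A minimal sketch, assuming a plain space-separated command line without quoted values; parse_cmdline is an illustrative helper name, not an existing tool:

```python
def parse_cmdline(cmdline):
    """Parse a kernel command line into a dict.

    "key=value" tokens map key -> value; bare flags such as "quiet" map to
    None. Quoted values containing spaces are NOT handled in this sketch.
    If a key appears more than once, the last occurrence wins.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            params = parse_cmdline(f.read())
    except OSError:
        params = {}
    # Prints None if the parameter is not set at all.
    print("intel_iommu =", params.get("intel_iommu"))
```

Comparing this value between a v2.2 and a v2.3 boot would confirm whether the parameter really differs between the two releases.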

for ns50 i noticed 4 cstates, up to C10. I don't know for the nv41.

I'll share output later

this situation begs the question: is no NS50 hardware available to test all this?

commandline-be commented 11 months ago

FAIL for running this test https://docs.dasharo.com/unified-test-documentation/dasharo-compatibility/31M-platform-suspend-and-resume/#susp001001-platform-suspend-and-resume-ubuntu-2204-wakeup-flag

must comment: set to 20 seconds, not 60 as shown on this test page; state = freeze

daringer commented 11 months ago

> of that i'm sure yes. v2.2 had issues with deep sleep, the fan kept spinning even in suspend/hibernate.

if the fan keeps spinning, this would suggest that the sleep state is not really reached, although it's weird that the behavior changed with 2.3 ...

> FAIL for running this test https://docs.dasharo.com/unified-test-documentation/dasharo-compatibility/31M-platform-suspend-and-resume/#susp001001-platform-suspend-and-resume-ubuntu-2204-wakeup-flag
>
> must comment: set to 20 seconds, not 60 as shown on this test page; state = freeze

please be aware that these test results refer to Dasharo v1.7.1, which is not integrated into the current firmware. On top of that, these test results are generated with an EDK2 payload and not with Heads, which means sleep states are configurable. Other coreboot/EDK2 configuration details might also easily change the platform behavior here.

> for ns50 i noticed 4 cstates, up to C10. I don't know for the nv41.

As far as I understand, Cx states have nothing to do with sleep; those are CPU states for low-power operation under small loads or when idle.

commandline-be commented 11 months ago

power.max_cstates don't; setting the value wrong actually influences performance. intel_idle.max_cstates most likely does; this value sets a hard limit for the number of available states.

in case you wonder what cstates are, this is a nice overview, if anything c-states invoke sleep, i mean what else would ?

https://gist.github.com/wmealing/2dd2b543c4d3cff6cab7

commandline-be commented 11 months ago

potentially interesting note here https://github.com/torvalds/linux/blob/v6.2/drivers/idle/intel_idle.c

/*

daringer commented 11 months ago

> in case you wonder what cstates are, this is a nice overview, if anything c-states invoke sleep, i mean what else would ?

The S-states, here it is described in some detail: https://unix.stackexchange.com/questions/550731/difference-between-c-state-and-s-state. Power consumption can change during suspend for varying C-state configurations, but the C-State by itself is just a mechanism for the SoC to consume less power (with the trade-off of taking longer until full performance can be achieved again) based on current system load.

You can also see C-State metrics (Idle Stats) in e.g., powertop
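The same per-state counters are exposed by the kernel's cpuidle sysfs interface. A minimal sketch that lists the C-states of one CPU with their entry counts, assuming the standard layout /sys/devices/system/cpu/cpu0/cpuidle/stateN/{name,usage}; list_cstates is an illustrative name:

```python
from pathlib import Path


def list_cstates(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return [(name, usage), ...] for each idle state, ordered by index.

    "name" is the state label (e.g. POLL, C1E, C10) and "usage" is the
    number of times the CPU entered that state. Returns [] if the
    directory does not exist (no cpuidle driver, or not Linux).
    """
    base = Path(cpuidle_dir)
    if not base.is_dir():
        return []
    # Sort numerically so that state10 comes after state2.
    dirs = sorted(base.glob("state[0-9]*"),
                  key=lambda d: int(d.name[len("state"):]))
    states = []
    for d in dirs:
        name = (d / "name").read_text().strip()
        usage = int((d / "usage").read_text().strip())
        states.append((name, usage))
    return states


if __name__ == "__main__":
    for name, usage in list_cstates():
        print(f"{name}: entered {usage} times")
```

These counters describe CPU idle behaviour while the system is running (S0); they do not show whether the platform as a whole reached a suspend state such as S3.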

commandline-be commented 11 months ago

not sure what to think of this really, given I'm reporting the issue and have no fix myself

c-states set the actual state for the CPU power consumption, p-states set the actual performance profile (MHz), s-states also involve other components outside of the CPU going into a different energy mode.

best resource i could find https://metebalci.com/blog/a-minimum-complete-tutorial-of-cpu-power-management-c-states-and-p-states/

tlaurion commented 11 months ago

@commandline-be This is where I said that some of those things need to be fixed in firmware. I saw lots of things written on the coreboot and Dasharo front, whereas the coreboot fork used by Nitrokey (see modules/coreboot) points to an older coreboot version hash.

That will need more testing and validation. The first step is to have coreboot point to the new commit of the coreboot fork for NovaCustom Dasharo, then verify which patches are still needed on coreboot 4.21, review the coreboot config for ns50/nv41, and push a PR containing all those changes for willing testers to test. I'm not sure taking the boot config alone is possible when it comes to S3/Sx and newer sleep states. The platform needs to support S3 to be compatible with Qubes at the time of writing these lines; QubesOS gave an approximate timeline of maybe a month before fixes are ready on the Xen side.

Other than that, the OS tries to do its best with the latest kernel versions, and fixes are normally applied in systemd etc. But if the firmware doesn't expose things correctly, sleep/resume issues are normally not resolved anywhere other than in firmware, and here that means in coreboot.


Disclosure: I will not make a habit of being active under nitrokey/heads instead of linuxboot/heads. The reason I'm replying to these issues is that they concern me as well. From my perspective, these platforms are not completely usable by users. And in my opinion, those users deserve to know.

Hopefully those issues are being worked on in collaboration with the upstream source of a solution, but that is not in my power.

Heads assumes that the hardware initialization done by coreboot is correct, and depends on that correct initialization for the final OS to behave properly.

commandline-be commented 11 months ago

should that info be of use, my use case is regular Linux, not Qubes.

typically the recommendation with sleep/freeze issues is to use something similar to either of the below, sadly not without drawbacks.

https://unix.stackexchange.com/questions/419456/i915-intel-skylake-system-freeze-after-wake-up-from-hibernate-suspend-to-disk

https://hobo.house/2018/05/18/fix-for-intel-i915-gpu-freeze-on-recent-linux-kernels/

though I feel tempted, I don't think compiling heads/dasharo myself would be of much use here, which is also why I shared the archlinux patch mentioned, which reportedly is a fix for all suspend/freeze issues.

currently evaluating this page to learn more about configuration and possible erroneous events

https://wiki.archlinux.org/title/Power_management

commandline-be commented 10 months ago

confirming v2.4 resolved suspend issue