Dasharo / dasharo-issues

The Dasharo issue tracker
https://dasharo.com/

novacustom_ns5x/7x_adl: EC errors upon suspend, keyboard/buttons stop working, device effectively bricked until battery runs out #1007

Open jpnssl opened 3 months ago

jpnssl commented 3 months ago

Component

Dasharo firmware, EC firmware

Device

NovaCustom NS5x 12th Gen

Dasharo version

v1.7.2

Dasharo Tools Suite version

No response

Test case ID

No response

Brief summary

When suspending, the DUT stays on but the screen goes dark and the device no longer responds to any input, including pressing the power button. The kernel log reports that the EC is unresponsive.

How reproducible

Very rarely: only three times in total so far, but twice within the last week.

How to reproduce

  1. Suspend the device, e.g. via the keyboard hotkey (this is how I almost always initiate suspend, so I am not sure whether it is a factor).

Expected behavior

The device suspends and resumes as normal

Actual behavior

The screen goes dark, but the device keeps running and emits a noise similar to coil whine. No input is registered at all, not even from the power button, so there is no way to power off the device.

Screenshots

No response

Additional context

The DUT runs the latest firmware and EC versions (1.7.2), with the suspend mode set to s0ix and the Management Engine enabled. Charge thresholds are configured to start/stop charging at 50%/60%. I suspect this is an EC problem, because the kernel reports that its attempts to access the EC time out. So far, the affected kernel versions include 6.8 and 6.10 (I'm running Gentoo). I am fairly sure that when the bug is triggered, the DUT does not even initiate the actual suspend process: the kernel logs nothing indicating a suspend, and the device stays connected to Wi-Fi. However, the fans seem to turn off and the device gets noticeably warm.
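On Linux, the s0ix suspend mode is entered through suspend-to-idle (s2idle), which is also what the second log below reports (`PM: suspend entry (s2idle)`). The kernel lists the available variants in `/sys/power/mem_sleep` and marks the active one with square brackets. The helper below is a small sketch for confirming that setting; it is not part of any Dasharo tooling, and the function name is hypothetical.

```python
from pathlib import Path
from typing import Optional

MEM_SLEEP = Path("/sys/power/mem_sleep")


def selected_mem_sleep(text: str) -> Optional[str]:
    """Return the suspend variant the kernel marks with [brackets], if any."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__":
    if MEM_SLEEP.exists():
        content = MEM_SLEEP.read_text()
        print(f"available: {content.strip()}")
        print(f"selected:  {selected_mem_sleep(content)}")
    else:
        print("/sys/power/mem_sleep not found (not Linux, or sysfs not mounted)")
```

On firmware that only offers s0ix, the file would typically list just `[s2idle]`.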

Relevant log snippet:

[   10.128417] wlan0: Limiting TX power to 20 (23 - 3) dBm as advertised by 74:42:7f:88:31:7b
[   11.259246] NET: Registered PF_PACKET protocol family
[   16.764609] Bluetooth: RFCOMM TTY layer initialized
[   16.764619] Bluetooth: RFCOMM socket layer initialized
[   16.764624] Bluetooth: RFCOMM ver 1.11
[   17.471228] rfkill: input handler disabled
[  892.769954] i915 0000:00:02.0: Using 39-bit DMA addresses
[ 2631.925518] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 2631.925524] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 2631.925533] No Local Variables are initialized for Method [UPBS]

[ 2631.925534] No Arguments are initialized for method [UPBS]

[ 2631.925537] ACPI Error: Aborting method \_SB.BAT0.UPBS due to previous error (AE_TIME) (20240322/psparse-529)
[ 2631.925542] ACPI Error: Aborting method \_SB.BAT0._BST due to previous error (AE_TIME) (20240322/psparse-529)
[ 2631.925548] ACPI: \_SB_.BAT0: _BST evaluation failed: AE_TIME
[ 2634.425539] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 2634.425545] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 2634.425552] No Local Variables are initialized for Method [UPBS]

[ 2634.425553] No Arguments are initialized for method [UPBS]

[ 2634.425555] ACPI Error: Aborting method \_SB.BAT0.UPBS due to previous error (AE_TIME) (20240322/psparse-529)
[ 2634.425560] ACPI Error: Aborting method \_SB.BAT0._BST due to previous error (AE_TIME) (20240322/psparse-529)
[ 2634.425565] ACPI: \_SB_.BAT0: _BST evaluation failed: AE_TIME
[ 2636.925561] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 2636.925565] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 2636.925572] No Local Variables are initialized for Method [UPBS]

[ 2636.925573] No Arguments are initialized for method [UPBS]

[ 2636.925575] ACPI Error: Aborting method \_SB.BAT0.UPBS due to previous error (AE_TIME) (20240322/psparse-529)
[ 2636.925580] ACPI Error: Aborting method \_SB.BAT0._BST due to previous error (AE_TIME) (20240322/psparse-529)
[ 2636.925584] ACPI: \_SB_.BAT0: _BST evaluation failed: AE_TIME
...

These errors keep repeating roughly every 2.5 seconds.
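The repetition interval can be read directly off the dmesg timestamps: in the snippet above, consecutive `Timeout from EC hardware` lines are about 2.5 s apart. The function below is a minimal sketch for checking this on a longer capture. It assumes the default `[seconds.microseconds]` timestamp prefix and matches only the exact message text shown in the log; the function name is hypothetical.

```python
import re
from typing import List

# dmesg prefixes each line with "[seconds.microseconds]" since boot.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")
EC_TIMEOUT = "Timeout from EC hardware or EC device driver"


def ec_timeout_intervals(log: str) -> List[float]:
    """Return the gaps in seconds between consecutive EC timeout messages."""
    stamps = []
    for line in log.splitlines():
        if EC_TIMEOUT in line:
            m = TS_RE.match(line)
            if m:
                stamps.append(float(m.group(1)))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    sample = """\
[ 2631.925524] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)
[ 2634.425545] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)
[ 2636.925565] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)
"""
    print([round(gap, 3) for gap in ec_timeout_intervals(sample)])
```

Run on the three sample lines, it prints `[2.5, 2.5]`. For a real capture, pass the saved `dmesg` output instead of the sample string.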

During the latest occurrence, the DUT's screen happened to wake up on its own, possibly due to a notification from a terminal emulator about a closed SSH session. The touchpad worked, but the internal keyboard and all other buttons remained unresponsive. I was able to unlock the device using a USB keyboard. When I attempted to suspend again, the kernel logged the following errors:

[ 3872.935442] PM: suspend entry (s2idle)
[ 3872.942646] Filesystems sync: 0.007 seconds
[ 3872.942733] Loading firmware: iwlwifi-cc-a0-77.ucode
[ 3872.942739] Loading firmware: rtl_nic/rtl8168h-2.fw
[ 3872.942792] Loading firmware: regulatory.db
[ 3872.942818] Loading firmware: regulatory.db.p7s
[ 3872.942844] Loading firmware: i915/adlp_dmc.bin
[ 3872.942867] Loading firmware: i915/adlp_guc_70.bin
[ 3872.942875] Loading firmware: i915/tgl_huc.bin
[ 3872.942876] Loading firmware: intel/ibt-20-1-3.sfi
[ 3872.942877] Loading firmware: intel/ibt-20-1-3.ddc
[ 3873.043549] Freezing user space processes
[ 3874.832781] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 3874.832786] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 3874.832796] No Local Variables are initialized for Method [UPBS]

[ 3874.832797] No Arguments are initialized for method [UPBS]

[ 3874.832799] ACPI Error: Aborting method \_SB.BAT0.UPBS due to previous error (AE_TIME) (20240322/psparse-529)
[ 3874.832804] ACPI Error: Aborting method \_SB.BAT0._BST due to previous error (AE_TIME) (20240322/psparse-529)
[ 3874.832810] ACPI: \_SB_.BAT0: _BST evaluation failed: AE_TIME
[ 3874.838016] Freezing user space processes completed (elapsed 1.794 seconds)
[ 3874.838019] OOM killer disabled.
[ 3874.838020] Freezing remaining freezable tasks
[ 3874.838921] Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
[ 3874.838922] printk: Suspending console(s) (use no_console_suspend to debug)
[ 3875.526109] psmouse serio1: Failed to disable mouse on isa0060/serio1
[ 3880.210812] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 3880.210816] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 3880.210821] No Local Variables are initialized for Method [_PSW]

[ 3880.210821] Initialized Arguments for Method [_PSW]:  (1 arguments defined for method invocation)
[ 3880.210822]   Arg0:   0000000046bfa996 <Obj>           Integer 0000000000000001

[ 3880.210826] ACPI Error: Aborting method \_SB.LID0._PSW due to previous error (AE_TIME) (20240322/psparse-529)
[ 3880.210829] ACPI: \_SB_.LID0: _PSW execution failed
[ 3880.216895] ACPI: EC: interrupt blocked
[ 3885.282876] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 3885.282880] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 3885.282888] No Local Variables are initialized for Method [EDSX]

[ 3885.282889] Initialized Arguments for Method [EDSX]:  (1 arguments defined for method invocation)
[ 3885.282889]   Arg0:   00000000b6411328 <Obj>           Integer 0000000000000000

[ 3885.282895] ACPI Error: Aborting method \_SB.PCI0.LPCB.EC0.EDSX due to previous error (AE_TIME) (20240322/psparse-529)
[ 3885.282899] ACPI Error: Aborting method \_SB.PCI0.PEPD._DSM due to previous error (AE_TIME) (20240322/psparse-529)
[ 3885.282905] ACPI: \_SB_.PCI0.PEPD: failed to evaluate _DSM a040ebc4-d26c-e211-bcfd-0800200c9a66 (0x11)
[ 3890.282904] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 3890.282905] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 3890.282907] No Local Variables are initialized for Method [S0IX]

[ 3890.282908] Initialized Arguments for Method [S0IX]:  (1 arguments defined for method invocation)
[ 3890.282908]   Arg0:   00000000afcb8a0d <Obj>           Integer 0000000000000001

[ 3890.282910] ACPI Error: Aborting method \_SB.PCI0.LPCB.EC0.S0IX due to previous error (AE_TIME) (20240322/psparse-529)
[ 3890.282913] ACPI Error: Aborting method \_SB.PCI0.PEPD._DSM due to previous error (AE_TIME) (20240322/psparse-529)
[ 3890.282916] ACPI: \_SB_.PCI0.PEPD: failed to evaluate _DSM a040ebc4-d26c-e211-bcfd-0800200c9a66 (0x11)
[ 3895.282937] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 3895.282939] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 3895.282941] No Local Variables are initialized for Method [S0IX]

[ 3895.282941] Initialized Arguments for Method [S0IX]:  (1 arguments defined for method invocation)
[ 3895.282942]   Arg0:   00000000010d1e78 <Obj>           Integer 0000000000000000

[ 3895.282943] ACPI Error: Aborting method \_SB.PCI0.LPCB.EC0.S0IX due to previous error (AE_TIME) (20240322/psparse-529)
[ 3895.282946] ACPI Error: Aborting method \_SB.PCI0.PEPD._DSM due to previous error (AE_TIME) (20240322/psparse-529)
[ 3895.282949] ACPI: \_SB_.PCI0.PEPD: failed to evaluate _DSM a040ebc4-d26c-e211-bcfd-0800200c9a66 (0x11)
[ 3900.282962] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 3900.282963] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 3900.282965] No Local Variables are initialized for Method [EDSX]

[ 3900.282966] Initialized Arguments for Method [EDSX]:  (1 arguments defined for method invocation)
[ 3900.282966]   Arg0:   000000004216fd5e <Obj>           Integer 0000000000000001

[ 3900.282968] ACPI Error: Aborting method \_SB.PCI0.LPCB.EC0.EDSX due to previous error (AE_TIME) (20240322/psparse-529)
[ 3900.282970] ACPI Error: Aborting method \_SB.PCI0.PEPD._DSM due to previous error (AE_TIME) (20240322/psparse-529)
[ 3900.282973] ACPI: \_SB_.PCI0.PEPD: failed to evaluate _DSM a040ebc4-d26c-e211-bcfd-0800200c9a66 (0x11)
[ 3900.283128] ACPI: EC: interrupt unblocked
[ 3900.391287] intel_pmc_core INT33A1:00: CPU did not enter Package C10!!! (Package C10 cnt=0x9dd47e7d7d)
[ 3900.391305] intel_pmc_core INT33A1:00: Prev Package C2 cnt = 0xe01e36d70c, Current Package C2 cnt = 0xe029271e56
[ 3900.391314] intel_pmc_core INT33A1:00: Prev Package C3 cnt = 0x9eb59ef99, Current Package C3 cnt = 0x9ebe4ebdf
[ 3900.391320] intel_pmc_core INT33A1:00: Prev Package C6 cnt = 0x3a054cc2b6, Current Package C6 cnt = 0x3a062e4605
[ 3900.391325] intel_pmc_core INT33A1:00: Prev Package C7 cnt = 0x0, Current Package C7 cnt = 0x0
[ 3900.391330] intel_pmc_core INT33A1:00: Prev Package C8 cnt = 0x1a66461523b, Current Package C8 cnt = 0x1a66461523b
[ 3900.391336] intel_pmc_core INT33A1:00: Prev Package C9 cnt = 0x0, Current Package C9 cnt = 0x0
[ 3900.391341] intel_pmc_core INT33A1:00: Prev Package C10 cnt = 0x9dd47e7d7d, Current Package C10 cnt = 0x9dd47e7d7d
[ 3900.406581] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.29.2
[ 3900.406586] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
[ 3900.408872] nvme nvme0: D3 entry latency set to 8 seconds
[ 3900.420831] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
[ 3900.421201] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[ 3900.421203] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
[ 3900.421653] i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
[ 3900.473530] nvme nvme0: 12/0/0 default/read/poll queues
[ 3902.896941] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 3902.896944] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 3902.896950] No Local Variables are initialized for Method [_STA]

[ 3902.896951] No Arguments are initialized for method [_STA]

[ 3902.896952] ACPI Error: Aborting method \_SB.BAT0._STA due to previous error (AE_TIME) (20240322/psparse-529)
[ 3902.896957] ACPI: \_SB_.BAT0: _STA evaluation failed
[ 3905.396960] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 3905.396962] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 3905.396966] No Local Variables are initialized for Method [_LID]

[ 3905.396967] No Arguments are initialized for method [_LID]

[ 3905.396968] ACPI Error: Aborting method \_SB.LID0._LID due to previous error (AE_TIME) (20240322/psparse-529)
[ 3907.896972] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 3907.896974] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 3907.896977] No Local Variables are initialized for Method [_LID]

[ 3907.896978] No Arguments are initialized for method [_LID]

[ 3907.896979] ACPI Error: Aborting method \_SB.LID0._LID due to previous error (AE_TIME) (20240322/psparse-529)
[ 3907.901712] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops __SCT__tp_func_intel_frontbuffer_flush [i915])
[ 3907.902644] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:00:02.0 (ops __SCT__tp_func_intel_frontbuffer_flush [i915])
[ 3908.567187] atkbd serio0: Failed to deactivate keyboard on isa0060/serio0
[ 3909.906022] OOM killer enabled.
[ 3909.906023] Restarting tasks ... 
[ 3909.906181] usb 3-1: USB disconnect, device number 4
[ 3909.906630] done.
[ 3909.906637] random: crng reseeded on system resumption
[ 3909.906641] video LNXVIDEO:00: Restoring backlight state
[ 3910.489151] atkbd serio0: Failed to enable keyboard on isa0060/serio0
[ 3912.407010] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 3912.407015] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 3912.407023] No Local Variables are initialized for Method [UPBS]

[ 3912.407024] No Arguments are initialized for method [UPBS]

[ 3912.407026] ACPI Error: Aborting method \_SB.BAT0.UPBS due to previous error (AE_TIME) (20240322/psparse-529)
[ 3912.407030] ACPI Error: Aborting method \_SB.BAT0._BST due to previous error (AE_TIME) (20240322/psparse-529)
[ 3912.407037] ACPI: \_SB_.BAT0: _BST evaluation failed: AE_TIME
[ 3913.416897] i8042: Can't write CTR while closing AUX port
[ 3913.995163] i8042: Can't reactivate AUX port
[ 3914.909013] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 3914.909018] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 3914.909026] No Local Variables are initialized for Method [UPBS]

[ 3914.909027] No Arguments are initialized for method [UPBS]

[ 3914.909029] ACPI Error: Aborting method \_SB.BAT0.UPBS due to previous error (AE_TIME) (20240322/psparse-529)
[ 3914.909033] ACPI Error: Aborting method \_SB.BAT0._BST due to previous error (AE_TIME) (20240322/psparse-529)
[ 3914.909039] ACPI: \_SB_.BAT0: _BST evaluation failed: AE_TIME
[ 3915.863853] i8042: Can't write CTR while closing AUX port
[ 3916.443551] i8042: Can't reactivate AUX port
[ 3917.409050] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 3917.409055] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 3917.409063] No Local Variables are initialized for Method [UPBX]

[ 3917.409065] No Arguments are initialized for method [UPBX]

[ 3917.409067] ACPI Error: Aborting method \_SB.BAT0.UPBX due to previous error (AE_TIME) (20240322/psparse-529)
[ 3917.409072] ACPI Error: Aborting method \_SB.BAT0._BIX due to previous error (AE_TIME) (20240322/psparse-529)
[ 3917.409079] ACPI: \_SB_.BAT0: _BIX evaluation failed: AE_TIME
[ 3917.601174] i8042: Can't write CTR while closing AUX port
[ 3918.177892] i8042: Can't reactivate AUX port
[ 3919.909043] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 3919.909047] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 3919.909053] No Local Variables are initialized for Method [UPBI]

[ 3919.909055] No Arguments are initialized for method [UPBI]

[ 3919.909056] ACPI Error: Aborting method \_SB.BAT0.UPBI due to previous error (AE_TIME) (20240322/psparse-529)
[ 3919.909061] ACPI Error: Aborting method \_SB.BAT0._BIF due to previous error (AE_TIME) (20240322/psparse-529)
[ 3919.909065] ACPI: \_SB_.BAT0: _BIF evaluation failed: AE_TIME
[ 3922.409060] ACPI Error: AE_TIME, Returned by Handler for [EmbeddedControl] (20240322/evregion-296)
[ 3922.409065] ACPI Error: Timeout from EC hardware or EC device driver (20240322/evregion-306)

[ 3922.409069] No Local Variables are initialized for Method [UPBS]

[ 3922.409070] No Arguments are initialized for method [UPBS]

[ 3922.409071] ACPI Error: Aborting method \_SB.BAT0.UPBS due to previous error (AE_TIME) (20240322/psparse-529)
[ 3922.409075] ACPI Error: Aborting method \_SB.BAT0._BST due to previous error (AE_TIME) (20240322/psparse-529)
[ 3922.409079] ACPI: \_SB_.BAT0: _BST evaluation failed: AE_TIME
...

I have also attached a dump of the coreboot log, although I cannot spot anything out of the ordinary in it: coreboot.log (https://github.com/user-attachments/files/16647723/coreboot.log)
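The `intel_pmc_core` lines in the second log compare package C-state counters from before and after the suspend attempt. Subtracting them shows which states were actually reached: C2, C3 and C6 advanced, while C8 and C10 did not move and C7 and C9 stayed at zero, which matches the `CPU did not enter Package C10!!!` warning. Below is a minimal sketch of that arithmetic. It assumes the exact `Prev Package ... Current Package ...` line format shown in the log; the function name is hypothetical, and the counter units are not stated in the log.

```python
import re
from typing import Dict

# Matches lines such as:
#   "... Prev Package C2 cnt = 0xe01e36d70c, Current Package C2 cnt = 0xe029271e56"
# The \1 backreference ensures both halves refer to the same C-state.
PKG_RE = re.compile(
    r"Prev Package (C\d+) cnt = (0x[0-9a-fA-F]+), "
    r"Current Package \1 cnt = (0x[0-9a-fA-F]+)"
)


def package_cstate_deltas(log: str) -> Dict[str, int]:
    """Return how far each package C-state counter advanced during the attempt."""
    deltas: Dict[str, int] = {}
    for line in log.splitlines():
        m = PKG_RE.search(line)
        if m:
            state = m.group(1)
            prev, cur = int(m.group(2), 16), int(m.group(3), 16)
            deltas[state] = cur - prev
    return deltas


if __name__ == "__main__":
    sample = """\
[ 3900.391305] intel_pmc_core INT33A1:00: Prev Package C2 cnt = 0xe01e36d70c, Current Package C2 cnt = 0xe029271e56
[ 3900.391330] intel_pmc_core INT33A1:00: Prev Package C8 cnt = 0x1a66461523b, Current Package C8 cnt = 0x1a66461523b
[ 3900.391341] intel_pmc_core INT33A1:00: Prev Package C10 cnt = 0x9dd47e7d7d, Current Package C10 cnt = 0x9dd47e7d7d
"""
    for state, delta in package_cstate_deltas(sample).items():
        print(f"{state}: +{delta}")
```

A zero delta for the deepest states means the platform never reached its lowest idle state during that attempt.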

Solutions you've tried

wessel-novacustom commented 3 months ago

Some application or event must have caused an EC crash. I experienced this once myself on my personal laptop after tweaking TLP for better battery life.

Unfortunately, v1.7.2 for your laptop does not yet include an EC fallback power-off. It will be added in future updates, so that you can always turn off the laptop by holding the power button for 10 seconds.

For now, the only way to reset the laptop in this situation is to disconnect the battery connector from the mainboard and then reconnect it.