Open marmarek opened 4 months ago
Reloading the `e1000e` module in sys-net does not help.
Ugh...
drivers/net/ethernet/intel/e1000e/ich8lan.c:
```c
	/* It is not possible to be certain of the current state of ULP
	 * so forcibly disable it.
	 */
	hw->dev_spec.ich8lan.ulp_state = e1000_ulp_state_unknown;
	ret_val = e1000_disable_ulp_lpt_lp(hw, true);
	if (ret_val)
		e_warn("Failed to disable ULP\n");
...
/**
 * e1000_disable_ulp_lpt_lp - unconfigure Ultra Low Power mode for LynxPoint-LP
 * @hw: pointer to the HW structure
 * @force: boolean indicating whether or not to force disabling ULP
 *
 * Un-configure ULP mode when link is up, the system is transitioned from
 * Sx or the driver is unloaded. If on a Manageability Engine (ME) enabled
 * system, poll for an indication from ME that ULP has been un-configured.
 * If not on an ME enabled system, un-configure the ULP mode by software.
 *
 * During nominal operation, this function is called when link is acquired
 * to disable ULP mode (force=false); otherwise, for example when unloading
 * the driver or during Sx->S0 transitions, this is called with force=true
 * to forcibly disable ULP.
 */
static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw *hw, bool force)
{
...
	if (force) {
		/* Request ME un-configure ULP mode in the PHY */
		mac_reg = er32(H2ME);
		mac_reg &= ~E1000_H2ME_ULP;
		mac_reg |= E1000_H2ME_ENFORCE_SETTINGS;
		ew32(H2ME, mac_reg);
	}
```
But `ew32(H2ME, ...)` actually writes to a register of the LAN device itself (here, in BAR0), not to a separate device:
```c
#define E1000_H2ME			0x05B50		/* Host to ME */
#define E1000_H2ME_START_DPG		0x00000001	/* indicate the ME of DPG */
#define E1000_H2ME_EXIT_DPG		0x00000002	/* indicate the ME exit DPG */
#define E1000_H2ME_ULP			0x00000800	/* ULP Indication Bit */
#define E1000_H2ME_ENFORCE_SETTINGS	0x00001000	/* Enforce Settings */
```
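For illustration only, the read-modify-write in the `force` branch quoted above reduces to the bit arithmetic below. This is a minimal sketch, not driver code: the two masks are copied from the defines quoted here, `h2me_force_disable_ulp` is a hypothetical helper name, `uint32_t` stands in for the kernel's `u32`, and no hardware is touched.

```c
#include <stdint.h>

/* Masks copied from the e1000e defines quoted in this comment. */
#define E1000_H2ME_ULP              0x00000800u /* ULP Indication Bit */
#define E1000_H2ME_ENFORCE_SETTINGS 0x00001000u /* Enforce Settings */

/*
 * Hypothetical helper (not part of the driver): given the value read
 * from H2ME, return the value the force path would write back.
 * It clears the ULP indication bit and sets the enforce-settings bit,
 * leaving all other bits untouched.
 */
static uint32_t h2me_force_disable_ulp(uint32_t mac_reg)
{
	mac_reg &= ~E1000_H2ME_ULP;
	mac_reg |= E1000_H2ME_ENFORCE_SETTINGS;
	return mac_reg;
}
```

Whether the ME ever sees that write presumably depends on the MMIO access to BAR0 actually reaching the device, which is the open question in this comment.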
It's not clear to me how they communicate, but maybe assigning the device to the VM breaks this communication?
Or maybe it's a more generic problem. When it happens I see a mismatch in memory decoding (see `Mem+` or `Mem-` in `Control`, and also `[disabled]` next to `Region 0`):
sys-net: `lspci -vvs 7.0`
```
00:07.0 Ethernet controller: Intel Corporation Device 550a (rev 20)
	Subsystem: CLEVO/KAPOK Computer Device a743
	Physical Slot: 7
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin D routed to IRQ 47
	Region 0: Memory at f2000000 (32-bit, non-prefetchable) [size=128K]
	Capabilities: [c8] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000 Data: 0000
	Kernel modules: e1000e
```
dom0: `lspci -vvs 1f.6`
```
00:1f.6 Ethernet controller: Intel Corporation Device 550a (rev 20)
	DeviceName: Ethernet controller
	Subsystem: CLEVO/KAPOK Computer Device a743
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin D routed to IRQ 21
	Region 0: Memory at b54a0000 (32-bit, non-prefetchable) [disabled] [size=128K]
	Capabilities: [c8] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 00000000fee01458 Data: 0000
	Capabilities: [e0] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: pciback
	Kernel modules: e1000e
```
> Or maybe it's a more generic problem. When it happens I see a mismatch in memory decoding (see `Mem+` or `Mem-` in `Control`, and also `[disabled]` next to `Region 0`):
That's it: re-enabling memory decoding in dom0 makes the device work again. It's worth checking whether https://github.com/QubesOS/qubes-issues/issues/6411#issuecomment-1970270582 is the same problem. FYI @HW42
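What `lspci` prints as `Mem+` / `Mem-` is bit 1 (Memory Space Enable) of the 16-bit PCI command register at config-space offset 0x04. The mismatch between the two views could be detected along these lines; this is a rough sketch, assuming the first bytes of each side's config space have already been read into a buffer (for instance from the device's sysfs `config` file). The helper names are hypothetical, and the constants mirror the values in Linux's `pci_regs.h`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI config-space layout; values match Linux's pci_regs.h. */
#define PCI_COMMAND         0x04   /* offset of the 16-bit command register */
#define PCI_COMMAND_MEMORY  0x0002 /* Memory Space Enable ("Mem+" in lspci) */

/* Read the little-endian command register from a raw config-space dump.
 * The buffer must hold at least 6 bytes. */
static uint16_t pci_read_command(const uint8_t *cfg)
{
	return (uint16_t)(cfg[PCI_COMMAND] | (cfg[PCI_COMMAND + 1] << 8));
}

/* True when the device is set to respond to memory-space accesses. */
static bool pci_mem_decoding_enabled(const uint8_t *cfg)
{
	return (pci_read_command(cfg) & PCI_COMMAND_MEMORY) != 0;
}

/* True when the guest view and the host view disagree, as in the
 * sys-net (Mem+) versus dom0 (Mem-) output in this thread. */
static bool pci_mem_decoding_mismatch(const uint8_t *guest_cfg,
				      const uint8_t *host_cfg)
{
	return pci_mem_decoding_enabled(guest_cfg) !=
	       pci_mem_decoding_enabled(host_cfg);
}

/* Command value with memory decoding turned on, other bits preserved. */
static uint16_t pci_command_enable_memory(uint16_t command)
{
	return (uint16_t)(command | PCI_COMMAND_MEMORY);
}
```

Writing the resulting command value back to the physical device would have to happen in dom0, for example with `setpci` from pciutils; the sketch only covers the check and the bit arithmetic.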
Is S3 working fine? If so, is any post-installation step needed?
S3 works fine and should be active by default on the V5xx series; no manual steps are required. I'm keeping this issue open because I would like to get S0ix working too at some point, but that shouldn't affect users.
> S3 works fine and should be active by default on the V5xx series; no manual steps are required. I'm keeping this issue open because I would like to get S0ix working too at some point, but that shouldn't affect users.
I'm positively surprised about that. Great!
Qubes OS release
R4.2 / R4.3
Brief summary
Using S0ix results in a broken system after resume.
Steps to reproduce
Expected behavior
System correctly suspends, and is fully functional after resume
Actual behavior
Power LED blinks, but according to `/sys/kernel/debug/pmc_core/substate_residencies` it didn't actually suspend. `/sys/kernel/debug/pmc_core/substate_requirements` also has an empty "status" column next to all requirements.
After resume, wired network is broken. sys-net logs have:
sys-net logs
```
[2024-07-22 12:45:30] [ 237.842643] e1000e 0000:00:07.0 ens7: NIC Link is Down
[2024-07-22 12:45:30] [ 237.863042] Freezing user space processes
[2024-07-22 12:45:30] [ 237.864602] Freezing user space processes completed (elapsed 0.001 seconds)
[2024-07-22 12:45:30] [ 237.864626] OOM killer disabled.
[2024-07-22 12:45:30] [ 237.864637] Freezing remaining freezable tasks
[2024-07-22 12:45:30] [ 237.865584] Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
[2024-07-22 12:45:30] [ 237.865607] xen:manage: Using suspend/resume for sleep/wakeup
[2024-07-22 12:45:30] [ 237.868291] e1000e: EEE TX LPI TIMER: 00000011
[2024-07-22 12:46:34] [ 237.935960] xen:grant_table: Grant tables using version 1 layout
[2024-07-22 12:46:34] [ 237.983971] iwlwifi 0000:00:06.0: WRT: Invalid buffer destination
[2024-07-22 12:46:34] [ 238.141605] iwlwifi 0000:00:06.0: Not valid error log pointer 0x0024B5C0 for RT uCode
[2024-07-22 12:46:34] [ 238.141784] iwlwifi 0000:00:06.0: WFPM_UMAC_PD_NOTIFICATION: 0x1f
[2024-07-22 12:46:34] [ 238.141818] iwlwifi 0000:00:06.0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
[2024-07-22 12:46:34] [ 238.141849] iwlwifi 0000:00:06.0: WFPM_AUTH_KEY_0: 0x80
[2024-07-22 12:46:34] [ 238.141874] iwlwifi 0000:00:06.0: CNVI_SCU_SEQ_DATA_DW9: 0x0
[2024-07-22 12:46:34] [ 238.142487] iwlwifi 0000:00:06.0: RFIm is deactivated, reason = 4
[2024-07-22 12:46:37] [ 240.729199] e1000e 0000:00:07.0 ens7: Failed to disable ULP
[2024-07-22 12:48:46] [ 369.728111] e1000e 0000:00:07.0 ens7: Hardware Error
[2024-07-22 12:48:46] [ 369.728146] e1000e 0000:00:07.0 ens7: Timesync Tx Control register not set as expected
[2024-07-22 12:48:46] [ 369.829179] e1000e 0000:00:07.0: EEE advertisement - unable to acquire PHY
[2024-07-22 12:48:46] [ 369.832451] OOM killer enabled.
[2024-07-22 12:48:46] [ 369.832458] Restarting tasks ... done.
```
After resume, sys-net was semi-frozen for some time (over a minute), and the `qubes.SuspendPost` service failed (due to a vchan timeout). `qvm-run --nogui` appears to work, but I'm not 100% sure whether that's only because I tried it later.
Wireless appears to be functional (at least listing available networks works).
sys-usb appears to be functional.