TamtamHero / fw-fanctrl

A simple systemd service to better control Framework Laptop's fan(s)
BSD 3-Clause "New" or "Revised" License
178 stars 33 forks source link

Occasionally pins CPU to 100% #17

Closed gidoBOSSftw5731 closed 2 months ago

gidoBOSSftw5731 commented 11 months ago

Framework 13th gen, seems like sometimes after it suspends/hibernates or runs for a few days something goes haywire and it goes up to 100% cpu usage (as reported by top, so only one thread.) Restarting the service solves this. Next time it happens I'll try to collect some logs.

gidoBOSSftw5731 commented 10 months ago

Relevant journalctl except:

Sep 30 16:24:08 gframe systemd[1]: Started fw-fanctrl.service - FrameWork Fan Controller.
░░ Subject: A start job for unit fw-fanctrl.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ A start job for unit fw-fanctrl.service has finished successfully.
░░ 
░░ The job identifier is 147846.
Sep 30 16:25:53 gframe python3[1147900]: EC returned error result code 7
Sep 30 16:46:55 gframe python3[1154504]: EC response reserved != 0
Sep 30 17:02:09 gframe python3[1162689]: Missing Chromium EC memory map.
Sep 30 17:02:09 gframe python3[1162689]: Cannot find I2C adapter
Sep 30 17:02:09 gframe python3[1162689]: Unable to establish host communication
Sep 30 17:02:09 gframe python3[1162689]: Couldn't find EC
Sep 30 17:23:09 gframe python3[1173005]: EC returned error result code 7
Sep 30 18:14:11 gframe python3[1193186]: Missing Chromium EC memory map.
Sep 30 18:14:11 gframe python3[1193186]: Cannot find I2C adapter
Sep 30 18:14:11 gframe python3[1193186]: Unable to establish host communication
Sep 30 18:14:11 gframe python3[1193186]: Couldn't find EC
Sep 30 19:41:33 gframe python3[1224615]: EC response reserved != 0
Sep 30 19:41:33 gframe python3[1224615]: EC returned error result code 7
Sep 30 19:45:59 gframe python3[1225688]: EC response version mismatch
Sep 30 19:45:59 gframe python3[1225688]: EC returned error result code 12
Sep 30 19:45:59 gframe python3[1225688]: EC returned error result code 1
Oct 01 16:58:59 gframe python3[1385517]: EC response reserved != 0
Oct 01 17:40:48 gframe python3[1409681]: EC response reserved != 0
Oct 01 18:12:20 gframe python3[1417193]: EC returned error result code 7
Oct 01 18:12:20 gframe python3[1417193]: EC returned error result code 1
Oct 01 18:18:08 gframe python3[1418430]: EC response reserved != 0
Oct 01 18:18:08 gframe python3[1418430]: EC returned error result code 7
Oct 01 18:23:10 gframe python3[1419717]: EC response reserved != 0
Oct 01 18:23:10 gframe python3[1419717]: EC returned error result code 7
Oct 01 19:51:12 gframe python3[1434358]: EC returned error result code 12
Oct 01 20:34:12 gframe python3[1441772]: EC returned error result code 7
Oct 01 21:06:12 gframe python3[1447207]: EC response reserved != 0
Oct 01 22:14:00 gframe python3[1462609]: EC returned error result code 7
Oct 01 22:14:00 gframe python3[1462609]: EC returned error result code 12
Oct 01 23:25:24 gframe python3[1474541]: EC response has invalid checksum
Oct 02 00:30:50 gframe python3[1621628]: EC response has invalid checksum
Oct 02 00:30:50 gframe python3[1621628]: EC returned error result code 7
Oct 02 02:50:58 gframe python3[2534277]: EC response reserved != 0
Oct 02 05:38:14 gframe python3[2576837]: EC response reserved != 0
Oct 02 05:57:24 gframe python3[2579945]: EC response has invalid checksum
Oct 02 06:49:24 gframe python3[2589184]: EC response version mismatch
Oct 02 07:00:28 gframe python3[2591025]: EC response reserved != 0
Oct 02 07:17:50 gframe python3[2593827]: EC response has invalid checksum
Oct 02 07:47:48 gframe python3[2598590]: Missing Chromium EC memory map.
Oct 02 07:47:48 gframe python3[2598590]: Cannot find I2C adapter
Oct 02 07:47:48 gframe python3[2598590]: Unable to establish host communication
Oct 02 07:47:48 gframe python3[2598590]: Couldn't find EC

dmesg output starting at the suspension:

[312001.582152] PM: suspend entry (deep)
[312001.592483] Filesystems sync: 0.010 seconds
[312001.592670] (NULL device *): firmware: direct-loading firmware iwlwifi-ty-a0-gf-a0.pnvm
[312001.592806] (NULL device *): firmware: direct-loading firmware regulatory.db
[312001.592827] (NULL device *): firmware: direct-loading firmware regulatory.db.p7s
[312001.592965] (NULL device *): firmware: direct-loading firmware intel/ibt-0041-0041.sfi
[312001.593157] (NULL device *): firmware: direct-loading firmware iwlwifi-ty-a0-gf-a0-74.ucode
[312001.594923] (NULL device *): firmware: direct-loading firmware intel/ibt-0041-0041.ddc
[312001.594946] (NULL device *): firmware: direct-loading firmware i915/adlp_dmc.bin
[312001.595209] (NULL device *): firmware: direct-loading firmware i915/adlp_guc_70.bin
[312001.596124] (NULL device *): firmware: direct-loading firmware i915/tgl_huc.bin
[312001.599300] Freezing user space processes
[312001.601389] Freezing user space processes completed (elapsed 0.002 seconds)
[312001.601392] OOM killer disabled.
[312001.601393] Freezing remaining freezable tasks
[312001.602719] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[312001.602759] printk: Suspending console(s) (use no_console_suspend to debug)
[312001.985107] ACPI: EC: interrupt blocked
[312002.013286] ACPI: PM: Preparing to enter system sleep state S3
[312002.027711] ACPI: EC: event blocked
[312002.027712] ACPI: EC: EC stopped
[312002.027712] ACPI: PM: Saving platform NVS memory
[312002.027966] Disabling non-boot CPUs ...
[312002.029237] smpboot: CPU 19 is now offline
[312002.061798] ACPI: PM: Low-level resume complete
[312002.061969] ACPI: EC: EC started
[312002.061971] ACPI: PM: Restoring platform NVS memory
[312002.064740] Enabling non-boot CPUs ...
[312002.064815] smpboot: Booting Node 0 Processor 19 APIC 0x3e
[312002.085876] core: cpu_atom PMU driver: PEBS-via-PT 
[312002.085889] ... version:                5
[312002.085892] ... bit width:              48
[312002.085894] ... generic registers:      6
[312002.085896] ... value mask:             0000ffffffffffff
[312002.085898] ... max period:             00007fffffffffff
[312002.085900] ... fixed-purpose events:   3
[312002.085901] ... event mask:             000000070000003f
[312002.086766] CPU19 is up
[312002.087649] ACPI: PM: Waking up from system sleep state S3
[312002.092042] ACPI: EC: interrupt unblocked
[312002.235453] ACPI: EC: event unblocked
[312002.243947] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.5.1
[312002.243951] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
[312002.254246] iwlwifi 0000:aa:00.0: WRT: Invalid buffer destination
[312002.257423] nvme nvme0: Shutdown timeout set to 10 seconds
[312002.258356] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
[312002.258887] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[312002.258888] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
[312002.259516] i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
[312002.263766] nvme nvme0: 20/0/0 default/read/poll queues
[312002.417759] iwlwifi 0000:aa:00.0: WFPM_UMAC_PD_NOTIFICATION: 0x1f
[312002.417779] iwlwifi 0000:aa:00.0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
[312002.417792] iwlwifi 0000:aa:00.0: WFPM_AUTH_KEY_0: 0x80
[312002.417803] iwlwifi 0000:aa:00.0: CNVI_SCU_SEQ_DATA_DW9: 0x0
[312002.499587] usb 1-9: reset full-speed USB device number 3 using xhci_hcd
[312005.197185] OOM killer enabled.
[312005.197186] Restarting tasks ... 
[312005.197203] pcieport 0000:00:1d.0: PCI bridge to [bus aa]
[312005.197238] pcieport 0000:00:1d.0:   bridge window [mem 0x7a200000-0x7a2fffff]
[312005.207472] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_ops [i915])
[312005.208828] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:00:02.0 (ops i915_pxp_tee_component_ops [i915])
[312005.213964] done.
[312005.213970] random: crng reseeded on system resumption
[312005.215338] usb 1-2: USB disconnect, device number 76
[312005.215342] usb 1-2.2: USB disconnect, device number 79
[312005.215403] usb 4-1: USB disconnect, device number 37
[312005.215405] usb 4-1.4: USB disconnect, device number 38
[312005.215430] cdc_ncm 4-1.4:2.0 eth0: unregister 'cdc_ncm' usb-0000:00:0d.0-1.4, CDC NCM (NO ZLP)
[312005.261320] PM: suspend exit
[312005.275451] smpboot: Booting Node 0 Processor 1 APIC 0x1
[312005.411897] usb 1-2.3: USB disconnect, device number 78
[312005.494873] smpboot: Booting Node 0 Processor 2 APIC 0x8
[312006.051414] usb 1-4: USB disconnect, device number 59
[312006.195305] usb 1-5: USB disconnect, device number 60
[312006.703678] smpboot: Booting Node 0 Processor 3 APIC 0x9
[312006.999304] smpboot: Booting Node 0 Processor 4 APIC 0x10
[312007.068159] iwlwifi 0000:aa:00.0: WRT: Invalid buffer destination
[312007.155824] smpboot: Booting Node 0 Processor 5 APIC 0x11
[312007.231141] iwlwifi 0000:aa:00.0: WFPM_UMAC_PD_NOTIFICATION: 0x1f
[312007.231188] iwlwifi 0000:aa:00.0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
[312007.231205] iwlwifi 0000:aa:00.0: WFPM_AUTH_KEY_0: 0x80
[312007.231218] iwlwifi 0000:aa:00.0: CNVI_SCU_SEQ_DATA_DW9: 0x0
[312007.411422] smpboot: Booting Node 0 Processor 6 APIC 0x18
[312007.639698] smpboot: Booting Node 0 Processor 7 APIC 0x19
[312007.819266] smpboot: Booting Node 0 Processor 8 APIC 0x20
[312008.019275] smpboot: Booting Node 0 Processor 9 APIC 0x21
[312008.191317] smpboot: Booting Node 0 Processor 10 APIC 0x28
[312008.379282] smpboot: Booting Node 0 Processor 11 APIC 0x29
[312008.515284] smpboot: Booting Node 0 Processor 12 APIC 0x30
[312008.827300] smpboot: Booting Node 0 Processor 13 APIC 0x32
[312009.055303] smpboot: Booting Node 0 Processor 14 APIC 0x34
[312009.215606] usb 1-9: reset full-speed USB device number 3 using xhci_hcd
[312009.283353] smpboot: Booting Node 0 Processor 15 APIC 0x36
[312009.523301] smpboot: Booting Node 0 Processor 16 APIC 0x38
[312009.811377] smpboot: Booting Node 0 Processor 17 APIC 0x3a
[312010.099419] smpboot: Booting Node 0 Processor 18 APIC 0x3c
[312010.171593] usb 1-9: reset full-speed USB device number 3 using xhci_hcd
[312010.703728] wlan0: authenticate with 34:fc:b9:06:ae:d2
[312010.714879] wlan0: send auth to 34:fc:b9:06:ae:d2 (try 1/3)
[312010.769746] wlan0: authenticated
[312010.771348] wlan0: associate with 34:fc:b9:06:ae:d2 (try 1/3)
[312010.773079] wlan0: RX AssocResp from 34:fc:b9:06:ae:d2 (capab=0x1411 status=0 aid=47)
[312010.807944] wlan0: associated
[312010.807999] wlan0: Limiting TX power to 30 (30 - 0) dBm as advertised by 34:fc:b9:06:ae:d2
[312026.698034] filter_write: 35 callbacks suppressed
[312026.698040] msr: Write to unrecognized MSR 0x38f by i7z (pid: 1216462).
[312026.698048] msr: See https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about for details.
[312060.200578] filter_write: 83 callbacks suppressed
[312060.200580] msr: Write to unrecognized MSR 0x38f by i7z (pid: 1216462).
[312060.200583] msr: See https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about for details.
[312093.978058] filter_write: 83 callbacks suppressed
[312093.978060] msr: Write to unrecognized MSR 0x38f by i7z (pid: 1216462).
[312093.978062] msr: See https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about for details.
[312127.779405] filter_write: 83 callbacks suppressed
[312127.779408] msr: Write to unrecognized MSR 0x38f by i7z (pid: 1216462).
[312127.779412] msr: See https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/about for details.

Also attached is the file output for perf record on the process. I'm not totally sure if this would be helpful but it might be a good datapoint. perf.data.tar.gz

leopoldhub commented 3 months ago

Hi @gidoBOSSftw5731 , We recently did a consequent refactoring of our codebase, is this issue still present in the latest version?

leopoldhub commented 2 months ago

Hi @gidoBOSSftw5731 , Since you did not reply, I assume that the buggy behaviour has been fixed with the latest version. I will, close this issue, if the bug is still there, feel free to reopen it. Have a nice day