Open doozan opened 6 years ago
I tried reinstalling Gallium with the following results:
If I boot into the LiveUSB (galliumos-skylake-bismuth-3.0alpha2-20180705T015135Z.iso), suspend either works perfectly or freezes the system. It worked perfectly only twice in roughly 20 test boots. When it works, I can suspend/resume multiple times without any problems. I could not find a pattern for what made it work.
If I install Gallium without encryption, the system always freezes when trying to suspend or reboot.
If I install Gallium with encryption, the system behaves as described in the original post (suspend works once, then every later attempt hangs).
I created a file /lib/systemd/system-sleep/debug.sh
#!/bin/sh
dmesg > /sleep-log.txt
to try to catch the debug messages on the failing suspend, but it only wrote a few lines from around the time of the suspend operation, and none of them seemed particularly interesting.
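One possible reason the log was so sparse: `dmesg > /sleep-log.txt` truncates the file every time the hook runs, and systemd-sleep runs hooks both before and after sleeping, passing `pre` or `post` as the first argument and the action (e.g. `suspend`) as the second. A minimal sketch of a hook that appends instead, labels each phase, and syncs before returning could look like the following. The function name `log_sleep_event`, the `SLEEP_LOG` override, and the 100-line tail are illustrative choices, not part of the original script.

```shell
#!/bin/sh
# Hypothetical variant of /lib/systemd/system-sleep/debug.sh.
# systemd-sleep calls each hook with $1 = pre|post and $2 = the sleep action.

log_sleep_event() {
    phase="$1"
    action="$2"
    # SLEEP_LOG is an illustrative override; the default is the original script's path.
    log="${SLEEP_LOG:-/sleep-log.txt}"
    {
        echo "=== $(date '+%F %T') $phase $action ==="
        # Append the most recent kernel messages instead of overwriting the log.
        dmesg 2>&1 | tail -n 100
    } >> "$log"
    # Flush to disk in case the machine hangs right after the hook returns.
    sync
}

# Only log when invoked with the two arguments systemd-sleep provides.
if [ "$#" -ge 2 ]; then
    log_sleep_event "$1" "$2"
fi
```

As with the original, the file would need to be executable for systemd to run it. If the machine freezes before the `post` call, the last `pre` section is what remains on disk.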
Surprisingly, when I looked at the timestamps near the successful suspend, I found the following:
[ 465.598532] Oops: 0000 [#1] PREEMPT SMP PTI
[ 465.598534] Modules linked in: rfcomm ccm cmac bnep lz4 lz4_compress zram nls_iso8859_1 snd_skl_nau88l25_max98357a snd_soc_hdac_hdmi snd_soc_dmic snd_soc_skl_ssp_clk joydev arc4 8250_dw iwlmvm intel_rapl x86_pkg_temp_thermal snd_soc_skl intel_powerclamp mac80211 coretemp snd_soc_skl_ipc snd_hda_ext_core kvm_intel snd_soc_sst_dsp snd_soc_sst_ipc snd_soc_acpi snd_soc_nau8825 snd_soc_max98357a kvm iwlwifi snd_hda_core irqbypass snd_pcm_oss snd_soc_core snd_mixer_oss uvcvideo snd_compress snd_pcm_dmaengine videobuf2_vmalloc videobuf2_memops btusb videobuf2_v4l2 btrtl snd_seq_midi videobuf2_common snd_pcm snd_seq_midi_event btbcm btintel cfg80211 videodev snd_rawmidi bluetooth snd_seq media ecdh_generic shpchp intel_lpss_pci intel_lpss snd_seq_device snd_timer processor_thermal_device intel_soc_dts_iosf
[ 465.598575] elan_i2c snd cros_ec_core mfd_core soundcore int3403_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs xor zstd_decompress zstd_compress xxhash raid6_pq algif_skcipher af_alg dm_crypt dm_mirror dm_region_hash dm_log mmc_block crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel i915 aes_x86_64 crypto_simd glue_helper cryptd i2c_algo_bit drm_kms_helper syscopyarea sdhci_pci sysfillrect sysimgblt fb_sys_fops video cqhci drm sdhci drm_panel_orientation_quirks
[ 465.598608] CPU: 1 PID: 2156 Comm: kworker/1:0 Not tainted 4.16.13-galliumos #1
[ 465.598609] Hardware name: Google Sentry/Sentry, BIOS MrChromebox 08/02/2018
[ 465.598614] Workqueue: events deferred_probe_work_func
[ 465.598619] RIP: 0010:skl_tplg_init+0x16f/0x250 [snd_soc_skl]
[ 465.598620] RSP: 0000:ffffb28a082ffc68 EFLAGS: 00010246
[ 465.598622] RAX: ffff94927896c3d8 RBX: ffffffffc0b81f60 RCX: 0000000000000000
[ 465.598623] RDX: 0000000100000000 RSI: ffff949278808018 RDI: ffff9492788080d8
[ 465.598624] RBP: ffff949275f8bd58 R08: 0000000000000001 R09: 0000000000000000
[ 465.598626] R10: ffff949272aa3858 R11: 0000000000000000 R12: ffff94927712a200
[ 465.598627] R13: 0000000000000001 R14: ffff9492788081d8 R15: ffff949274cd2ec0
[ 465.598629] FS: 0000000000000000(0000) GS:ffff94927ed00000(0000) knlGS:0000000000000000
[ 465.598630] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 465.598631] CR2: 000000010000025a CR3: 000000002500a001 CR4: 00000000002606e0
[ 465.598632] Call Trace:
[ 465.598641] skl_platform_soc_probe+0x50/0x230 [snd_soc_skl]
[ 465.598652] soc_probe_component+0x28a/0x3d0 [snd_soc_core]
[ 465.598661] snd_soc_register_card+0x55b/0xe90 [snd_soc_core]
[ 465.598670] ? devm_snd_soc_register_card+0x80/0x80 [snd_soc_core]
[ 465.598679] devm_snd_soc_register_card+0x3c/0x80 [snd_soc_core]
[ 465.598683] platform_drv_probe+0x38/0x90
[ 465.598686] driver_probe_device+0x30b/0x480
[ 465.598689] ? __driver_attach+0xe0/0xe0
[ 465.598691] bus_for_each_drv+0x59/0x90
[ 465.598694] __device_attach+0xb5/0x130
[ 465.598697] bus_probe_device+0x8a/0xa0
[ 465.598699] deferred_probe_work_func+0x42/0x150
[ 465.598704] process_one_work+0x1d4/0x3f0
[ 465.598707] worker_thread+0x2b/0x3d0
[ 465.598710] ? process_one_work+0x3f0/0x3f0
[ 465.598712] kthread+0x113/0x130
[ 465.598715] ? kthread_create_worker_on_cpu+0x50/0x50
[ 465.598718] ? SyS_exit_group+0x10/0x10
[ 465.598721] ret_from_fork+0x35/0x40
[ 465.598724] Code: 48 39 d7 48 8d 42 f8 74 6d 31 c9 45 31 c9 eb 14 80 fa 06 48 8b 50 08 41 0f 45 c8 48 39 d7 48 8d 42 f8 74 26 48 8b 10 48 8b 52 30 <0f> b6 92 5a 02 00 00 80 fa 05 75 d9 48 8b 50 08 41 b9 01 00 00
[ 465.598763] RIP: skl_tplg_init+0x16f/0x250 [snd_soc_skl] RSP: ffffb28a082ffc68
[ 465.598764] CR2: 000000010000025a
[ 465.598766] ---[ end trace b0732f12d2bcfce1 ]---
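In an oops like this one, the RIP line names the faulting function and, in square brackets, the module it belongs to. As a rough sketch, a small helper can pull those module names out of a saved dmesg dump so the suspect is visible at a glance. `oops_modules` is a hypothetical name, and the pattern assumes the RIP line format shown in this trace.

```shell
#!/bin/sh
# oops_modules FILE: print the unique module names that appear in
# square brackets after the function offset on "RIP:" lines.
oops_modules() {
    sed -n 's/.*RIP: [^[]*\[\([^]]*\)\].*/\1/p' "$1" | sort -u
}

# Example, using the log path written by the debug hook:
#   oops_modules /sleep-log.txt
```

Run against the trace above, this prints `snd_soc_skl`, the module that contains `skl_tplg_init`.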
So I edited /etc/default/grub
and changed
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash tpm_tis.interrupts=0 acpi_osi=Linux"
to
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash tpm_tis.interrupts=0 acpi_osi=Linux module_blacklist=snd_soc_max98357a"
and everything is working wonderfully now except audio, which didn't work to begin with.
This has been a lesson in the importance of checking dmesg before diving into everything else. I could have found this a lot faster if I had looked at dmesg after the first suspend instead of trying to debug the second suspend.
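For anyone repeating this, the same edit can be scripted. The sketch below appends a parameter to `GRUB_CMDLINE_LINUX_DEFAULT` only if it is not already there. `add_kernel_param` is a hypothetical helper, and it assumes GNU sed (for `-i`) and a double-quoted value on a single line, as in the file shown above.

```shell
#!/bin/sh
# add_kernel_param FILE PARAM: append PARAM to GRUB_CMDLINE_LINUX_DEFAULT
# in FILE unless that line already contains it.
add_kernel_param() {
    file="$1"
    param="$2"
    # Do nothing if the parameter is already on the command line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}

# Example (run as root; not executed here):
#   add_kernel_param /etc/default/grub module_blacklist=snd_soc_max98357a
#   update-grub
```

On Ubuntu-based systems, changes to /etc/default/grub normally take effect only after `update-grub` regenerates the boot configuration and the machine is rebooted. The post does not mention that step, so it is assumed here.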
TL;DR: it's the sound module
System: ThinkPad 13 Chromebook (SENTRY) with Skylake processor
Firmware: Full UEFI from MrChromebox, installed via his firmware-util.sh script
OS: Nightly, installed via ISO and updated via apt-get dist-upgrade
I can suspend/resume my machine exactly once. After resuming, the next attempt to suspend, reboot, or shut down causes the system to hang until it is powered off manually. After the manual shutdown, I can turn the machine back on and suspend once, and it then hangs again on the next attempt to suspend, reboot, or shut down. I can shut down or reboot multiple times in a row without any problems; it's only after a suspend/resume that the next attempt hangs.
Looking at syslog for the successful suspend and the frozen suspend shows that a successful suspend ends with
but the frozen suspend is missing the last line:
The last lines printed to the screen on a frozen reboot attempt are:
I've got an old unpartitioned SD card inserted; otherwise I get an "mmc1: Timeout waiting for hardware cmd interrupt." error every 10 seconds. I've tried other SD cards with the same results. If I don't have an SD card inserted, suspend/reboot/shutdown will start but never complete: some things stop, but the X session continues to run and respond to keyboard/touchpad input.
I'm not sure where to go next for troubleshooting this. My hunch is that it's something to do with the warmboot/resume handling in the firmware, but I don't know how to go about testing that.