GalliumOS / galliumos-distro

Docs, issues, and artwork sources for GalliumOS
https://galliumos.org/
GNU General Public License v2.0

Trackpad pointer freezes on Samsung Chromebook 3 (CELES) XE500C13, two different units #415

Open boutell opened 6 years ago

boutell commented 6 years ago

I have installed GalliumOS 2.1 with the beta full ROM firmware (via @MattDevo) and the 4.14.6 GalliumOS beta kernel on two separate Samsung Chromebook 3 systems (CELES, XE500C13).

I have experienced the same problem on both machines: everything is fine at first, but eventually the pointer will freeze.

With the first machine I assumed it was because I did a clumsy job removing the write protect screw from behind the motherboard and messed something up. However, on the second machine everything went very smoothly (antistatic strap, removed the battery as step one, etc) and it seems unlikely I'd mess it up the same way twice.

At one point I wrote a script to unload and reload the atmel_mxt_ts driver; I'd alt-tab over to a terminal window and run this on each freeze. It worked, but for varying periods of time, and would always freeze again.

I don't see anything interesting in /var/log/syslog when the pointer freezes. However, at boot I see this:

Jan 27 12:33:38 iloverocks kernel: [ 11.641900] atmel_mxt_ts i2c-ATML0000:00: Direct firmware load for maxtouch.cfg failed with error -2
Jan 27 12:33:38 iloverocks kernel: [ 11.642424] atmel_mxt_ts i2c-ATML0000:00: Resetting device

There appear to be official drivers here:

https://github.com/atmel-maxtouch/linux

I'm not sure if the latest version of this code is included in the kernel or not.

Any ideas?

Thanks!

boutell commented 6 years ago

The 4.14.14 kernel makes this issue much less frequent, but it still occurs. Seems linked to high load, but not necessarily.

begati commented 6 years ago

Same issue here on the 4.14.14 kernel, many times in a day of use...

jtecca commented 6 years ago

Confirming that I get the same issue on 4.14.14 and the full ROM firmware. I also notice stuttering video and audio while the trackpad pointer is frozen. Possibly related: keyboard input (characters displaying on screen) sometimes lags around the time that the trackpad pointer freezes.

I had been running the RW_LEGACY firmware for about a week and never had any of these issues.

For those needing a temporary fix, something like:

#!/usr/bin/env bash
# Unload the trackpad driver, then load it again to reset the device.
sudo modprobe -r atmel_mxt_ts
sudo modprobe atmel_mxt_ts

in an executable script will unload and re-load the trackpad driver which seems to temporarily fix the issue as @boutell noticed in his first message.

boutell commented 6 years ago

Yes, I noticed the stuttering yesterday.

begati commented 6 years ago

I don't know if it's a coincidence, but I created a 2 GB swap file on a microSD card and mysteriously the trackpad problem was fixed!
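
For anyone wanting to try the same thing, a swap file can be set up roughly as sketched below. The path and size are placeholders (the mount point of the microSD card is not given in this thread), and the `run` helper with its `DRY_RUN` switch is a hypothetical convenience so the commands can be previewed before anything is changed.

```shell
#!/usr/bin/env bash
# Sketch: create and enable a swap file. Path and size are assumptions.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

make_swapfile() {
  local path="$1" size_mb="$2"
  # dd writes real blocks; swap files must not contain holes.
  run sudo dd if=/dev/zero of="$path" bs=1M count="$size_mb" &&
  # Swap files should only be readable by root.
  run sudo chmod 600 "$path" &&
  run sudo mkswap "$path" &&
  run sudo swapon "$path"
}
```

Running `DRY_RUN=1 make_swapfile /path/to/swapfile 2048` prints the four commands without touching the system; leaving `DRY_RUN` unset runs them. The swap file will not be re-enabled after a reboot unless it is also listed in /etc/fstab.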

boutell commented 6 years ago

Interesting! Just a swap file, or a swap partition? I could do either, but a swap file is lazier (:

begati commented 6 years ago

Give the swap file a try; if it works, try creating a swap partition! I’ve tested both.

boutell commented 6 years ago

I do already have a swap partition, but I'll have to check how large. Yours is really large.

rusty122 commented 6 years ago

Not sure if this is related but I'm running RW_LEGACY with testing enabled and have recently started running into issues with the trackpad pointer freezing.

At boot, I get atmel_mxt_ts i2c-ATML0000:01: Direct firmware load for maxtouch.cfg failed with error -2 too.
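
For reference, the negative numbers in these driver messages are standard Linux errno values. The sketch below only decodes the two codes that show up in this thread; `errno_name` is a hypothetical helper, not part of any existing tool.

```shell
#!/usr/bin/env bash
# Sketch: translate the kernel error codes seen in this thread.
# Accepts the code with or without the leading minus sign.

errno_name() {
  case "${1#-}" in
    2)   echo "ENOENT (no such file or directory)" ;;
    110) echo "ETIMEDOUT (connection timed out)" ;;
    *)   echo "unknown" ;;
  esac
}
```

So -2 suggests the kernel simply could not find a maxtouch.cfg file to load, while -110 (which shows up in the i2c errors in this thread) is a timeout while talking to the device.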

But I'm also seeing a lot of errors in /var/log/syslog:

Feb 21 18:42:33 fluorine kernel: [28025.196806] pipe A vblank wait timed out
Feb 21 18:42:33 fluorine kernel: [28025.196882] ------------[ cut here ]------------
Feb 21 18:42:33 fluorine kernel: [28025.196941] WARNING: CPU: 0 PID: 26987 at drivers/gpu/drm/i915/intel_display.c:12195 intel_atomic_commit_tail+0xe5a/0xe80 [i915]
Feb 21 18:42:33 fluorine kernel: [28025.196943] Modules linked in: iwlmvm mac80211 iwlwifi cfg80211 xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 rfcomm ccm xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables msr bnep uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 btusb videobuf2_core btrtl btbcm btintel videodev lz4 bluetooth lz4_compress media ecdh_generic zram binfmt_misc arc4 snd_soc_sst_cht_bsw_rt5645 intel_rapl intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev pcbc snd_hda_codec_hdmi snd_hda_intel snd_hda_codec kvm_intel snd_intel_sst_acpi aesni_intel snd_intel_sst_core snd_soc_sst_atom_hifi2_platform
Feb 21 18:42:33 fluorine kernel: [28025.197003]  snd_soc_rt5645 snd_hwdep snd_hda_core aes_x86_64 snd_soc_rl6231 crypto_simd snd_soc_acpi snd_soc_acpi_intel_match glue_helper cryptd snd_seq_midi snd_soc_core snd_seq_midi_event kvm snd_compress snd_rawmidi snd_pcm_dmaengine snd_pcm_oss snd_mixer_oss snd_seq snd_pcm lpc_ich irqbypass snd_seq_device mfd_core snd_timer processor_thermal_device shpchp intel_soc_dts_iosf snd int3403_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel soundcore chromeos_pstore atmel_mxt_ts 8250_dw mac_hid autofs4 btrfs xor zstd_decompress zstd_compress xxhash raid6_pq dm_mirror dm_region_hash dm_log i915 mmc_block video i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm sdhci_acpi sdhci [last unloaded: cfg80211]
Feb 21 18:42:33 fluorine kernel: [28025.197061] CPU: 0 PID: 26987 Comm: kworker/u4:0 Tainted: G        W       4.14.14-galliumos #1
Feb 21 18:42:33 fluorine kernel: [28025.197062] Hardware name: GOOGLE Celes, BIOS          01/19/2017
Feb 21 18:42:33 fluorine kernel: [28025.197101] Workqueue: events_unbound intel_atomic_commit_work [i915]
Feb 21 18:42:33 fluorine kernel: [28025.197103] task: ffff8edc0e57ea00 task.stack: ffffa71a03c40000
Feb 21 18:42:33 fluorine kernel: [28025.197141] RIP: 0010:intel_atomic_commit_tail+0xe5a/0xe80 [i915]
Feb 21 18:42:33 fluorine kernel: [28025.197143] RSP: 0018:ffffa71a03c43dd0 EFLAGS: 00010286
Feb 21 18:42:33 fluorine kernel: [28025.197146] RAX: 000000000000001c RBX: ffff8edc74a48000 RCX: 0000000000000001
Feb 21 18:42:33 fluorine kernel: [28025.197147] RDX: 0000000080000001 RSI: ffffffff90eb9282 RDI: 00000000ffffffff
Feb 21 18:42:33 fluorine kernel: [28025.197149] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000cb7
Feb 21 18:42:33 fluorine kernel: [28025.197150] R10: ffffa71a03c43dd0 R11: 0000000000000cb7 R12: 0000000000000000
Feb 21 18:42:33 fluorine kernel: [28025.197152] R13: 0000000000000000 R14: ffff8edc748f7000 R15: 0000000000000001
Feb 21 18:42:33 fluorine kernel: [28025.197154] FS:  0000000000000000(0000) GS:ffff8edc7fc00000(0000) knlGS:0000000000000000
Feb 21 18:42:33 fluorine kernel: [28025.197156] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 21 18:42:33 fluorine kernel: [28025.197157] CR2: 00007f538a7cb000 CR3: 0000000153532000 CR4: 00000000001006f0
Feb 21 18:42:33 fluorine kernel: [28025.197159] Call Trace:
Feb 21 18:42:33 fluorine kernel: [28025.197175]  ? finish_task_switch+0x80/0x220
Feb 21 18:42:33 fluorine kernel: [28025.197180]  ? wait_woken+0x80/0x80
Feb 21 18:42:33 fluorine kernel: [28025.197184]  process_one_work+0x148/0x420
Feb 21 18:42:33 fluorine kernel: [28025.197188]  worker_thread+0x47/0x420
Feb 21 18:42:33 fluorine kernel: [28025.197192]  kthread+0xfc/0x130
Feb 21 18:42:33 fluorine kernel: [28025.197195]  ? process_one_work+0x420/0x420
Feb 21 18:42:33 fluorine kernel: [28025.197197]  ? kthread_create_on_node+0x40/0x40
Feb 21 18:42:33 fluorine kernel: [28025.197201]  ret_from_fork+0x32/0x40
Feb 21 18:42:33 fluorine kernel: [28025.197205] Code: 24 50 4c 89 04 24 48 83 c7 08 e8 32 7b e1 cf 4c 8b 04 24 4d 85 c0 0f 85 08 fe ff ff 8d 75 41 48 c7 c7 40 fa 31 c0 e8 21 06 e3 cf <0f> ff e9 f2 fd ff ff 8d 70 41 48 c7 c7 10 fa 31 c0 e8 0b 06 e3 
Feb 21 18:42:33 fluorine kernel: [28025.197257] ---[ end trace 826d9a59e8414bce ]---

Looping over and over. This eventually leads to a different error looping:

Feb 21 09:52:47 fluorine kernel: [29027.550802] atmel_mxt_ts i2c-ATML0000:01: __mxt_read_reg: i2c transfer failed (-110)
Feb 21 09:52:47 fluorine kernel: [29027.550811] atmel_mxt_ts i2c-ATML0000:01: Failed to read T44 and T5 (-110)

boutell commented 6 years ago

Interesting. My syslog is quiet when this happens (at least until I start messing with it by restarting the driver).

Have you tried a large swap file?

begati commented 6 years ago

Hey there @boutell, have you tested a large swap? Any success? My trackpad has not given me any more trouble since then. :)

boutell commented 6 years ago

I will as soon as I get home.

boutell commented 6 years ago

I have installed the swapfile this morning. I am hopeful...

I noticed that GalliumOS also comes with zram enabled as a swapfile of sorts. zram is interesting - it's a RAM-based compressed block device, so using it as swap is like having somewhat more memory than you really have without the performance impact of touching the disk. But, if it had a bug or some kind of timing issue that triggered a bug in the trackpad...

zram is higher priority than the swapfile. So I would assume the swapfile would never be touched unless zram were exhausted first. That could definitely happen though in the high load situations we've been talking about.
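
A quick way to check that ordering is to sort /proc/swaps by its Priority column. This is a minimal sketch; `swaps_by_priority` is a made-up helper name, and it assumes the usual five-column layout (Filename, Type, Size, Used, Priority).

```shell
#!/usr/bin/env bash
# Sketch: list active swap areas, highest priority first.
# Output format per line: "<priority> <device or file> <used KiB>".

swaps_by_priority() {
  local src="${1:-/proc/swaps}"
  # Skip the header row, reorder the columns, sort numerically descending.
  tail -n +2 "$src" | awk '{ print $5, $1, $4 }' | sort -k1,1nr
}
```

The kernel fills the area on the first output line first. A nonzero third field on the swap file's line would mean the on-disk swap has actually been touched.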

boutell commented 6 years ago

OK, I did some tests this morning with a 2GB swap file in place.

I did my best to mess it up with non-hardware things that have triggered it before: Facebooking (to the point where typing slowed down), YouTube videos continuously playing. No problems in about 45 minutes of use.

Then I plugged into a projector via HDMI. No problems while connected. Tried different display settings.

Then I unplugged the HDMI cable. Boom, pointer freeze.

I then checked syslog. In syslog I do see the following, however this started appearing periodically as soon as the HDMI cable was plugged in and didn't wait until later when I unplugged it, so I don't know if it's directly related:

pipe C vblank wait timed out

Followed by a stack trace etc. which I can attach if desired.

Those stopped happening after the HDMI cable was disconnected.

I looked at /proc/swaps a few times and while the use of /dev/zram0 climbed a little, I never noticed use of the swap file on disk.

I will keep testing because a 45 minute test is not really enough, but if I had only this to go on, I would say that the trackpad is a very "high-need" device that gets lost if the OS fails to pay attention to it even once, and the swap file ameliorates the problem by eliminating some scenarios where that happens, but the driver probably really needs to be at a realtime priority.

The HDMI disconnect situation is one where the kernel is especially likely to be thoroughly distracted by the needs of its other children for a moment, and can't pay attention to the trackpad at a critical juncture, and things go off the rails.

boutell commented 6 years ago

I experimented some more. I was not able to freeze the pointer simply by disconnecting HDMI; it hasn't happened again yet.

However, I did notice that I can get a similar vblank error message just by going fullscreen in youtube without a second monitor, at least half of the time:

pipe A vblank wait timed out

I've seen freezes associated with fullscreen in the past, so although it didn't freeze on this pass it does make me suspect I didn't imagine the connection.

boutell commented 6 years ago

I have reproduced the freeze in a "vblank wait timed out" situation (full screen apps, MAME, etc) with no external monitor. So far, I have not reproduced it in the absence of that message since adding swap.

boutell commented 6 years ago

@begati since adding your monster swapfile, have you done much with full screen applications and/or HDMI output? It doesn't happen 100% of the time, but so far those are the only circumstances in which I have experienced a pointer freeze since adding my own swapfile.

Obviously it would be good to fix those situations too...

Of course, with a low-frequency bug like this (since the 4.14.14 kernel), it's hard to be 100% sure I'll never see it again under other conditions.

boutell commented 6 years ago

My latest attempt to resolve this: I disabled zram (Gallium's default swap drive, a RAMdisk... which sounds crazy... except it's a compressed RAMdisk... so it's pretty sane). Pet theory is that there's a condition somewhere in which if the system swaps to zram it can't do something fast enough to suit the keyboard/trackpad hardware and things go south. Obviously swapping to disk should not be better, except that it's been around forever so it might say "hang on just a minit" in a better-debugged fashion.

Hey, it's an idea. I'll let you know.

boutell commented 6 years ago

No pointer freezes so far, but a youtube video fullscreen experience led me to more "vblank timed out" errors in syslog, after which I started to get some key duplication and video/audio got very choppy.

Others suggest the vblank messages can be associated with memory pressure:

https://bugs.freedesktop.org/show_bug.cgi?id=102667

boutell commented 6 years ago

Oooh! This looks really similar despite coming at it from a very different angle:

https://forums.puri.sm/t/is-anyone-else-experiencing-freezing-issues-with-librem-15-v3/1233/28

And I'm getting promising early results from this:

"I’m now running with intel_pstate=disable on kernel args and haven’t had the problem since."

Crossing my fingers this continues. If it has to do with powersave stuff kicking in badly that could explain the unpredictability very well.
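
One way to add that kernel argument on a GRUB-based install is sketched below, under the assumption that the boot parameters live in `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub (as on stock Ubuntu; not confirmed in this thread for every GalliumOS setup). `add_kernel_arg` is a hypothetical helper, and the argument must not contain a `/` because it is spliced into a sed expression.

```shell
#!/usr/bin/env bash
# Sketch: append one kernel argument to GRUB_CMDLINE_LINUX_DEFAULT.
# The file path is passed in; /etc/default/grub is an assumption.

add_kernel_arg() {
  local file="$1" arg="$2"
  # Do nothing if the argument is already on the line.
  if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$arg" "$file"; then
    return 0
  fi
  # Insert the argument just before the closing quote; keep a .bak copy.
  sed -i.bak "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $arg\"/" "$file"
}
```

Run as root, `add_kernel_arg /etc/default/grub intel_pstate=disable` followed by `update-grub` and a reboot should apply it; `cat /proc/cmdline` shows whether the argument took effect.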

boutell commented 6 years ago

Well.. at least I ruled some stuff out! I got a pointer freeze with zram and intel_pstate both disabled. Alas.

Same vblank stuff: pipe A vblank wait timed out

boutell commented 6 years ago

Continuing on this theme of turning other stuff off, rather than assuming the specific drivers for the trackpad/keyboard are at fault, I am very optimistic right now that it might be fixed (: Of course too soon to say for sure...

Here's what I did: I went into compton.conf and changed backend="glx"; to backend="xrender".
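
The same edit can be scripted. This is only a sketch: `set_compton_backend` is a hypothetical helper, and the location of compton.conf is passed in because it is not stated above.

```shell
#!/usr/bin/env bash
# Sketch: switch the "backend" setting in a compton config file.
# Only an uncommented line starting with "backend" is changed.

set_compton_backend() {
  local file="$1" backend="$2"
  # Keep everything up to the opening quote, swap the quoted value,
  # and leave a .bak copy of the original file next to it.
  sed -i.bak \
    "s/^\([[:space:]]*backend[[:space:]]*=[[:space:]]*\)\"[^\"]*\"/\1\"$backend\"/" \
    "$file"
}
```

Call it as `set_compton_backend /path/to/compton.conf xrender`, then restart compton (or log out and back in) so the new backend is picked up.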

With this change, there are no vblank errors, video is much more stable, fullscreen and second monitor situations seem stable so far, and mame fills the second monitor properly (this one is not really relevant, but nice!)

In general it just feels like a happy machine that isn't getting clobbered by some deep issue that causes it to pause, chug and act up.

So my working theory is that opengl, or at least opengl with vsync enabled, is borked on CELES. Using xrender takes care of it, and I still seem to get good framerates; we watched Saturday Night Live with no issues and MAME framerates are great.

There could still be an issue. It's possible that the keyboard/trackpad driver just doesn't handle "hang on I can't help you right now" situations well and the vblank bug/problem/whatever with opengl was just making it really obvious. In which case I will eventually see a freeze anyway in some other situation. But... this looks an awful lot better so far.

Here's a summary of what I've done in total:

My working theory is that only the switch to xrender was actually necessary, but of course it's possible that all of these things are contributing to my happier outcome.

... So far. (: I'll let folks know when I've had this config for a week without freezes.

begati commented 6 years ago

Creating a large 2 GB swap partition alone was not enough to stop the problem.

I'm gonna try the other things you've said right now.

boutell commented 6 years ago

Cool! I recommend you start with switching compton to xrender, I suspect that might be the only important one, possibly with the addition of the 4.14.14 kernel if you haven't done that one already.

begati commented 6 years ago

Already running on 4.14.14 kernel, trying first switching compton to xrender.

boutell commented 6 years ago

.... Aaaaand I just experienced a pointer freeze out of nowhere. headdesk

Certainly the vblanks are gone, this has been well worth it.

Maybe this is related:

mmc0: Tuning timeout, falling back to fixed sampling clock

boutell commented 6 years ago

I bet that error is caused by putting swap on an SD card (mmc0). I am going to try disabling that swapfile and see if I do better with no swap at all or, if that's not practical, bringing back zram. It makes sense that an SD card is probably not the best place for realtime stuff.

rusty122 commented 6 years ago

I enabled xrender as the Compton backend and didn't make any other changes. Haven't seen any vblank timeouts and Youtube playback has much less stuttering but still getting pointer freezes (often during video playback) and haven't uncovered any useful log output.

boutell commented 6 years ago

Dang! Thanks for trying it.

boutell commented 6 years ago

We should keep trying turning more stuff off but it may be that the issue really is with the trackpad or keyboard driver itself.

begati commented 6 years ago

Moving compton to xrender seems to disable vsync somehow; I'm experiencing some screen tearing when scrolling pages in Chromium!

boutell commented 6 years ago

Yes, that is to be expected, although I think there's a separate flag for vsync with xrender.

boutell commented 6 years ago

The default setting in the config file is for a vsync solution that is gl-specific so that doesn't get used in this configuration. Feel free to experiment and let us know if you can get vsync with xrender and no errors in the log.

begati commented 6 years ago

I tried to watch a 5-minute HD video on YouTube, got a lot of screen stuttering, and the trackpad went down... :/ (compton was set to xrender).

rusty122 commented 6 years ago

So I found some interesting logs in /var/log/Xorg.0.log.old after rebooting from my latest trackpad freeze. It's filled with repeats of this error:

(EE) ERROR:src/lookahead_filter_interpreter.cc:50:Can't accept new hwstate b/c we're out of nodes!
(EE) ERROR:src/lookahead_filter_interpreter.cc:51:Now: 2944.913988, interpreter_due_ -1.000000
(EE) ERROR:src/lookahead_filter_interpreter.cc:52:Dump of queue:
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.901988 (c)
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988 (c)
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.930988
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988

The GalliumOS project actually has a copy of the project where the error is being raised: https://github.com/GalliumOS/libgestures

My guess is that libgestures is somehow getting flooded with hardware state events.

Update: Just got some new logs when a trackpad click was ignored.

(EE) ERROR:src/stuck_button_inhibitor_filter_interpreter.cc:68:Odd. result is sending buttons up for buttons we didn't send down: Existing down: 1. New up: 4.
(EE) ERROR:src/stuck_button_inhibitor_filter_interpreter.cc:61:Odd. result is sending buttons down that are already down: Existing down: 1. New down: 1. fixing

boutell commented 6 years ago

That IS interesting. It would be valuable to leave a tail -f of this log file open and see if these happen at the time of the freeze, or just later after the freeze (they could be aftereffects of something being stuck, but not necessarily the cause...)
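
Something along these lines would do it. A minimal sketch, assuming the errors land in /var/log/Xorg.0.log while X is running (the `.old` file mentioned above being the previous session's log); `stamp_gesture_errors` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read log lines on stdin and print the libgestures
# "out of nodes" errors, each prefixed with the current wall-clock time.

stamp_gesture_errors() {
  local line
  while IFS= read -r line; do
    case "$line" in
      *lookahead_filter_interpreter*"out of nodes"*)
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
        ;;
    esac
  done
}
```

`tail -F /var/log/Xorg.0.log | stamp_gesture_errors` prints each matching error with the time it was seen, which can then be compared with the moment the pointer froze.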

On Mon, Mar 5, 2018 at 12:11 AM, Russell Parker notifications@github.com wrote:

So I found some interesting logs in /var/log/Xorg.0.log.old after rebooting from my latest trackpad freeze. It's filled with repeats of this error:

(EE) ERROR:src/lookahead_filter_interpreter.cc:50:Can't accept new hwstate b/c we're out of nodes! (EE) ERROR:src/lookahead_filter_interpreter.cc:51:Now: 2944.913988, interpreterdue -1.000000 (EE) ERROR:src/lookahead_filter_interpreter.cc:52:Dump of queue: (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.901988 (c) (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988 (c) (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.930988 (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988 (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988 (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988 (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988 (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988 (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988 (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988 (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988 (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988 (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988 (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988 (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988 (EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 2944.913988

The GalliumOS project actually has a copy of the project where the error is being raised: https://github.com/GalliumOS/libgestures

My guess is that libgestures is somehow getting flooded with hardware state events.


jtecca commented 6 years ago

I haven't done any of the changes listed in this thread so that I can try to replicate the issue in a vanilla state.

I find that if I run Dwarf Fortress and a couple of terminals as well as a youtube video playing, I can usually get the behavior of the keyboard stuttering as well as the trackpad pointer freezing.

Tail-following /var/log/Xorg.0.log under a heavy load like the one described above shows constant errors once the trackpad freezes, and the errors continue until the driver is reloaded:

(EE) ERROR:src/lookahead_filter_interpreter.cc:50:Can't accept new hwstate b/c we're out of nodes!
(EE) ERROR:src/lookahead_filter_interpreter.cc:51:Now: 31445.985548, interpreter_due_ -1.000000
(EE) ERROR:src/lookahead_filter_interpreter.cc:52:Dump of queue:
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 31438.142797 (c)
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 31438.209797
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 31438.209797
(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 31438.192797

The errors will start again if I leave the load high and the trackpad pointer freezes again after reloading the driver. So there is definitely a correlation between these errors and the trackpad freezing.
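One way to put a number on that correlation is to measure how far behind the queue is each time the dump appears. This is only a minimal sketch, not part of libgestures or GalliumOS: it assumes the log format shown above (a `Now:` line followed by `Due:` lines), and `queue_staleness` is a name made up for illustration.

```python
import re

NOW_RE = re.compile(r"lookahead_filter_interpreter\.cc:\d+:Now: (\d+\.\d+)")
DUE_RE = re.compile(r"lookahead_filter_interpreter\.cc:\d+:Due: (\d+\.\d+)")


def queue_staleness(lines):
    """Return one staleness value (seconds) per queue dump found in lines.

    Staleness is the 'Now' timestamp minus the oldest 'Due' timestamp
    listed in the dump that follows it.
    """
    results = []
    now = None
    dues = []
    for line in lines:
        m = NOW_RE.search(line)
        if m:
            # A new dump starts here; close out the previous one first.
            if now is not None and dues:
                results.append(now - min(dues))
            now = float(m.group(1))
            dues = []
            continue
        m = DUE_RE.search(line)
        if m and now is not None:
            dues.append(float(m.group(1)))
    # Close out the final dump, if any.
    if now is not None and dues:
        results.append(now - min(dues))
    return results


# Lines copied from the excerpt above.
sample = [
    "(EE) ERROR:src/lookahead_filter_interpreter.cc:50:Can't accept new hwstate b/c we're out of nodes!",
    "(EE) ERROR:src/lookahead_filter_interpreter.cc:51:Now: 31445.985548, interpreter_due_ -1.000000",
    "(EE) ERROR:src/lookahead_filter_interpreter.cc:52:Dump of queue:",
    "(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 31438.142797 (c)",
    "(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 31438.209797",
    "(EE) ERROR:src/lookahead_filter_interpreter.cc:54:Due: 31438.192797",
]
staleness = queue_staleness(sample)
print([round(s, 3) for s in staleness])
```

For this excerpt the oldest queued entry is about 7.8 seconds behind `Now`, which may suggest the queue had stopped draining rather than briefly overflowing, though a single sample is not much to go on.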

boutell commented 6 years ago

Hey, this file sounds like it's not part of the driver... it's part of something called "gestures"!

And you know what gestures sound like to me?

They sound like something we can PROBABLY SHUT OFF!

I love shutting things off. (:

I'm trying to figure out what tool this comes from. It's not a file in the kernel.

Is it libgestures? What is that? How do we turn it off?


boutell commented 6 years ago

@reynhout do you know what tool or library lookahead_filter_interpreter.cc is a part of?

This seems to be old chromiumos source, haven't found it in the new (?) github yet:

https://chromium.googlesource.com/chromiumos/platform/gestures/+/master/src/

Is this an optional gestures library or daemon that we might be able to turn off as a test? Would that make it impossible to click the trackpad or just limit fancier gestures?


rusty122 commented 6 years ago

libgestures is a dependency for xf86-input-cmt which is the trackpad driver (ported from Google's source code). You could eliminate both using the libinput driver but it probably won't be as tuned for Celes.

boutell commented 6 years ago

Maybe there is a way to shut off the gesturishness in configuration?


rusty122 commented 6 years ago

In /usr/share/X11/xorg.conf.d/50-touchpad-cmt-celes.conf I see a "Touchpad Stack Version" config. I'm downgrading it now to version 1 to see if things improve. There may be some stability issues hiding in the newer version.
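For anyone who wants to repeat the same change on several machines, here is a rough sketch of editing that option from a script. The `Option "name" "value"` form is ordinary xorg.conf syntax, but the section shown below is a hypothetical excerpt (including the starting value "2"), not a copy of the real 50-touchpad-cmt-celes.conf, and `set_option` is just an illustrative helper.

```python
import re


def set_option(conf_text, name, value):
    """Return conf_text with every `Option "name" "..."` line set to value."""
    pattern = re.compile(
        r'^(\s*Option\s+"' + re.escape(name) + r'"\s+")[^"]*(")',
        re.MULTILINE,
    )
    # A function replacement avoids ambiguity between group references
    # and a numeric value such as "1".
    new_text, count = pattern.subn(
        lambda m: m.group(1) + value + m.group(2), conf_text
    )
    if count == 0:
        raise ValueError(f"Option {name!r} not found")
    return new_text


# Hypothetical excerpt; the real file on a CELES device may differ.
example = '''Section "InputClass"
    Identifier "touchpad celes"
    Option "Touchpad Stack Version" "2"
EndSection
'''
updated = set_option(example, "Touchpad Stack Version", "1")
print(updated)
```

Xorg reads these files at startup, so the change would only take effect after the X session is restarted. Keeping a backup of the original file makes it easy to revert.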

boutell commented 6 years ago

@rusty122 I tried this this morning. I had three youtube streams going plus Facebook. Eventually, I got a lot of duplicate keystrokes, but no pointer freezes.

It's possible I just didn't wait long enough, obviously.

After the load went away, the keyboard duplication eventually did too.

My theory now is that there's a Third Thing that causes some kind of unpredictable behavior under heavy load, such as doubled or missing data on the bus, and different drivers misbehave differently in response. libgestures says "WTF?" and gets stuck, while the keyboard driver duplicates some characters but manages to continue.

boutell commented 6 years ago

Is anyone sure they've seen this behavior in the absence of sound? Including muted sound...

reynhout commented 6 years ago

@boutell Yep, it's from our libgestures fork, as @rusty122 noted. https://github.com/GalliumOS/libgestures ..

BTW, I don't have any insights to offer on this process, but I'm watching the progress on this ticket closely. I'll try to help if I am able, but I can at least promise that if a solution is found, I can get it packaged up for wider testing quickly. :)

boutell commented 6 years ago

@reynhout what was it that led you to believe the latest kernel would make the issue less frequent overall (which does seem to be true)? Anything specific we could look at there?

reynhout commented 6 years ago

@boutell Just a report on Reddit, I think.

We have a newer 4.14 kernel in early testing too. I have no reason to think this will help, but it's where the code is headed, at least: https://galliumos.org/tmp/kernels/4.14/linux-image-4.14.24-galliumos_4.14.24-galliumos0+dev1_amd64.deb .. I'm also hoping to get a 4.15 build this weekend.

boutell commented 6 years ago

Duplicate keystroke bugs have a long and intriguing history. This is an old thread, but interesting for the variety of workarounds and causes: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.15/+bug/39315

boutell commented 6 years ago

I tried limiting the maximum CPU speed to 1000 MHz, which is pretty drastic. This did not help: I still got a pointer freeze eventually.

I really wanted to test what would happen if I disabled CPU speed changes entirely, but I have not been able to pull that off because even when I disable intel_pstate I can't seem to enable the "userspace" governor (which is necessary to be able to set a fixed speed).

Not that a fixed speed would be good. That would be pretty awful.
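To check which cpufreq driver and governors the kernel is actually offering, the standard sysfs files can be read directly. The sketch below assumes the usual `/sys/devices/system/cpu/cpuN/cpufreq/` layout; the function names are made up for illustration. While `intel_pstate` is the active driver, only `performance` and `powersave` are normally listed, so `userspace` would only show up if a different driver such as `acpi-cpufreq` had taken over.

```python
from pathlib import Path

CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def read_cpufreq(root=CPUFREQ_ROOT):
    """Return {cpu_name: {"driver", "governor", "available"}} for each CPU.

    CPUs without a cpufreq directory are skipped.
    """
    info = {}
    for cpufreq in sorted(Path(root).glob("cpu[0-9]*/cpufreq")):
        def read(name):
            # Missing files are reported as an empty string.
            path = cpufreq / name
            return path.read_text().strip() if path.exists() else ""

        info[cpufreq.parent.name] = {
            "driver": read("scaling_driver"),
            "governor": read("scaling_governor"),
            "available": read("scaling_available_governors").split(),
        }
    return info


def can_pin_frequency(info):
    """True if every listed CPU offers the 'userspace' governor."""
    return bool(info) and all(
        "userspace" in cpu["available"] for cpu in info.values()
    )


if __name__ == "__main__":
    state = read_cpufreq()
    for cpu, fields in state.items():
        print(cpu, fields)
    print("userspace governor available:", can_pin_frequency(state))
```

If `driver` still reads `intel_pstate` after a reboot, the disable option probably did not take effect, which would explain why the `userspace` governor could not be selected.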

boutell commented 6 years ago

Latest idea: I uninstalled upower. No battery indicator! Livin' on the edge!

Eventually I did get a pointer freeze, but the logs are interesting:

Mar 13 06:29:28 iloverocks kernel: [ 3945.346779] atmel_mxt_ts i2c-ATML0000:00: __mxt_read_reg: i2c transfer failed (-110)
Mar 13 06:29:28 iloverocks kernel: [ 3945.346807] atmel_mxt_ts i2c-ATML0000:00: Failed to read T44 and T5 (-110)
Mar 13 06:29:28 iloverocks kernel: [ 3945.369012] i2c_designware 808622C1:05: timeout waiting for bus ready
Mar 13 06:29:28 iloverocks kernel: [ 3945.369023] atmel_mxt_ts i2c-ATML0000:00: __mxt_read_reg: i2c transfer failed (-110)
Mar 13 06:29:28 iloverocks kernel: [ 3945.369030] atmel_mxt_ts i2c-ATML0000:00: Failed to read T44 and T5 (-110)
Mar 13 06:29:28 iloverocks kernel: [ 3945.391293] i2c_designware 808622C1:05: timeout waiting for bus ready