ValveSoftware / Source-1-Games

Source 1 based games such as TF2 and Counter-Strike: Source

[TF2] Frequent crashing/lockup of TF2 and X11 - Linux Mint 21.2 #5324

Closed: rtaft closed this issue 6 months ago

rtaft commented 11 months ago

When playing TF2, I run into an issue where one of my two monitors turns a solid color. TF2 keeps running, but eventually both monitors turn a solid color (the color varies). The game sound continues to play, often looping, before it eventually stops. I am able to SSH into the PC, where I can see the hlds_linux process using 100% CPU. Eventually that process crashes entirely, and the Xorg process then sits at 100% CPU. I have not figured out how to recover the system without rebooting; killing Xorg does not stop it. These crashes happen about once or twice a week with 1-2 hours of gameplay per day. Halloween maps trigger this more frequently. My overall memory usage during today's crash was 4GB out of 32GB. GPU temperature was around 70C when I SSHed into the machine, and I've never seen it over 85C.

I started having this issue after upgrading from Mint 20 to Mint 21. I bought a new NVMe SSD and installed Mint 21.1 on it, upgrading to 21.2 when it came out. Initially the crashes were limited to TF2, but they evolved into full OS lockups; one time I could not power on the PC afterwards. I figured it was a dying motherboard, so I upgraded from a 4790K to an i7-13700 with new memory and a new motherboard. This did not resolve the issue. Since this started with the new NVMe disk and the install of Mint 21.2 (with copied game files), I got another NVMe drive and did a fresh install of 21.2 with fresh downloads of everything. I had 2 crashes in 6 days. I switched from Cinnamon to MATE and still got crashes. I also tried switching NVIDIA drivers from 525 to 390.

I reverted to booting from my Mint 20.3 disk, where I had to upgrade the kernel to support the new hardware (5.4 -> 5.15). After over a week of play, I've had no crashes. This all started with Mint 21. Any advice on how to track this down and resolve it?

EDIT: I had left top running and then reset the machine. I noticed afterward that, at the moment I reset, gpu-manager was at 100% CPU as root and gameoverlayui was using 3%.
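For anyone trying to capture the same kind of evidence, here is a minimal sketch of logging the busiest processes to disk so the record survives a hard reset. It is not part of the original report. The function names, the log path `cpu_watch.log`, and the 10-second interval are all hypothetical choices, and it assumes a procps-style `ps` that accepts `-eo pid,pcpu,comm --sort=-pcpu`. Note that `ps` reports CPU use averaged over each process's lifetime, so it is a coarser signal than the live figures shown by top.

```python
import os
import subprocess
import time


def parse_ps(ps_output, limit=5):
    """Return up to `limit` (pid, cpu_percent, command) tuples, highest CPU first."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # ignore anything that is not "pid cpu command"
        pid, cpu, comm = parts
        rows.append((int(pid), float(cpu), comm.strip()))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


def log_snapshot(path="cpu_watch.log", limit=5):
    """Append one timestamped snapshot and force it onto disk."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm", "--sort=-pcpu"],
        capture_output=True, text=True, check=True,
    )
    with open(path, "a") as fh:
        fh.write(time.strftime("%Y-%m-%d %H:%M:%S") + "\n")
        for pid, cpu, comm in parse_ps(result.stdout, limit):
            fh.write(f"  {pid:>7} {cpu:5.1f}% {comm}\n")
        fh.flush()
        os.fsync(fh.fileno())  # reduce the chance of losing data on a hard reset


def watch(interval_seconds=10):
    """Take snapshots until interrupted (hypothetical interval)."""
    while True:
        log_snapshot()
        time.sleep(interval_seconds)
```

Calling `watch()` from an SSH session before starting the game leaves a file that can be read after the reboot to see which processes were busy just before the lockup.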

System:
  Host: desktop Kernel: 5.15.0-84-generic x86_64 bits: 64 compiler: gcc v: 11.4.0
    Console: pty pts/2 DM: LightDM 1.30.0 Distro: Linux Mint 21.2 Victoria base: Ubuntu 22.04 jammy
Machine:
  Type: Desktop Mobo: ASRock model: Z790 Steel Legend WiFi serial: M80-G6002900275
    UEFI: American Megatrends LLC. v: 8.05 date: 05/18/2023
Battery:
  Message: No system battery data found. Is one present?
Memory:
  RAM: total: 31.17 GiB used: 6.18 GiB (19.8%)
  Array-1: capacity: 128 GiB slots: 4 EC: None max-module-size: 32 GiB note: est.
  Device-1: Controller0-ChannelA-DIMM0 size: 16 GiB speed: 5600 MT/s type: DDR5
    detail: synchronous bus-width: 64 bits total: 64 bits manufacturer: Crucial Technology
    part-no: CP16G56C46U5.M8G1 serial: E80DB430
  Device-2: Controller0-ChannelA-DIMM1 size: 16 GiB speed: 5600 MT/s type: DDR5
    detail: synchronous bus-width: 64 bits total: 64 bits manufacturer: Crucial Technology
    part-no: CP16G56C46U5.M8G1 serial: E80DB41E
  Device-3: Controller1-ChannelA-DIMM0 size: No Module Installed
  Device-4: Controller1-ChannelA-DIMM1 size: No Module Installed
CPU:
  Info: 16-core (8-mt/8-st) model: 13th Gen Intel Core i7-13700 bits: 64 type: MST AMCP
    smt: enabled arch: N/A rev: 1 cache: L1: 1.4 MiB L2: 24 MiB L3: 30 MiB
  Speed (MHz): avg: 999 high: 1101 min/max: 800/5100:5200:4100 volts: 0.7 V ext-clock: 100 MHz
    cores: 1: 1100 2: 1100 3: 1100 4: 1101 5: 1100 6: 1099 7: 1100 8: 1100 9: 1100 10: 1101
    11: 1100 12: 1099 13: 1100 14: 1101 15: 1100 16: 1100 17: 800 18: 800 19: 800 20: 800 21: 800
    22: 799 23: 799 24: 799 bogomips: 101376
  Flags: 3dnowprefetch abm acpi adx aes aperfmperf apic arat arch_capabilities arch_lbr
    arch_perfmon art avx avx2 avx_vnni bmi1 bmi2 bts clflush clflushopt clwb cmov constant_tsc
    cpuid cpuid_fault cx16 cx8 de ds_cpl dtes64 dtherm dts epb ept ept_ad erms est f16c
    flexpriority flush_l1d fma fpu fsgsbase fsrm fxsr gfni ht hwp hwp_act_window hwp_epp
    hwp_notify hwp_pkg_req ibpb ibrs ibrs_enhanced ida intel_pt invpcid lahf_lm lm mca mce
    md_clear mmx monitor movbe movdir64b movdiri msr mtrr nonstop_tsc nopl nx ospke pae pat pbe
    pclmulqdq pconfig pdcm pdpe1gb pebs pge pku pln pni popcnt pse pse36 pts rdpid rdrand rdseed
    rdtscp rep_good sdbg sep serialize sha_ni smap smep smx split_lock_detect ss ssbd sse sse2
    sse4_1 sse4_2 ssse3 stibp syscall tm tm2 tme tpr_shadow tsc tsc_adjust tsc_deadline_timer
    tsc_known_freq umip vaes vme vmx vnmi vpclmulqdq vpid waitpkg x2apic xgetbv1 xsave xsavec
    xsaveopt xsaves xtopology xtpr
Graphics:
  Device-1: NVIDIA GM200 [GeForce GTX TITAN X] vendor: eVga.com. driver: nvidia v: 390.157 pcie:
    speed: 8 GT/s lanes: 4 bus-ID: 06:00.0 chip-ID: 10de:17c2 class-ID: 0300
  Display: server: X.org v: 1.21.1.4 with: Xwayland v: 22.1.1 driver: X: loaded: nvidia
    unloaded: fbdev,modesetting,nouveau,vesa gpu: nvidia tty: 334x75
  Message: GL data unavailable in console for root.
Audio:
  Device-1: Intel vendor: ASRock driver: snd_hda_intel v: kernel bus-ID: 00:1f.3
    chip-ID: 8086:7a50 class-ID: 0403
  Device-2: NVIDIA GM200 High Definition Audio vendor: eVga.com. driver: snd_hda_intel
    v: kernel pcie: speed: 8 GT/s lanes: 4 bus-ID: 06:00.1 chip-ID: 10de:0fb0 class-ID: 0403
  Device-3: Corsair VOID PRO Wireless Gaming Headset type: USB
    driver: hid-generic,snd-usb-audio,usbhid bus-ID: 1-9.1.2:10 chip-ID: 1b1c:0a16 class-ID: 0300
  Sound Server-1: ALSA v: k5.15.0-84-generic running: yes
  Sound Server-2: PulseAudio v: 15.99.1 running: yes
  Sound Server-3: PipeWire v: 0.3.48 running: yes
Use of uninitialized value $args in concatenation (.) or string at /usr/bin/inxi line 2584.
Use of uninitialized value in concatenation (.) or string at /usr/bin/inxi line 2584.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    13  100    13    0     0    146      0 --:--:-- --:--:-- --:--:--   147
Network:
  Device-1: Realtek RTL8125 2.5GbE vendor: ASRock driver: r8169 v: kernel pcie: speed: 5 GT/s
    lanes: 1 port: 4000 bus-ID: 03:00.0 chip-ID: 10ec:8125 class-ID: 0200
  IF: enp3s0 state: up speed: 1000 Mbps duplex: full mac: 9c:6b:00:24:2e:4e
  IP v4: 192.168.1.2/24 type: noprefixroute scope: global broadcast: 192.168.1.255
  IP v6: fe80::1847:6497:3155:135f/64 type: noprefixroute scope: link
  Device-2: Intel Wi-Fi 6 AX210/AX211/AX411 160MHz vendor: Rivet Networks driver: iwlwifi
    v: kernel pcie: speed: 5 GT/s lanes: 1 bus-ID: 04:00.0 chip-ID: 8086:2725 class-ID: 0280
  IF: wlp4s0 state: down mac: 54:14:f3:d2:40:88
  IF-ID-1: br-960887c7714c state: up speed: 10000 Mbps duplex: unknown mac: 02:42:3a:b8:24:f5
  IP v4: 172.21.0.1/16 scope: global broadcast: 172.21.255.255
  IP v6: fe80::42:3aff:feb8:24f5/64 scope: link
  IF-ID-2: docker0 state: down mac: 02:42:91:f8:d2:7a
  IP v4: 172.17.0.1/16 scope: global broadcast: 172.17.255.255
  IF-ID-3: veth31b1b8a state: up speed: 10000 Mbps duplex: full mac: fe:1a:bd:77:93:81
  IF-ID-4: veth37a16a3 state: up speed: 10000 Mbps duplex: full mac: a6:f8:ea:b1:fc:36
  IF-ID-5: vethb881f82 state: up speed: 10000 Mbps duplex: full mac: be:38:90:32:79:49
  WAN IP: 68.172.52.184
Bluetooth:
  Device-1: Intel AX210 Bluetooth type: USB driver: btusb v: 0.8 bus-ID: 1-14:7
    chip-ID: 8087:0032 class-ID: e001
  Report: hciconfig ID: hci0 rfk-id: 0 state: up address: 54:14:F3:D2:40:8C
Logical:
  Message: No logical block device data found.
RAID:
  Message: No RAID data found.
Drives:
  Local Storage: total: 34.11 TiB used: 28.7 TiB (84.1%)
  ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 970 EVO Plus 500GB size: 465.76 GiB
    speed: 31.6 Gb/s lanes: 4 type: SSD serial: S58SNM0W311437R rev: 2B2QEXM7 temp: 51.9 C
    scheme: GPT
  ID-2: /dev/nvme1n1 vendor: Western Digital model: WD BLACK SN750 2TB size: 1.82 TiB
    speed: 31.6 Gb/s lanes: 4 type: SSD serial: 224231800154 rev: 112000WD temp: 56.9 C scheme: GPT
  ID-3: /dev/sda vendor: Western Digital model: WD120EMAZ-11BLFA0 size: 10.91 TiB
    speed: 6.0 Gb/s type: HDD rpm: 5400 serial: 5PGJSLHC rev: 0A81
  ID-4: /dev/sdb vendor: Seagate model: ST8000DM004-2CX188 size: 7.28 TiB speed: 6.0 Gb/s
    type: HDD rpm: 5425 serial: WCT072F3 rev: 0001
  ID-5: /dev/sdc vendor: Western Digital model: WD20EARX-00PASB0 size: 1.82 TiB speed: 6.0 Gb/s
    type: N/A serial: WD-WCAZAH803281 rev: AB51
  ID-6: /dev/sdd vendor: Seagate model: ST8000AS0002-1NA17Z size: 7.28 TiB speed: 6.0 Gb/s
    type: HDD rpm: 5980 serial: Z840SWHS rev: RT17
  ID-7: /dev/sde vendor: Seagate model: ST5000DM000-1FK178 size: 4.55 TiB speed: 6.0 Gb/s
    type: HDD rpm: 5980 serial: W4J17RDJ rev: CC49
  Optical-1: /dev/sr0 vendor: ASUS model: BW-12B1ST a rev: 1.00 dev-links: cdrom
  Features: speed: 204 multisession: yes audio: yes dvd: yes rw: cd-r,cd-rw,dvd-r,dvd-ram
    state: running
Partition:
  ID-1: / size: 1.79 TiB used: 289.54 GiB (15.8%) fs: ext4 dev: /dev/nvme1n1p2 label: N/A
    uuid: 607205af-15d6-4cd1-8b67-96c8275768b2
  ID-2: /boot/efi size: 475.1 MiB used: 31.5 MiB (6.6%) fs: vfat dev: /dev/nvme1n1p1 label: N/A
    uuid: 9645-3E2A
  ID-3: /media/Backup5TB size: 4.55 TiB used: 2 TiB (44.0%) fs: xfs dev: /dev/sde label: N/A
    uuid: 159aa8cb-c6bc-4cf9-9cb1-18a789ef1690
  ID-4: /media/Disk1 size: 7.28 TiB used: 7.27 TiB (99.9%) fs: xfs dev: /dev/sdb label: N/A
    uuid: 4a25fdc9-0a94-4815-9477-0c18960350ee
  ID-5: /media/Disk12TB size: 10.91 TiB used: 10.75 TiB (98.5%) fs: xfs dev: /dev/sda
    label: N/A uuid: b75aa4bf-853c-444c-8d34-2a50f43c9ed2
  ID-6: /media/Disk2 size: 1.79 TiB used: 1.4 TiB (78.0%) fs: ext4 dev: /dev/sdc label: Disk2
    uuid: 76d38a67-f1dd-41e9-ab75-8d4ccc760123
  ID-7: /media/Disk8 size: 7.28 TiB used: 7 TiB (96.2%) fs: xfs dev: /dev/sdd label: N/A
    uuid: 17c7e9d6-7293-46d5-87bf-d2b0696c75e6
Swap:
  ID-1: swap-1 type: file size: 2 GiB used: 0 KiB (0.0%) priority: -2 file: /swapfile
Unmounted:
  ID-1: /dev/nvme0n1p1 size: 16 MiB fs: N/A label: N/A uuid: N/A
  ID-2: /dev/nvme0n1p2 size: 465.75 GiB fs: ntfs label: N/A uuid: FE869FEA869FA227
USB:
  Hub-1: 1-0:1 info: Hi-speed hub with single TT ports: 16 rev: 2.0 speed: 480 Mb/s
    chip-ID: 1d6b:0002 class-ID: 0900
  Device-1: 1-5:2 info: Razer USA DeathAdder Elite type: Mouse,Keyboard
    driver: hid-generic,usbhid interfaces: 3 rev: 2.0 speed: 12 Mb/s power: 500mA
    chip-ID: 1532:005c class-ID: 0300
  Hub-2: 1-8:3 info: ASMedia ASM1074 High-Speed hub ports: 4 rev: 2.1 speed: 480 Mb/s
    power: 100mA chip-ID: 174c:2074 class-ID: 0900
  Hub-3: 1-9:4 info: ASMedia ASM1074 High-Speed hub ports: 4 rev: 2.1 speed: 480 Mb/s
    power: 100mA chip-ID: 174c:2074 class-ID: 0900
  Hub-4: 1-9.1:6 info: Dell Keyboard Hub ports: 3 rev: 1.1 speed: 12 Mb/s power: 100mA
    chip-ID: 413c:1003 class-ID: 0900
  Device-1: 1-9.1.1:9 info: Dell Keyboard type: Keyboard,HID driver: hid-generic,usbhid
    interfaces: 2 rev: 1.1 speed: 12 Mb/s power: 50mA chip-ID: 413c:2010 class-ID: 0300
  Device-2: 1-9.1.2:10 info: Corsair VOID PRO Wireless Gaming Headset type: Audio,HID
    driver: hid-generic,snd-usb-audio,usbhid interfaces: 4 rev: 1.1 speed: 12 Mb/s power: 100mA
    chip-ID: 1b1c:0a16 class-ID: 0300
  Device-3: 1-9.2:8 info: American Power Conversion Uninterruptible Supply type: HID
    driver: hid-generic,usbhid interfaces: 1 rev: 2.0 speed: 12 Mb/s power: 2mA chip-ID: 051d:0002
    class-ID: 0300 serial: 3B1815X23428
  Device-4: 1-13:5 info: ASRock LED Controller type: HID driver: hid-generic,usbhid
    interfaces: 1 rev: 1.1 speed: 12 Mb/s power: 100mA chip-ID: 26ce:01a2 class-ID: 0300
    serial: A02019100900
  Device-5: 1-14:7 info: Intel AX210 Bluetooth type: Bluetooth driver: btusb interfaces: 2
    rev: 2.0 speed: 12 Mb/s power: 100mA chip-ID: 8087:0032 class-ID: e001
  Hub-5: 2-0:1 info: Super-speed hub ports: 9 rev: 3.1 speed: 20 Gb/s chip-ID: 1d6b:0003
    class-ID: 0900
  Hub-6: 2-8:2 info: ASMedia ASM1074 SuperSpeed hub ports: 4 rev: 3.0 speed: 5 Gb/s power: 8mA
    chip-ID: 174c:3074 class-ID: 0900
  Hub-7: 2-9:3 info: ASMedia ASM1074 SuperSpeed hub ports: 4 rev: 3.0 speed: 5 Gb/s power: 8mA
    chip-ID: 174c:3074 class-ID: 0900
Sensors:
  System Temperatures: cpu: 32.0 C mobo: 50.5 C
  Fan Speeds (RPM): fan-1: 0 fan-2: 0 fan-3: 0 fan-4: 397 fan-5: 0 fan-6: 0 fan-7: 730
Info:
  Processes: 560 Uptime: 1h 51m wakeups: 0 Init: systemd v: 249 runlevel: 5 Compilers:
  gcc: 11.4.0 alt: 11/12 Packages: apt: 2733 Shell: Sudo (sudo) v: 1.9.9 default: Bash v: 5.1.16
  running-in: pty pts/2 (SSH) inxi: 3.3.13

notkiniro commented 11 months ago

This is a known issue with multiple monitors. Try adding -windowed to your parameters.

rtaft commented 11 months ago

@notkiniro No luck with windowed mode; I only survived one map before it locked up on Mint 21.2. I had my doubts, though, since I never had this issue before in full-screen mode. I'm also trying out Ubuntu Cinnamon 23.10 (I have multiple SSDs). There, the game got into a repeating loop for about 5 seconds and I thought it was going down, but it recovered. I've only got about 5 hours of game time on that OS. My goal is to see whether it is just Mint 21 or whether it also happens on the latest Ubuntu or even Ubuntu 22. The only other hardware I could swap out is the PSU and GPU. I could throw in an older AMD card (R9 290) and see if that clears it up. I'm open to any other suggestions.

rtaft commented 6 months ago

After much testing, I found that this issue also happened in Windows. The difference was that Windows seemed to recover after a while, whereas multiple versions of Linux would lock up X11. I have not seen this behavior since I upgraded my video card, so it is possible the issue is directly related to the card itself or the drivers.
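For readers who suspect the card or driver in a similar situation, one possible check is to look for NVIDIA "Xid" messages in the kernel log, which the proprietary driver writes when it detects a GPU fault. The sketch below is not something used in this thread. It assumes log lines of the form `NVRM: Xid (<device>): <code>, ...`, that the journal is persistent so `journalctl -k -b -1` can return the previous boot's kernel messages, and the function names are hypothetical.

```python
import re
import subprocess

# Assumed line shape: "NVRM: Xid (PCI:0000:06:00): 79, ..."
XID_PATTERN = re.compile(r"NVRM: Xid \(([^)]*)\): (\d+)")


def find_xid_events(log_text):
    """Return (device, xid_code) pairs for every Xid line found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = XID_PATTERN.search(line)
        if match:
            events.append((match.group(1), int(match.group(2))))
    return events


def previous_boot_kernel_log():
    """Read kernel messages from the previous boot (needs a persistent journal)."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

Running `find_xid_events(previous_boot_kernel_log())` after a forced reboot would list any Xid codes logged before the lockup; an empty list does not rule out a hardware fault, since a hard hang can prevent the message from being written.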