Ai00-X / ai00_server

The all-in-one RWKV runtime box with embed, RAG, AI agents, and more.
https://ai00-x.github.io/ai00_server/
MIT License

Load a model to an explicit GPU #35

Closed chymian closed 1 year ago

chymian commented 1 year ago

Starting the server by hand and choosing one of the Vulkan GPUs works well, actually very well. Kudos to you!

In auto mode, it tries to load the model to the first reported GPU, which here is the integrated one (Haswell / Intel Celeron). Judging from radeontop, it then starts to offload to a GPU, which breaks the system. See the kernel log below.

RWKV-Runner behaves the same way. It is a bit friendlier in that it stops loading the model earlier and stays responsive, but it also cannot load a 7B model to a GPU.

Suggestion: implement a way to load the model to a chosen discrete GPU, e.g. with an option of the form -a backend:index such as -a vulkan:01.

Alternatively, skip GPU0 if it is integrated and discrete GPUs are available.
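The two suggestions above could be combined roughly as follows. This is only a sketch of the selection logic, not ai00_server's actual code: the Adapter class, the pick_adapter function and the device-type strings are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Adapter:
    # Hypothetical record for one reported GPU; not an ai00_server type.
    index: int
    name: str
    device_type: str  # assumed values: "DISCRETE_GPU" or "INTEGRATED_GPU"


def pick_adapter(adapters: Sequence[Adapter],
                 requested: Optional[int] = None) -> Adapter:
    """Return the explicitly requested adapter, else the first discrete GPU,
    else fall back to the first reported adapter."""
    if not adapters:
        raise ValueError("no adapters reported")
    if requested is not None:
        for adapter in adapters:
            if adapter.index == requested:
                return adapter
        raise ValueError(f"no adapter with index {requested}")
    for adapter in adapters:
        if adapter.device_type == "DISCRETE_GPU":
            return adapter
    return adapters[0]


# Device list modelled on the machine described in this issue.
adapters = [
    Adapter(0, "Intel(R) HD Graphics (HSW GT1)", "INTEGRATED_GPU"),
    Adapter(1, "Radeon RX 580 Series", "DISCRETE_GPU"),
    Adapter(2, "Radeon RX 580 Series", "DISCRETE_GPU"),
]
print(pick_adapter(adapters).name)
print(pick_adapter(adapters, requested=2).index)
```

With this ordering an explicit choice always wins, and the integrated GPU is only used when nothing else is available.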

Load order:

 ./ai00_server --port 8082 --quant 32 --model assets/models/RWKV-4-World-ARAtuned-7B-v1-20230803-ctx4096.st
MESA-INTEL: warning: Haswell Vulkan support is incomplete
? Please select an adapter ›
❯ Intel(R) HD Graphics (HSW GT1) (Vulkan)
  Radeon RX 580 Series (Vulkan)
  Radeon RX 580 Series (Vulkan)
  Radeon RX 580 Series (Vulkan)
  Radeon RX 580 Series (Vulkan)

vulkaninfo:

Devices:
========
GPU0:
        apiVersion         = 1.2.230
        driverVersion      = 22.3.6
        vendorID           = 0x8086
        deviceID           = 0x0402
        deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
        deviceName         = Intel(R) HD Graphics (HSW GT1)
        driverID           = DRIVER_ID_INTEL_OPEN_SOURCE_MESA
        driverName         = Intel open-source Mesa driver
        driverInfo         = Mesa 22.3.6
        conformanceVersion = 0.0.0.0
        deviceUUID         = 0f2f1a2f-fc30-f647-758e-bed37906cc4d
        driverUUID         = da807cc5-e5c9-2add-5541-8357feabd0cc
GPU1:
        apiVersion         = 1.3.240
        driverVersion      = 2.0.255
        vendorID           = 0x1002
        deviceID           = 0x67df
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = Radeon RX 580 Series
        driverID           = DRIVER_ID_AMD_OPEN_SOURCE
        driverName         = AMD open-source driver
        driverInfo         = 2023.Q1.2 (LLPC)
        conformanceVersion = 1.3.0.0
        deviceUUID         = 00000000-0100-0000-0000-000000000000
        driverUUID         = 414d442d-4c49-4e55-582d-445256000000
GPU2 - GPU4
…
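For anyone who wants to find the discrete GPUs in such a listing without reading it by eye, here is a small sketch that parses text in the layout shown above (a GPUn: header followed by key = value lines). It assumes exactly that layout and works on pasted text; it does not call vulkaninfo itself. The function names parse_devices and discrete_indices are made up for this example.

```python
import re
from typing import Dict, List


def parse_devices(text: str) -> List[Dict[str, str]]:
    """Collect the key = value lines under each 'GPUn:' header."""
    devices: List[Dict[str, str]] = []
    current = None
    for line in text.splitlines():
        header = re.match(r"^GPU(\d+):$", line.strip())
        if header:
            current = {"index": header.group(1)}
            devices.append(current)
            continue
        # Lines before the first header (e.g. '========') are ignored.
        if current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return devices


def discrete_indices(devices: List[Dict[str, str]]) -> List[int]:
    """Indices of devices reported as discrete GPUs."""
    return [
        int(d["index"])
        for d in devices
        if d.get("deviceType") == "PHYSICAL_DEVICE_TYPE_DISCRETE_GPU"
    ]


# Shortened copy of the listing above.
SAMPLE = """\
Devices:
========
GPU0:
        deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
        deviceName         = Intel(R) HD Graphics (HSW GT1)
GPU1:
        deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
        deviceName         = Radeon RX 580 Series
"""

devices = parse_devices(SAMPLE)
print(discrete_indices(devices))
```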

system: an old crypto rig with Celeron, 8 GB & 4 x RX 580 (8 GB)
OS: Debian 12
kernel: Linux jeeves 6.1.0-11-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-4 (2023-08-08) x86_64 GNU/Linux
vulkan driver: amdgpu (open source)

versions:

libegl-mesa0:amd64               22.3.6-1+deb12u1
libgl1-mesa-dri:amd64            22.3.6-1+deb12u1
libglapi-mesa:amd64              22.3.6-1+deb12u1
libglu1-mesa:amd64               9.0.2-1.1
libglx-mesa0:amd64               22.3.6-1+deb12u1
libvulkan1:amd64                 1.3.239.0-1
mesa-common-dev:amd64            22.3.6-1+deb12u1
mesa-opencl-icd:amd64            22.3.6-1+deb12u1
mesa-va-drivers:amd64            22.3.6-1+deb12u1
mesa-vdpau-drivers:amd64         22.3.6-1+deb12u1
mesa-vulkan-drivers:amd64        22.3.6-1+deb12u1
vulkan-amdgpu:amd64              23.10-1620044.22.04
vulkan-tools                     1.3.239.0+dfsg1-1
vulkan-validationlayers:amd64    1.3.239.0-

kernel log:

Sep 07 18:18:31 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 65535
Sep 07 18:18:31 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 65535
Sep 07 18:18:31 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 65535
Sep 07 18:18:31 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 65535
Sep 07 18:18:43 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=11015, emitted seq=11017
Sep 07 18:18:43 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Sep 07 18:18:43 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: GPU reset begin!
Sep 07 18:18:43 jeeves kernel: amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Sep 07 18:18:43 jeeves kernel: [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Sep 07 18:18:44 jeeves kernel: amdgpu: cp is busy, skip halt cp
Sep 07 18:18:44 jeeves kernel: amdgpu: rlc is busy, skip halt rlc
Sep 07 18:18:44 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: BACO reset
Sep 07 18:18:44 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: GPU reset succeeded, trying to resume
Sep 07 18:18:44 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:18:44 jeeves kernel: [drm] VRAM is lost due to GPU reset!
Sep 07 18:18:51 jeeves kernel: perf: interrupt took too long (3146 > 3142), lowering kernel.perf_event_max_sample_rate to 63500
Sep 07 18:18:54 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:00 jeeves kernel: amdgpu: SMU Firmware start failed!
Sep 07 18:19:00 jeeves kernel: amdgpu: Failed to load SMU ucode.
Sep 07 18:19:00 jeeves kernel: amdgpu: fw load failed
Sep 07 18:19:00 jeeves kernel: amdgpu: smu firmware loading failed
Sep 07 18:19:00 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:19:00 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:19:00 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: GPU reset(1) failed
Sep 07 18:19:00 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:19:00 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:19:00 jeeves kernel: kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0
Sep 07 18:19:00 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:19:00 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:19:00 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:19:00 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: GPU reset end with ret = -22
Sep 07 18:19:00 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -22
Sep 07 18:19:03 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:09 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:16 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:22 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:28 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:29 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=11017, emitted seq=11018
Sep 07 18:19:29 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Sep 07 18:19:29 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: GPU reset begin!
Sep 07 18:19:37 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:47 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:19:53 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:01 jeeves CRON[136318]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 07 18:20:01 jeeves CRON[136319]: (root) CMD (/bin/ping -qi 10 -c 3 -I zt4mrrjgxa 10.11.1.1 >/dev/null || /usr/sbin/service zerotier-one restart)
Sep 07 18:20:02 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:06 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=871, emitted seq=871
Sep 07 18:20:06 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Sep 07 18:20:06 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
Sep 07 18:20:12 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:18 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:26 jeeves CRON[136318]: pam_unix(cron:session): session closed for user root
Sep 07 18:20:47 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:47 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:47 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:49 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:20:59 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:21:05 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:21:14 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:21:24 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:21:27 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: Guilty job already signaled, skipping HW reset
Sep 07 18:21:27 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset(1) succeeded!
Sep 07 18:21:27 jeeves kernel: kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0
Sep 07 18:21:30 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:21:39 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:21:42 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Sep 07 18:21:48 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:21:52 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=36, emitted seq=39
Sep 07 18:21:52 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process ai00_server pid 134867 thread ai00_server pid 134867
Sep 07 18:21:52 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
Sep 07 18:21:53 jeeves kernel: amdgpu: cp is busy, skip halt cp
Sep 07 18:21:53 jeeves kernel: amdgpu: rlc is busy, skip halt rlc
Sep 07 18:21:53 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: BACO reset
Sep 07 18:22:00 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:00 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset succeeded, trying to resume
Sep 07 18:22:00 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:22:00 jeeves kernel: [drm] VRAM is lost due to GPU reset!
Sep 07 18:22:22 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:22 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:22 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:22 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:22 jeeves kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Sep 07 18:22:22 jeeves kernel:         (detected by 1, t=5252 jiffies, g=1043809, q=784 ncpus=2)
Sep 07 18:22:22 jeeves kernel: rcu: All QSes seen, last rcu_preempt kthread activity 5240 (4296071681-4296066441), jiffies_till_next_fqs=1, root ->qsmask 0x0
Sep 07 18:22:22 jeeves kernel: rcu: rcu_preempt kthread starved for 5240 jiffies! g1043809 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
Sep 07 18:22:22 jeeves kernel: rcu:         Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
Sep 07 18:22:22 jeeves kernel: rcu: RCU grace-period kthread stack dump:
Sep 07 18:22:22 jeeves kernel: task:rcu_preempt     state:R  running task     stack:0     pid:15    ppid:2      flags:0x00004000
Sep 07 18:22:22 jeeves kernel: Call Trace:
Sep 07 18:22:22 jeeves kernel:  <TASK>
Sep 07 18:22:22 jeeves kernel:  __schedule+0x351/0xa20
Sep 07 18:22:22 jeeves kernel:  ? rcu_gp_cleanup+0x480/0x480
Sep 07 18:22:22 jeeves kernel:  schedule+0x5d/0xe0
Sep 07 18:22:22 jeeves kernel:  schedule_timeout+0x94/0x150
Sep 07 18:22:22 jeeves kernel:  ? __bpf_trace_tick_stop+0x10/0x10
Sep 07 18:22:22 jeeves kernel:  rcu_gp_fqs_loop+0x141/0x4c0
Sep 07 18:22:22 jeeves kernel:  rcu_gp_kthread+0xd0/0x190
Sep 07 18:22:22 jeeves kernel:  kthread+0xe9/0x110
Sep 07 18:22:22 jeeves kernel:  ? kthread_complete_and_exit+0x20/0x20
Sep 07 18:22:22 jeeves kernel:  ret_from_fork+0x22/0x30
Sep 07 18:22:22 jeeves kernel:  </TASK>
Sep 07 18:22:22 jeeves kernel: rcu: Stack dump where RCU GP kthread last ran:
Sep 07 18:22:22 jeeves kernel: CPU: 1 PID: 133662 Comm: kworker/u4:7 Tainted: G          I        6.1.0-11-amd64 #1  Debian 6.1.38-4
Sep 07 18:22:22 jeeves kernel: Hardware name: BIOSTAR Group TB85/TB85, BIOS 4.6.5 08/22/2017
Sep 07 18:22:22 jeeves kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Sep 07 18:22:22 jeeves kernel: RIP: 0010:amdgpu_device_rreg.part.0+0x2f/0xe0 [amdgpu]
Sep 07 18:22:22 jeeves kernel: Code: 41 54 44 8d 24 b5 00 00 00 00 55 89 f5 53 48 89 fb 4c 3b a7 b8 08 00 00 73 62 83 e2 02 74 21 4c 03 a3 c0 08 00 00 45 8b 24 24 <48> 8b 43 08 0f b7 70 3e 66 90 44 89 e0 5b 5d 41 5c c3 cc cc cc cc
Sep 07 18:22:22 jeeves kernel: RSP: 0018:ffffbe9b430e7b68 EFLAGS: 00000282
Sep 07 18:22:22 jeeves kernel: RAX: ffffffffc0f56c80 RBX: ffff9be629d40000 RCX: 0000000000000000
Sep 07 18:22:22 jeeves kernel: RDX: 0000000000000000 RSI: 0000000000000095 RDI: ffff9be629d40000
Sep 07 18:22:22 jeeves kernel: RBP: 0000000000000095 R08: 0000000000000000 R09: ffffbe9b430e7948
Sep 07 18:22:22 jeeves kernel: R10: 0000000000000003 R11: ffffffffbbcd43a8 R12: 0000000000000000
Sep 07 18:22:22 jeeves kernel: R13: 0000000000000000 R14: 000000000000ffff R15: 0000000000000000
Sep 07 18:22:22 jeeves kernel: FS:  0000000000000000(0000) GS:ffff9be786b00000(0000) knlGS:0000000000000000
Sep 07 18:22:22 jeeves kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 07 18:22:22 jeeves kernel: CR2: 00007faa31959000 CR3: 0000000106522002 CR4: 00000000000706e0
Sep 07 18:22:22 jeeves kernel: Call Trace:
Sep 07 18:22:22 jeeves kernel:  <IRQ>
Sep 07 18:22:22 jeeves kernel:  ? rcu_check_gp_kthread_starvation.cold+0x16c/0x171
Sep 07 18:22:22 jeeves kernel:  ? rcu_sched_clock_irq+0xc9c/0xcd0
Sep 07 18:22:22 jeeves kernel:  ? raw_notifier_call_chain+0x44/0x60
Sep 07 18:22:22 jeeves kernel:  ? update_process_times+0x77/0xb0
Sep 07 18:22:22 jeeves kernel:  ? tick_sched_handle+0x22/0x60
Sep 07 18:22:22 jeeves kernel:  ? tick_sched_timer+0x6f/0x80
Sep 07 18:22:22 jeeves kernel:  ? tick_sched_do_timer+0xa0/0xa0
Sep 07 18:22:22 jeeves kernel:  ? __hrtimer_run_queues+0x112/0x2b0
Sep 07 18:22:22 jeeves kernel:  ? hrtimer_interrupt+0xfe/0x220
Sep 07 18:22:22 jeeves kernel:  ? __sysvec_apic_timer_interrupt+0x7f/0x170
Sep 07 18:22:22 jeeves kernel:  ? sysvec_apic_timer_interrupt+0x99/0xc0
Sep 07 18:22:22 jeeves kernel:  </IRQ>
Sep 07 18:22:22 jeeves kernel:  <TASK>
Sep 07 18:22:22 jeeves kernel:  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
Sep 07 18:22:22 jeeves kernel:  ? amdgpu_cgs_write_register+0x10/0x10 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  ? amdgpu_device_rreg.part.0+0x2f/0xe0 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  phm_wait_for_register_unequal+0x5e/0xa0 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  smu7_send_msg_to_smc+0x91/0x140 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  smum_send_msg_to_smc_with_parameter+0xc7/0x100 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  smu7_update_clock_gatings+0x2c4/0x3f0 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  pp_set_clockgating_by_smu+0x35/0x70 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  amdgpu_dpm_set_clockgating_by_smu+0x4d/0x70 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  vi_common_set_clockgating_state+0x19d/0x310 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  amdgpu_device_set_cg_state+0x92/0xf0 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  ? __irq_put_desc_unlock+0x18/0x40
Sep 07 18:22:22 jeeves kernel:  amdgpu_device_ip_suspend_phase1+0x27/0xe0 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  amdgpu_device_ip_suspend+0x1b/0x70 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  amdgpu_device_pre_asic_reset+0xcf/0x290 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  amdgpu_device_gpu_recover.cold+0x607/0xad4 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  amdgpu_job_timedout+0x1d8/0x220 [amdgpu]
Sep 07 18:22:22 jeeves kernel:  ? psi_group_change+0x145/0x360
Sep 07 18:22:22 jeeves kernel:  ? __switch_to+0x228/0x410
Sep 07 18:22:22 jeeves kernel:  drm_sched_job_timedout+0x76/0x110 [gpu_sched]
Sep 07 18:22:22 jeeves kernel:  process_one_work+0x1c7/0x380
Sep 07 18:22:22 jeeves kernel:  worker_thread+0x4d/0x380
Sep 07 18:22:22 jeeves kernel:  ? _raw_spin_lock_irqsave+0x23/0x50
Sep 07 18:22:22 jeeves kernel:  ? rescuer_thread+0x3a0/0x3a0
Sep 07 18:22:22 jeeves kernel:  kthread+0xe9/0x110
Sep 07 18:22:22 jeeves kernel:  ? kthread_complete_and_exit+0x20/0x20
Sep 07 18:22:22 jeeves kernel:  ret_from_fork+0x22/0x30
Sep 07 18:22:22 jeeves kernel:  </TASK>
Sep 07 18:22:22 jeeves kernel: amdgpu: SMU Firmware start failed!
Sep 07 18:22:22 jeeves kernel: amdgpu: Failed to load SMU ucode.
Sep 07 18:22:22 jeeves kernel: amdgpu: fw load failed
Sep 07 18:22:22 jeeves kernel: amdgpu: smu firmware loading failed
Sep 07 18:22:22 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:22 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:22 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset(4) failed
Sep 07 18:22:22 jeeves kernel: kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0
Sep 07 18:22:22 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:22 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
…
(around 800 more identical log entries)
…
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=241, emitted seq=241
Sep 07 18:22:31 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Sep 07 18:22:31 jeeves kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
Sep 07 18:22:31 jeeves kernel: [drm] Skip scheduling IBs!
Sep 07 18:22:31 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:31 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:31 jeeves kernel: amdgpu: Move buffer fallback to memcpy unavailable
Sep 07 18:22:31 jeeves kernel: [drm] evicting device resources failed
Sep 07 18:22:06 jeeves rtkit-daemon[2698]: The canary thread is apparently starving. Taking action.
Sep 07 18:22:06 jeeves rtkit-daemon[2698]: Demoting known real-time threads.
Sep 07 18:22:06 jeeves rtkit-daemon[2698]: Successfully demoted thread 77389 of process 77039.
Sep 07 18:22:06 jeeves rtkit-daemon[2698]: Successfully demoted thread 75491 of process 75461.
Sep 07 18:22:06 jeeves rtkit-daemon[2698]: Successfully demoted thread 6109 of process 5989.
Sep 07 18:22:06 jeeves rtkit-daemon[2698]: Successfully demoted thread 5989 of process 5989.
Sep 07 18:22:06 jeeves rtkit-daemon[2698]: Demoted 4 threads.
Sep 07 18:22:22 jeeves rtkit-daemon[2698]: The canary thread is apparently starving. Taking action.
Sep 07 18:22:22 jeeves rtkit-daemon[2698]: Demoting known real-time threads.
Sep 07 18:22:22 jeeves rtkit-daemon[2698]: Successfully demoted thread 77389 of process 77039.
Sep 07 18:22:22 jeeves rtkit-daemon[2698]: Successfully demoted thread 75491 of process 75461.
Sep 07 18:22:22 jeeves rtkit-daemon[2698]: Successfully demoted thread 6109 of process 5989.
Sep 07 18:22:22 jeeves rtkit-daemon[2698]: Successfully demoted thread 5989 of process 5989.
Sep 07 18:22:22 jeeves rtkit-daemon[2698]: Demoted 4 threads.
Sep 07 18:22:31 jeeves kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Sep 07 18:22:43 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:43 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:43 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset end with ret = -22
Sep 07 18:22:43 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -22
Sep 07 18:22:43 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=1620, emitted seq=1622
Sep 07 18:22:43 jeeves kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Sep 07 18:22:43 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
Sep 07 18:22:43 jeeves kernel: amdgpu 0000:04:00.0: amdgpu: Guilty job already signaled, skipping HW reset
Sep 07 18:22:43 jeeves kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset(1) succeeded!
Sep 07 18:22:43 jeeves kernel: kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics 730<0
Sep 07 18:22:43 jeeves rtkit-daemon[2698]: The canary thread is apparently starving. Taking action.
Sep 07 18:22:43 jeeves rtkit-daemon[2698]: Demoting known real-time threads.
Sep 07 18:22:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 77389 of process 77039.
Sep 07 18:22:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 75491 of process 75461.
Sep 07 18:22:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 6109 of process 5989.
Sep 07 18:22:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 5989 of process 5989.
Sep 07 18:22:43 jeeves rtkit-daemon[2698]: Demoted 4 threads.
Sep 07 18:22:56 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:56 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:56 jeeves kernel: amdgpu: Move buffer fallback to memcpy unavailable
Sep 07 18:22:56 jeeves kernel: [drm] evicting device resources failed
Sep 07 18:22:56 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:22:56 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:08 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:08 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:21 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:21 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:21 jeeves kernel: ------------[ cut here ]------------
Sep 07 18:23:21 jeeves kernel: WARNING: CPU: 1 PID: 139863 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2521 dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:21 jeeves kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc tun overlay rfkill qrtr binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl_msr ext4 intel_rapl_common crc16 mbcache jbd2 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ghash_clmulni_intel cryptd sha512_ssse3 mei_hdcp mei_wdt sha512_generic snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio rapl snd_hda_codec_hdmi snd_hda_intel intel_cstate at24 intel_uncore iTCO_wdt snd_intel_dspcfg intel_pmc_bxt snd_intel_sdw_acpi iTCO_vendor_support pcspkr watchdog snd_hda_codec snd_hda_core snd_hwdep snd_pcm mei_me snd_timer snd mei soundcore evdev sg msr parport_pc ppdev efi_pstore lp fuse parport loop dm_mod configfs efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic uas
Sep 07 18:23:23 jeeves kernel:  usb_storage hid_generic sd_mod t10_pi usbhid hid crc64_rocksoft crc64 crc_t10dif crct10dif_generic amdgpu i915 ahci libahci gpu_sched drm_buddy i2c_algo_bit drm_display_helper libata cec xhci_pci rc_core drm_ttm_helper ttm drm_kms_helper xhci_hcd crct10dif_pclmul crct10dif_common scsi_mod ehci_pci r8169 crc32_pclmul crc32c_intel ehci_hcd scsi_common i2c_i801 i2c_smbus realtek mdio_devres lpc_ich libphy drm usbcore usb_common fan video wmi button
Sep 07 18:23:23 jeeves kernel: CPU: 1 PID: 139863 Comm: kworker/1:0 Tainted: G          I        6.1.0-11-amd64 #1  Debian 6.1.38-4
Sep 07 18:23:23 jeeves kernel: Hardware name: BIOSTAR Group TB85/TB85, BIOS 4.6.5 08/22/2017
Sep 07 18:23:23 jeeves kernel: Workqueue: pm pm_runtime_work
Sep 07 18:23:23 jeeves kernel: RIP: 0010:dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel: Code: 4c 89 e7 e8 a0 be 1e 00 48 89 ef e8 d8 b4 00 00 4c 89 f7 e8 70 c0 ff ff e9 f0 fe ff ff 4c 89 e6 4c 89 ef e8 20 e3 1e 00 eb d6 <0f> 0b e9 9b fe ff ff e8 32 1b c3 f9 66 90 41 57 49 89 fa 49 89 cf
Sep 07 18:23:23 jeeves kernel: RSP: 0018:ffffbe9b42397c98 EFLAGS: 00010282
Sep 07 18:23:23 jeeves kernel: RAX: 0000000000000000 RBX: ffff9be613177450 RCX: 0000000000000000
Sep 07 18:23:23 jeeves kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9be613160000
Sep 07 18:23:23 jeeves kernel: RBP: ffff9be613160000 R08: 0000000000000001 R09: ffff9be6222bec74
Sep 07 18:23:23 jeeves kernel: R10: 0000000000000003 R11: 0000000000000005 R12: ffff9be613160000
Sep 07 18:23:23 jeeves kernel: R13: 0000000000000000 R14: ffff9be6131754f8 R15: ffff9be600a3f248
Sep 07 18:23:23 jeeves kernel: FS:  0000000000000000(0000) GS:ffff9be786b00000(0000) knlGS:0000000000000000
Sep 07 18:23:23 jeeves kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 07 18:23:23 jeeves kernel: CR2: 0000558ccf368010 CR3: 0000000103144003 CR4: 00000000000706e0
Sep 07 18:23:23 jeeves kernel: Call Trace:
Sep 07 18:23:23 jeeves kernel:  <TASK>
Sep 07 18:23:23 jeeves kernel:  ? __warn+0x7d/0xc0
Sep 07 18:23:23 jeeves kernel:  ? dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  ? report_bug+0xe6/0x170
Sep 07 18:23:23 jeeves kernel:  ? handle_bug+0x41/0x70
Sep 07 18:23:23 jeeves kernel:  ? exc_invalid_op+0x13/0x60
Sep 07 18:23:23 jeeves kernel:  ? asm_exc_invalid_op+0x16/0x20
Sep 07 18:23:23 jeeves kernel:  ? dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  ? dm_suspend+0x32/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  ? vi_common_set_clockgating_state+0x237/0x310 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  amdgpu_device_ip_suspend_phase1+0x75/0xe0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  amdgpu_device_suspend+0x78/0x150 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  amdgpu_pmops_runtime_suspend+0xba/0x190 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  pci_pm_runtime_suspend+0x66/0x1b0
Sep 07 18:23:23 jeeves kernel:  ? pci_dev_put+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  __rpm_callback+0x44/0x170
Sep 07 18:23:23 jeeves kernel:  ? pci_dev_put+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  rpm_callback+0x5d/0x70
Sep 07 18:23:23 jeeves kernel:  ? pci_dev_put+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  rpm_suspend+0x11a/0x720
Sep 07 18:23:23 jeeves kernel:  pm_runtime_work+0x94/0xa0
Sep 07 18:23:23 jeeves kernel:  process_one_work+0x1c7/0x380
Sep 07 18:23:23 jeeves kernel:  worker_thread+0x4d/0x380
Sep 07 18:23:23 jeeves kernel:  ? _raw_spin_lock_irqsave+0x23/0x50
Sep 07 18:23:23 jeeves kernel:  ? rescuer_thread+0x3a0/0x3a0
Sep 07 18:23:23 jeeves kernel:  kthread+0xe9/0x110
Sep 07 18:23:23 jeeves kernel:  ? kthread_complete_and_exit+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  ret_from_fork+0x22/0x30
Sep 07 18:23:23 jeeves kernel:  </TASK>
Sep 07 18:23:23 jeeves kernel: ---[ end trace 0000000000000000 ]---
Sep 07 18:23:23 jeeves kernel: ------------[ cut here ]------------
Sep 07 18:23:23 jeeves kernel: WARNING: CPU: 1 PID: 140 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2521 dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc tun overlay rfkill qrtr binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl_msr ext4 intel_rapl_common crc16 mbcache jbd2 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ghash_clmulni_intel cryptd sha512_ssse3 mei_hdcp mei_wdt sha512_generic snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio rapl snd_hda_codec_hdmi snd_hda_intel intel_cstate at24 intel_uncore iTCO_wdt snd_intel_dspcfg intel_pmc_bxt snd_intel_sdw_acpi iTCO_vendor_support pcspkr watchdog snd_hda_codec snd_hda_core snd_hwdep snd_pcm mei_me snd_timer snd mei soundcore evdev sg msr parport_pc ppdev efi_pstore lp fuse parport loop dm_mod configfs efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c crc32c_generic uas
Sep 07 18:23:23 jeeves kernel:  usb_storage hid_generic sd_mod t10_pi usbhid hid crc64_rocksoft crc64 crc_t10dif crct10dif_generic amdgpu i915 ahci libahci gpu_sched drm_buddy i2c_algo_bit drm_display_helper libata cec xhci_pci rc_core drm_ttm_helper ttm drm_kms_helper xhci_hcd crct10dif_pclmul crct10dif_common scsi_mod ehci_pci r8169 crc32_pclmul crc32c_intel ehci_hcd scsi_common i2c_i801 i2c_smbus realtek mdio_devres lpc_ich libphy drm usbcore usb_common fan video wmi button
Sep 07 18:23:23 jeeves kernel: CPU: 1 PID: 140 Comm: kworker/1:3 Tainted: G        W I        6.1.0-11-amd64 #1  Debian 6.1.38-4
Sep 07 18:23:23 jeeves kernel: Hardware name: BIOSTAR Group TB85/TB85, BIOS 4.6.5 08/22/2017
Sep 07 18:23:23 jeeves kernel: Workqueue: pm pm_runtime_work
Sep 07 18:23:23 jeeves kernel: RIP: 0010:dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel: Code: 4c 89 e7 e8 a0 be 1e 00 48 89 ef e8 d8 b4 00 00 4c 89 f7 e8 70 c0 ff ff e9 f0 fe ff ff 4c 89 e6 4c 89 ef e8 20 e3 1e 00 eb d6 <0f> 0b e9 9b fe ff ff e8 32 1b c3 f9 66 90 41 57 49 89 fa 49 89 cf
Sep 07 18:23:23 jeeves kernel: RSP: 0018:ffffbe9b40437c98 EFLAGS: 00010282
Sep 07 18:23:23 jeeves kernel: RAX: 0000000000000000 RBX: ffff9be612d57450 RCX: 0000000000000000
Sep 07 18:23:23 jeeves kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9be612d40000
Sep 07 18:23:23 jeeves kernel: RBP: ffff9be612d40000 R08: 0000000000000001 R09: ffff9be60ea8baf4
Sep 07 18:23:23 jeeves kernel: R10: 0000000000000003 R11: 000000000000000f R12: ffff9be612d40000
Sep 07 18:23:23 jeeves kernel: R13: 0000000000000000 R14: ffff9be612d554f8 R15: ffff9be600a39248
Sep 07 18:23:23 jeeves kernel: FS:  0000000000000000(0000) GS:ffff9be786b00000(0000) knlGS:0000000000000000
Sep 07 18:23:23 jeeves kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 07 18:23:23 jeeves kernel: CR2: 00007ffde7b07078 CR3: 0000000102b52002 CR4: 00000000000706e0
Sep 07 18:23:23 jeeves kernel: Call Trace:
Sep 07 18:23:23 jeeves kernel:  <TASK>
Sep 07 18:23:23 jeeves kernel:  ? __warn+0x7d/0xc0
Sep 07 18:23:23 jeeves kernel:  ? dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  ? report_bug+0xe6/0x170
Sep 07 18:23:23 jeeves kernel:  ? handle_bug+0x41/0x70
Sep 07 18:23:23 jeeves kernel:  ? exc_invalid_op+0x13/0x60
Sep 07 18:23:23 jeeves kernel:  ? asm_exc_invalid_op+0x16/0x20
Sep 07 18:23:23 jeeves kernel:  ? dm_suspend+0x1a2/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  ? dm_suspend+0x32/0x1b0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  ? vi_common_set_clockgating_state+0x237/0x310 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  amdgpu_device_ip_suspend_phase1+0x75/0xe0 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  amdgpu_device_suspend+0x78/0x150 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  amdgpu_pmops_runtime_suspend+0xba/0x190 [amdgpu]
Sep 07 18:23:23 jeeves kernel:  pci_pm_runtime_suspend+0x66/0x1b0
Sep 07 18:23:23 jeeves kernel:  ? update_load_avg+0x7e/0x780
Sep 07 18:23:23 jeeves kernel:  ? pci_dev_put+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  __rpm_callback+0x44/0x170
Sep 07 18:23:23 jeeves kernel:  ? pci_dev_put+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  rpm_callback+0x5d/0x70
Sep 07 18:23:23 jeeves kernel:  ? pci_dev_put+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  rpm_suspend+0x11a/0x720
Sep 07 18:23:23 jeeves kernel:  ? _raw_spin_unlock+0x15/0x30
Sep 07 18:23:23 jeeves kernel:  ? finish_task_switch.isra.0+0x9b/0x300
Sep 07 18:23:23 jeeves kernel:  ? __switch_to+0x106/0x410
Sep 07 18:23:23 jeeves kernel:  pm_runtime_work+0x94/0xa0
Sep 07 18:23:23 jeeves kernel:  process_one_work+0x1c7/0x380
Sep 07 18:23:23 jeeves kernel:  worker_thread+0x4d/0x380
Sep 07 18:23:23 jeeves kernel:  ? rescuer_thread+0x3a0/0x3a0
Sep 07 18:23:23 jeeves kernel:  kthread+0xe9/0x110
Sep 07 18:23:23 jeeves kernel:  ? kthread_complete_and_exit+0x20/0x20
Sep 07 18:23:23 jeeves kernel:  ret_from_fork+0x22/0x30
Sep 07 18:23:23 jeeves kernel:  </TASK>
Sep 07 18:23:23 jeeves kernel: ---[ end trace 0000000000000000 ]---
Sep 07 18:23:51 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:51 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:51 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:51 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:51 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:51 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:23:43 jeeves rtkit-daemon[2698]: The canary thread is apparently starving. Taking action.
Sep 07 18:23:43 jeeves rtkit-daemon[2698]: Demoting known real-time threads.
Sep 07 18:23:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 77389 of process 77039.
Sep 07 18:23:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 75491 of process 75461.
Sep 07 18:23:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 6109 of process 5989.
Sep 07 18:23:43 jeeves rtkit-daemon[2698]: Successfully demoted thread 5989 of process 5989.
Sep 07 18:23:43 jeeves rtkit-daemon[2698]: Demoted 4 threads.
Sep 07 18:24:08 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:08 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:08 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:08 jeeves rtkit-daemon[2698]: The canary thread is apparently starving. Taking action.
Sep 07 18:24:08 jeeves rtkit-daemon[2698]: Demoting known real-time threads.
Sep 07 18:24:08 jeeves rtkit-daemon[2698]: Successfully demoted thread 77389 of process 77039.
Sep 07 18:24:08 jeeves rtkit-daemon[2698]: Successfully demoted thread 75491 of process 75461.
Sep 07 18:24:08 jeeves rtkit-daemon[2698]: Successfully demoted thread 6109 of process 5989.
Sep 07 18:24:08 jeeves rtkit-daemon[2698]: Successfully demoted thread 5989 of process 5989.
Sep 07 18:24:08 jeeves rtkit-daemon[2698]: Demoted 4 threads.
Sep 07 18:24:09 jeeves dbus-daemon[3029]: [session uid=1001 pid=3029] Activating service name='org.xfce.Xfconf' requested by ':1.16' (uid=1001 pid=5900 comm="xfsettingsd")
Sep 07 18:24:09 jeeves dbus-daemon[3029]: [session uid=1001 pid=3029] Successfully activated service 'org.xfce.Xfconf'
Sep 07 18:24:12 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:23 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:23 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:23 jeeves rtkit-daemon[2698]: The canary thread is apparently starving. Taking action.
Sep 07 18:24:23 jeeves rtkit-daemon[2698]: Demoting known real-time threads.
Sep 07 18:24:23 jeeves rtkit-daemon[2698]: Successfully demoted thread 77389 of process 77039.
Sep 07 18:24:23 jeeves rtkit-daemon[2698]: Successfully demoted thread 75491 of process 75461.
Sep 07 18:24:23 jeeves rtkit-daemon[2698]: Successfully demoted thread 6109 of process 5989.
Sep 07 18:24:23 jeeves rtkit-daemon[2698]: Successfully demoted thread 5989 of process 5989.
Sep 07 18:24:23 jeeves rtkit-daemon[2698]: Demoted 4 threads.
Sep 07 18:24:31 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:31 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:32 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:24:32 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:24:32 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:24:32 jeeves kernel: amdgpu 0000:04:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:24:32 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:24:33 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:24:33 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:24:35 jeeves kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:24:41 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:24:58 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:25:24 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:25:24 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:25:24 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:25:24 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:25:24 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:04:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on comp_1.0.0 (-110).
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on uvd (-110).
Sep 07 18:25:24 jeeves CRON[140499]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 07 18:25:24 jeeves CRON[140604]: (root) CMD (/bin/ping -qi 10 -c 3 -I zt4mrrjgxa 10.11.1.1 >/dev/null || /usr/sbin/service zerotier-one restart)
Sep 07 18:25:24 jeeves kernel: [drm:process_one_work] *ERROR* ib ring test failed (-110).
Sep 07 18:25:24 jeeves kernel: [drm:process_one_work] *ERROR* ib ring test failed (-110).
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:04:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:25:24 jeeves kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:25:33 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:33 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:33 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:25:42 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:42 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:25:42 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:25:42 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:25:42 jeeves kernel: amdgpu 0000:04:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:25:42 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:25:43 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:25:43 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:25:43 jeeves kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:25:48 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:26:00 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:26:00 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:26:08 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:26:08 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:26:08 jeeves CRON[140499]: pam_unix(cron:session): session closed for user root
Sep 07 18:26:09 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:26:09 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:26:09 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:26:17 jeeves kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:26:17 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:26:18 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:26:50 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:27:06 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:04:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:27:06 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:27:06 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:27:06 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:27:06 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:27:06 jeeves kernel: INFO: task kworker/u4:7:133662 blocked for more than 126 seconds.
Sep 07 18:27:06 jeeves kernel:       Tainted: G        W I        6.1.0-11-amd64 #1 Debian 6.1.38-4
Sep 07 18:27:06 jeeves kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 07 18:27:06 jeeves kernel: task:kworker/u4:7    state:D stack:0     pid:133662 ppid:2      flags:0x00004000
Sep 07 18:27:06 jeeves kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Sep 07 18:27:06 jeeves kernel: Call Trace:
Sep 07 18:27:06 jeeves kernel:  <TASK>
Sep 07 18:27:06 jeeves kernel:  __schedule+0x351/0xa20
Sep 07 18:27:06 jeeves kernel:  schedule+0x5d/0xe0
Sep 07 18:27:06 jeeves kernel:  schedule_preempt_disabled+0x14/0x30
Sep 07 18:27:06 jeeves kernel:  __mutex_lock.constprop.0+0x3b4/0x700
Sep 07 18:27:06 jeeves kernel:  ? __schedule+0x359/0xa20
Sep 07 18:27:06 jeeves kernel:  dm_suspend+0xba/0x1b0 [amdgpu]
Sep 07 18:27:06 jeeves kernel:  ? __cond_resched+0x1c/0x30
Sep 07 18:27:06 jeeves kernel:  ? preempt_schedule_common+0x2d/0x70
Sep 07 18:27:06 jeeves kernel:  ? __cond_resched+0x1c/0x30
Sep 07 18:27:06 jeeves kernel:  amdgpu_device_ip_suspend_phase1+0x75/0xe0 [amdgpu]
Sep 07 18:27:06 jeeves kernel:  amdgpu_device_ip_suspend+0x1b/0x70 [amdgpu]
Sep 07 18:27:06 jeeves kernel:  amdgpu_device_pre_asic_reset+0xcf/0x290 [amdgpu]
Sep 07 18:27:06 jeeves kernel:  amdgpu_device_gpu_recover.cold+0x607/0xad4 [amdgpu]
Sep 07 18:27:06 jeeves kernel:  amdgpu_job_timedout+0x1d8/0x220 [amdgpu]
Sep 07 18:27:06 jeeves kernel:  ? psi_group_change+0x145/0x360
Sep 07 18:27:06 jeeves kernel:  ? __switch_to+0x228/0x410
Sep 07 18:27:06 jeeves kernel:  drm_sched_job_timedout+0x76/0x110 [gpu_sched]
Sep 07 18:27:06 jeeves kernel:  process_one_work+0x1c7/0x380
Sep 07 18:27:06 jeeves kernel:  worker_thread+0x4d/0x380
Sep 07 18:27:06 jeeves kernel:  ? _raw_spin_lock_irqsave+0x23/0x50
Sep 07 18:27:06 jeeves kernel:  ? rescuer_thread+0x3a0/0x3a0
Sep 07 18:27:06 jeeves kernel:  kthread+0xe9/0x110
Sep 07 18:27:06 jeeves kernel:  ? kthread_complete_and_exit+0x20/0x20
Sep 07 18:27:06 jeeves kernel:  ret_from_fork+0x22/0x30
Sep 07 18:27:06 jeeves kernel:  </TASK>
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
Sep 07 18:27:06 jeeves kernel: [drm:process_one_work] *ERROR* ib ring test failed (-110).
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:02:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:27:06 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:27:06 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:27:06 jeeves kernel: amdgpu 0000:04:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:27:09 jeeves kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 07 18:27:09 jeeves kernel: [drm] UVD and UVD ENC initialized successfully.
Sep 07 18:27:09 jeeves kernel: [drm] VCE initialized successfully.
Sep 07 18:27:09 jeeves kernel: amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
Sep 07 18:27:12 jeeves kernel: amdgpu 0000:01:00.0: amdgpu: 
                               last message was failed ret is 0
Sep 07 18:28:44 jeeves kernel: INFO: task radeontop:46792 blocked for more than 121 seconds.
Sep 07 18:28:44 jeeves kernel:       Tainted: G        W I        6.1.0-11-amd64 #1 Debian 6.1.38-4
Sep 07 18:28:44 jeeves kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 07 18:28:44 jeeves kernel: task:radeontop       state:D stack:0     pid:46792 ppid:46288  flags:0x00004002
Sep 07 18:28:44 jeeves kernel: Call Trace:
Sep 07 18:28:44 jeeves kernel:  <TASK>
Sep 07 18:28:44 jeeves kernel:  __schedule+0x351/0xa20
Sep 07 18:28:44 jeeves kernel:  schedule+0x5d/0xe0
Sep 07 18:28:44 jeeves kernel:  schedule_preempt_disabled+0x14/0x30
Sep 07 18:28:44 jeeves kernel:  __mutex_lock.constprop.0+0x3b4/0x700
Sep 07 18:28:44 jeeves kernel:  drm_release+0x42/0xd0 [drm]
Sep 07 18:28:44 jeeves kernel:  __fput+0x91/0x250
Sep 07 18:28:44 jeeves kernel:  task_work_run+0x59/0x90
Sep 07 18:28:44 jeeves kernel:  do_exit+0x357/0xb10
Sep 07 18:28:44 jeeves kernel:  ? finish_task_switch.isra.0+0x25e/0x300
Sep 07 18:28:44 jeeves kernel:  ? __switch_to+0x106/0x410
Sep 07 18:28:44 jeeves kernel:  do_group_exit+0x2d/0x80
Sep 07 18:28:44 jeeves kernel:  get_signal+0x96a/0x970
Sep 07 18:28:44 jeeves kernel:  ? _raw_spin_unlock_irqrestore+0x23/0x40
Sep 07 18:28:44 jeeves kernel:  ? hrtimer_try_to_cancel+0x78/0x110
Sep 07 18:28:44 jeeves kernel:  arch_do_signal_or_restart+0x3e/0x840
Sep 07 18:28:44 jeeves kernel:  ? hrtimer_nanosleep+0xc7/0x1b0
Sep 07 18:28:44 jeeves kernel:  exit_to_user_mode_prepare+0x18c/0x1d0
Sep 07 18:28:44 jeeves kernel:  syscall_exit_to_user_mode+0x17/0x40
Sep 07 18:28:44 jeeves kernel:  do_syscall_64+0x67/0xc0
Sep 07 18:28:44 jeeves kernel:  ? do_syscall_64+0x67/0xc0
Sep 07 18:28:44 jeeves kernel:  ? do_syscall_64+0x67/0xc0
Sep 07 18:28:44 jeeves kernel:  entry_SYSCALL_64_after_hwframe+0x69/0xd3
Sep 07 18:28:44 jeeves kernel: RIP: 0033:0x7f351cab9385
Sep 07 18:28:44 jeeves kernel: RSP: 002b:00007f351c5fed50 EFLAGS: 00000293 ORIG_RAX: 00000000000000e6
Sep 07 18:28:44 jeeves kernel: RAX: fffffffffffffdfc RBX: 0000000000000061 RCX: 00007f351cab9385
Sep 07 18:28:44 jeeves kernel: RDX: 00007f351c5fed90 RSI: 0000000000000000 RDI: 0000000000000000
Sep 07 18:28:44 jeeves kernel: RBP: 00007f351c5fedf0 R08: 0000000000000000 R09: 00007f351c5fedec
Sep 07 18:28:44 jeeves kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00007f35140030f0
Sep 07 18:28:44 jeeves kernel: R13: 0000000000000078 R14: 00007f35140029c0 R15: 00007f3514000b70
Sep 07 18:28:44 jeeves kernel:  </TASK>
Sep 07 18:28:44 jeeves kernel: INFO: task kworker/u4:7:133662 blocked for more than 248 seconds.
Sep 07 18:28:44 jeeves kernel:       Tainted: G        W I        6.1.0-11-amd64 #1 Debian 6.1.38-4
Sep 07 18:28:44 jeeves kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 07 18:28:44 jeeves kernel: task:kworker/u4:7    state:D stack:0     pid:133662 ppid:2      flags:0x00004000
Sep 07 18:28:44 jeeves kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Sep 07 18:28:44 jeeves kernel: Call Trace:
Sep 07 18:28:44 jeeves kernel:  <TASK>
Sep 07 18:28:44 jeeves kernel:  __schedule+0x351/0xa20
Sep 07 18:28:44 jeeves kernel:  schedule+0x5d/0xe0
Sep 07 18:28:44 jeeves kernel:  schedule_preempt_disabled+0x14/0x30
Sep 07 18:28:44 jeeves kernel:  __mutex_lock.constprop.0+0x3b4/0x700
Sep 07 18:28:44 jeeves kernel:  ? __schedule+0x359/0xa20
Sep 07 18:28:44 jeeves kernel:  dm_suspend+0xba/0x1b0 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  ? __cond_resched+0x1c/0x30
Sep 07 18:28:44 jeeves kernel:  ? preempt_schedule_common+0x2d/0x70
Sep 07 18:28:44 jeeves kernel:  ? __cond_resched+0x1c/0x30
Sep 07 18:28:44 jeeves kernel:  amdgpu_device_ip_suspend_phase1+0x75/0xe0 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  amdgpu_device_ip_suspend+0x1b/0x70 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  amdgpu_device_pre_asic_reset+0xcf/0x290 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  amdgpu_device_gpu_recover.cold+0x607/0xad4 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  amdgpu_job_timedout+0x1d8/0x220 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  ? psi_group_change+0x145/0x360
Sep 07 18:28:44 jeeves kernel:  ? __switch_to+0x228/0x410
Sep 07 18:28:44 jeeves kernel:  drm_sched_job_timedout+0x76/0x110 [gpu_sched]
Sep 07 18:28:44 jeeves kernel:  process_one_work+0x1c7/0x380
Sep 07 18:28:44 jeeves kernel:  worker_thread+0x4d/0x380
Sep 07 18:28:44 jeeves kernel:  ? _raw_spin_lock_irqsave+0x23/0x50
Sep 07 18:28:44 jeeves kernel:  ? rescuer_thread+0x3a0/0x3a0
Sep 07 18:28:44 jeeves kernel:  kthread+0xe9/0x110
Sep 07 18:28:44 jeeves kernel:  ? kthread_complete_and_exit+0x20/0x20
Sep 07 18:28:44 jeeves kernel:  ret_from_fork+0x22/0x30
Sep 07 18:28:44 jeeves kernel:  </TASK>
Sep 07 18:28:44 jeeves kernel: INFO: task ai00_server:135401 blocked for more than 121 seconds.
Sep 07 18:28:44 jeeves kernel:       Tainted: G        W I        6.1.0-11-amd64 #1 Debian 6.1.38-4
Sep 07 18:28:44 jeeves kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 07 18:28:44 jeeves kernel: task:ai00_server     state:D stack:0     pid:135401 ppid:18857  flags:0x00004006
Sep 07 18:28:44 jeeves kernel: Call Trace:
Sep 07 18:28:44 jeeves kernel:  <TASK>
Sep 07 18:28:44 jeeves kernel:  __schedule+0x351/0xa20
Sep 07 18:28:44 jeeves kernel:  schedule+0x5d/0xe0
Sep 07 18:28:44 jeeves kernel:  schedule_timeout+0x118/0x150
Sep 07 18:28:44 jeeves kernel:  dma_fence_default_wait+0x1a5/0x260
Sep 07 18:28:44 jeeves kernel:  ? __bpf_trace_dma_fence+0x10/0x10
Sep 07 18:28:44 jeeves kernel:  dma_fence_wait_timeout+0x108/0x130
Sep 07 18:28:44 jeeves kernel:  amdgpu_vm_fini+0xf7/0x510 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  amdgpu_driver_postclose_kms+0x1e5/0x2d0 [amdgpu]
Sep 07 18:28:44 jeeves kernel:  drm_file_free.part.0+0x207/0x250 [drm]
Sep 07 18:28:44 jeeves kernel:  drm_release+0x64/0xd0 [drm]
Sep 07 18:28:44 jeeves kernel:  __fput+0x91/0x250
Sep 07 18:28:44 jeeves kernel:  task_work_run+0x59/0x90
Sep 07 18:28:44 jeeves kernel:  exit_to_user_mode_prepare+0x1c4/0x1d0
Sep 07 18:28:44 jeeves kernel:  syscall_exit_to_user_mode+0x17/0x40
Sep 07 18:28:44 jeeves kernel:  do_syscall_64+0x67/0xc0
Sep 07 18:28:44 jeeves kernel:  ? exit_to_user_mode_prepare+0x40/0x1d0
Sep 07 18:28:44 jeeves kernel:  ? syscall_exit_to_user_mode+0x17/0x40
Sep 07 18:28:44 jeeves kernel:  ? do_syscall_64+0x67/0xc0
Sep 07 18:28:44 jeeves kernel:  entry_SYSCALL_64_after_hwframe+0x69/0xd3
Sep 07 18:28:44 jeeves kernel: RIP: 0033:0x7f78fb9e27ea
Sep 07 18:28:44 jeeves kernel: RSP: 002b:00007f78b9bb76c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
Sep 07 18:28:44 jeeves kernel: RAX: 0000000000000000 RBX: 000055ab3ee3e150 RCX: 00007f78fb9e27ea
Sep 07 18:28:44 jeeves kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000d
Sep 07 18:28:44 jeeves kernel: RBP: 000055ab3eed9e90 R08: 0000000000000007 R09: 000055ab3ee40580
Sep 07 18:28:44 jeeves kernel: R10: 7d63c7425ede93d5 R11: 0000000000000293 R12: 000055ab3ed59d88
Sep 07 18:28:44 jeeves kernel: R13: 000055ab3ee34448 R14: 000055ab3ee34648 R15: 000055ab3ee34310
Sep 07 18:28:44 jeeves kernel:  </TASK>
Sep 07 18:30:01 jeeves CRON[146976]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 07 18:30:01 jeeves CRON[146977]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 07 18:30:01 jeeves CRON[146978]: (root) CMD (/bin/ping -qi 10 -c 3 -I zt4mrrjgxa 10.11.1.1 >/dev/null || /usr/sbin/service zerotier-one restart)
Sep 07 18:30:01 jeeves CRON[146979]: (root) CMD ([ -x /etc/init.d/anacron ] && if [ ! -d /run/systemd/system ]; then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi)
Sep 07 18:30:01 jeeves CRON[146976]: pam_unix(cron:session): session closed for user root
Sep 07 18:30:21 jeeves CRON[146977]: pam_unix(cron:session): session closed for user root

cryscan commented 1 year ago

Sounds great. Will add manual selection later.

cryscan commented 1 year ago

Since v0.2.2, the adapter can be selected either manually or automatically.
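
For readers wondering what "manually or automatically" could look like, here is a minimal sketch of the policy suggested in the original report: honor an explicit adapter index if one is given, otherwise prefer a discrete GPU over an integrated one. The `DeviceKind` and `GpuEntry` types and the `select_adapter` function are hypothetical names made up for illustration; this is not ai00_server's actual code, and a real server would fill the list from its graphics API's adapter enumeration.

```rust
/// Kind of GPU, loosely mirroring the `deviceType` field that vulkaninfo prints.
/// Hypothetical type for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeviceKind {
    Integrated,
    Discrete,
    Other,
}

/// Hypothetical description of one enumerated adapter.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GpuEntry {
    pub name: String,
    pub kind: DeviceKind,
}

/// Returns the index of the adapter to use, or `None` if there is no valid choice.
///
/// With an explicit index, that index is used if it is in range and rejected
/// (`None`) if it is not. Without one, the first discrete adapter is chosen,
/// falling back to the first adapter of any kind.
pub fn select_adapter(adapters: &[GpuEntry], explicit: Option<usize>) -> Option<usize> {
    if let Some(index) = explicit {
        // Manual selection: reject an out-of-range index instead of guessing.
        return (index < adapters.len()).then_some(index);
    }
    // Automatic selection: skip integrated GPUs when a discrete one exists.
    adapters
        .iter()
        .position(|a| a.kind == DeviceKind::Discrete)
        .or(if adapters.is_empty() { None } else { Some(0) })
}
```

With the adapter order from the report (the Intel HD Graphics entry first, followed by the RX 580 cards), automatic selection under this policy would pick index 1 rather than the integrated GPU at index 0, while passing `Some(0)` would still allow choosing the integrated GPU on purpose.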