Aquantia / AQtion

Aquantia AQC multigigabit NIC linux driver (atlantic) - development preview
https://www.aquantia.com

AQC107 fails to resume from suspend #62

Open johndoe31415 opened 1 month ago

Hello there!

I'm using an AQC107 NIC:

01:00.0 Ethernet controller [0200]: Aquantia Corp. AQtion AQC107 NBase-T/IEEE 802.3an Ethernet Controller [Atlantic 10G] [1d6a:07b1] (rev 02)

on Linux x86_64 running a stock Ubuntu 24.04 (noble) kernel, untainted:

reliant joe [~]: uname -a
Linux reliant 6.5.0-28-generic #29-Ubuntu SMP PREEMPT_DYNAMIC Thu Mar 28 23:46:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

reliant joe [~]: cat /proc/sys/kernel/tainted
0

I'm experiencing sporadic NIC failures when waking from suspend-to-RAM. When it happens (roughly every third suspend, so annoyingly often), the network driver locks up completely and no connectivity is possible. Sometimes I can recover with rmmod and modprobe, but in about 90% of cases that does not work either and I have to reboot to get the NIC working again. Note also that the system starts shutting down but systemd then hangs somewhere, so I have to issue a hard reset.

When it occurs, I see the following in dmesg:

[33284.397291] kworker/u256:70: page allocation failure: order:5, mode:0x40d00(GFP_NOIO|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[33284.397303] CPU: 17 PID: 4159607 Comm: kworker/u256:70 Not tainted 6.5.0-28-generic #29-Ubuntu
[33284.397306] Hardware name: LENOVO 30E0003QGE/1046, BIOS S07KT4AA 07/22/2022
[33284.397308] Workqueue: events_unbound async_run_entry_fn
[33284.397313] Call Trace:
[33284.397315]  <TASK>
[33284.397319]  dump_stack_lvl+0x48/0x70
[33284.397323]  dump_stack+0x10/0x20
[33284.397325]  warn_alloc+0x174/0x1f0
[33284.397329]  ? __alloc_pages_direct_compact+0xb7/0x240
[33284.397334]  __alloc_pages_slowpath.constprop.0+0x8f1/0x980
[33284.397339]  __alloc_pages+0x31f/0x350
[33284.397344]  ? aq_ring_alloc+0x27/0x90 [atlantic]
[33284.397359]  __kmalloc_large_node+0x7a/0x150
[33284.397362]  ? iommu_dma_alloc+0x16e/0x1e0
[33284.397366]  __kmalloc+0xdb/0x170
[33284.397370]  aq_ring_alloc+0x27/0x90 [atlantic]
[33284.397383]  aq_ring_rx_alloc+0x97/0xb0 [atlantic]
[33284.397396]  aq_vec_ring_alloc+0xbe/0x290 [atlantic]
[33284.397409]  ? hw_atl_b0_hw_ring_rx_fill+0x5d/0x70 [atlantic]
[33284.397424]  aq_nic_init+0x13d/0x240 [atlantic]
[33284.397439]  atl_resume_common+0x46/0xf0 [atlantic]
[33284.397452]  aq_pm_resume_restore+0xe/0x20 [atlantic]
[33284.397465]  pci_pm_resume+0x75/0x110
[33284.397468]  ? __pfx_pci_pm_resume+0x10/0x10
[33284.397471]  dpm_run_callback+0x54/0x1b0
[33284.397475]  device_resume+0xad/0x220
[33284.397478]  async_resume+0x1f/0x90
[33284.397480]  async_run_entry_fn+0x33/0x130
[33284.397483]  process_one_work+0x223/0x440
[33284.397487]  worker_thread+0x4d/0x3f0
[33284.397490]  ? __pfx_worker_thread+0x10/0x10
[33284.397492]  kthread+0xf2/0x120
[33284.397495]  ? __pfx_kthread+0x10/0x10
[33284.397498]  ret_from_fork+0x47/0x70
[33284.397501]  ? __pfx_kthread+0x10/0x10
[33284.397504]  ret_from_fork_asm+0x1b/0x30
[33284.397510]  </TASK>
[33284.397511] Mem-Info:
[33284.397513] active_anon:1394551 inactive_anon:357689 isolated_anon:0
                active_file:1793872 inactive_file:9717634 isolated_file:0
                unevictable:104 dirty:42 writeback:0
                slab_reclaimable:1842731 slab_unreclaimable:234794
                mapped:327615 shmem:140848 pagetables:23776
                sec_pagetables:0 bounce:0
                kernel_misc_reclaimable:0
                free:292280 free_pcp:0 free_cma:0
[33284.397518] Node 0 active_anon:5578204kB inactive_anon:1430756kB active_file:7175488kB inactive_file:38870536kB unevictable:416kB isolated(anon):0kB isolated(file):0kB mapped:1310460kB dirty:168kB writeback:0kB shmem:563392kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:46704kB pagetables:95104kB sec_pagetables:0kB all_unreclaimable? no
[33284.397523] Node 0 DMA free:11264kB boost:0kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[33284.397527] lowmem_reserve[]: 0 2858 64098 64098 64098
[33284.397532] Node 0 DMA32 free:262836kB boost:12876kB min:15796kB low:18628kB high:21460kB reserved_highatomic:2048KB active_anon:21740kB inactive_anon:3736kB active_file:308kB inactive_file:1991556kB unevictable:0kB writepending:0kB present:2992764kB managed:2926724kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[33284.397537] lowmem_reserve[]: 0 0 61240 61240 61240
[33284.397541] Node 0 Normal free:895020kB boost:285064kB min:349708kB low:412408kB high:475108kB reserved_highatomic:2048KB active_anon:5556464kB inactive_anon:1427020kB active_file:7175180kB inactive_file:36878980kB unevictable:416kB writepending:168kB present:63949824kB managed:62710260kB mlocked:416kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[33284.397546] lowmem_reserve[]: 0 0 0 0 0
[33284.397551] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
[33284.397564] Node 0 DMA32: 4253*4kB (UME) 2490*8kB (UME) 1727*16kB (UMEH) 2056*32kB (UMEH) 1180*64kB (UMEH) 443*128kB (UMEH) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 262836kB
[33284.397581] Node 0 Normal: 130947*4kB (UMEH) 44782*8kB (UMEH) 121*16kB (UMEH) 345*32kB (UME) 4*64kB (ME) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 895276kB
[33284.397596] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[33284.397597] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[33284.397598] 11652354 total pagecache pages
[33284.397599] 0 pages in swap cache
[33284.397600] Free swap  = 0kB
[33284.397601] Total swap = 0kB
[33284.397602] 16739646 pages RAM
[33284.397603] 0 pages HighMem/MovableOnly
[33284.397603] 326560 pages reserved
[33284.397604] 0 pages hwpoisoned
[33284.403493] atlantic 0000:01:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0x110 returns -12
[33284.403497] atlantic 0000:01:00.0: PM: failed to resume async: error -12
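
For what it's worth, error -12 is ENOMEM, and an order:5 request asks for 32 physically contiguous pages (128 kB with 4 kB pages). The per-zone free lists in the log can be checked against that size. Below is a minimal sketch of such a check, assuming 4 kB base pages and the dmesg line format shown above; the helper name `free_blocks_at_least` is made up for illustration and is not part of any tool. It only counts free blocks and ignores watermarks and lowmem_reserve, so it is a hint at fragmentation, not a diagnosis.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages (the usual x86_64 default)


def free_blocks_at_least(line, order):
    """Count free blocks of at least 2**order pages in a dmesg free-list line.

    Hypothetical helper: parses the "count*sizekB" pairs that the kernel
    prints per zone after an allocation failure.
    """
    need_kb = PAGE_KB << order
    total = 0
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line):
        if int(size_kb) >= need_kb:
            total += int(count)
    return total


# The two zone lines, copied from the log above.
normal = ("Node 0 Normal: 130947*4kB (UMEH) 44782*8kB (UMEH) 121*16kB (UMEH) "
          "345*32kB (UME) 4*64kB (ME) 0*128kB 0*256kB 0*512kB 0*1024kB "
          "0*2048kB 0*4096kB = 895276kB")
dma32 = ("Node 0 DMA32: 4253*4kB (UME) 2490*8kB (UME) 1727*16kB (UMEH) "
         "2056*32kB (UMEH) 1180*64kB (UMEH) 443*128kB (UMEH) 1*256kB (H) "
         "0*512kB 0*1024kB 0*2048kB 0*4096kB = 262836kB")

print("Normal zone, blocks >= order 5:", free_blocks_at_least(normal, 5))
print("DMA32 zone,  blocks >= order 5:", free_blocks_at_least(dma32, 5))
```

For the log above this reports no free block of 128 kB or larger in the Normal zone, even though roughly 874 MB are free there in total, which looks like fragmentation rather than plain memory exhaustion. The DMA32 zone does still have blocks of that size, so why the allocator did not fall back to it is something I can't tell from this alone.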

Any advice on how I can support debugging this issue is greatly appreciated. Thanks!