Irqbalance / irqbalance

The irqbalance source tree - The new official site for irqbalance
http://irqbalance.github.io/irqbalance/
GNU General Public License v2.0

Coredumps (Kernel 6.3.12) #267

Closed Uglymotha closed 1 year ago

Uglymotha commented 1 year ago

On my system running a 6.3.12 kernel, irqbalance coredumps with a SIGSEGV.

[ 832.859285] traps: irqbalance[467787] general protection fault ip:55b99ffd5e9d sp:7ffe76006820 error:0 in irqbalance[55b99ffcb000+d000]

poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 2, 9940) = 0 (Timeout)
openat(AT_FDCWD, "/proc/interrupts", O_RDONLY) = 7
newfstatat(7, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
read(7, " CPU0 CPU1 "..., 1024) = 1024
read(7, "0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 PCI"..., 1024) = 1024
read(7, " 0 0 0"..., 1024) = 1024
read(7, " 0 IR-PCI-MSI-0000:00:03"..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, "-PCI-MSI-0000:20:01.1 0-edge "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, ":20:07.1 0-edge PCIe PME"..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, "-edge PCIe PME\n 56: "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0-edge nvme2q0\n 67: "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, "2\n 73: 0 0 "..., 1024) = 1024
read(7, " 0 0 "..., 1024) = 1024
read(7, "0 0 0 "..., 1024) = 1024
read(7, " 0 0 "..., 1024) = 1024
read(7, " 0 0 0"..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, "0 0 0 "..., 1024) = 1024
read(7, " 480867 0 "..., 1024) = 1024
read(7, " 0 0 0"..., 1024) = 1024
read(7, " 0 0 226 "..., 1024) = 1024
read(7, " 0 0 231"..., 1024) = 1024
read(7, " 0 0 0 IR-PC"..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, "PCI-MSI-0000:28:00.0 0-edge "..., 1024) = 1024
read(7, " 0 727 145 "..., 1024) = 1024
read(7, ":00.0 3-edge eno1-TxRx-2"..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, ":00.0 3-edge eno2-TxRx-2"..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, "0.0 3-edge xhci_hcd\n 159"..., 1024) = 1024
read(7, "0 0 0 "..., 1024) = 1024
read(7, "-edge xhci_hcd\n 165: "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " xhci_hcd\n 170: 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, "hcd\n 176: 0 0 "..., 1024) = 1024
read(7, " 0 0 0"..., 1024) = 1024
read(7, ": 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0"..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 976
read(7, " NMI: 0 0 "..., 1024) = 1024
close(7) = 0
openat(AT_FDCWD, "/proc/stat", O_RDONLY) = 7
newfstatat(7, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
read(7, "cpu 22176064 17 6522492 2243695"..., 1024) = 1024
read(7, "2 0\ncpu18 578787 0 169788 716682"..., 1024) = 1024
read(7, "984 118751 0 768842 485529 48751"..., 1024) = 1024
read(7, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 1024
read(7, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 1024
read(7, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 1024
read(7, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 1024
read(7, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 810
close(7) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
+++ killed by SIGSEGV (core dumped) +++

[core.irqbalance.0.a64b7450a3574ec4a5c28d201f6cac33.27879.zip](https://github.com/Irqbalance/irqbalance/files/11982114/core.irqbalance.0.a64b7450a3574ec4a5c28d201f6cac33.27879.zip)
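
For readers who want to relate the faulting address in the kernel `traps:` line to the binary, the arithmetic is `ip` minus the start address in the brackets after the program name. A minimal Python sketch follows. The function name `fault_offset` and the regular expression are illustrative assumptions, not part of irqbalance or any kernel tooling, and the result is an offset into that mapping only; it may still need the mapping's file offset added before a symbolizer can use it.

```python
import re

# Matches the tail of a kernel "traps:" line such as the one quoted in this report.
# The pattern is an assumption based on that single line, not a full grammar.
TRAP_RE = re.compile(
    r"ip:([0-9a-f]+) sp:[0-9a-f]+ error:[0-9a-f]+ in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (mapping name, offset of the faulting ip inside that mapping)."""
    match = TRAP_RE.search(line)
    if match is None:
        raise ValueError("not a recognizable traps line")
    ip = int(match.group(1), 16)
    name = match.group(2)
    start = int(match.group(3), 16)
    size = int(match.group(4), 16)
    offset = ip - start
    # Sanity check: the ip must fall inside the reported mapping.
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return name, offset


if __name__ == "__main__":
    sample = (
        "traps: irqbalance[467787] general protection fault "
        "ip:55b99ffd5e9d sp:7ffe76006820 error:0 in irqbalance[55b99ffcb000+d000]"
    )
    name, offset = fault_offset(sample)
    print(name, hex(offset))
```

For the line in this report the offset works out to `0xae9d`, which lies inside the `0xd000`-byte mapping.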

Uglymotha commented 1 year ago

[root@rdsan01 uglymotha]# ./irqbalance -df
Prevent irq assignment to these isolated CPUs: 00000000
Prevent irq assignment to these adaptive-ticks CPUs: 00000000
Banned CPUs: 00000000
Package 0: numa_node 1 cpu mask is 00f000f0 (load 0)
  Cache domain 0: numa_node is 1 cpu mask is 00200020 (load 0)
    CPU number 21 numa_node is 1 (load 0)
    CPU number 5 numa_node is 1 (load 0)
  Cache domain 3: numa_node is 1 cpu mask is 00800080 (load 0)
    CPU number 7 numa_node is 1 (load 0)
    CPU number 23 numa_node is 1 (load 0)
  Cache domain 12: numa_node is 1 cpu mask is 00400040 (load 0)
    CPU number 22 numa_node is 1 (load 0)
    CPU number 6 numa_node is 1 (load 0)
  Cache domain 13: numa_node is 1 cpu mask is 00100010 (load 0)
    CPU number 20 numa_node is 1 (load 0)
    CPU number 4 numa_node is 1 (load 0)
Package 0: numa_node 2 cpu mask is 0f000f00 (load 0)
  Cache domain 1: numa_node is 2 cpu mask is 08000800 (load 0)
    CPU number 11 numa_node is 2 (load 0)
    CPU number 27 numa_node is 2 (load 0)
  Cache domain 2: numa_node is 2 cpu mask is 02000200 (load 0)
    CPU number 9 numa_node is 2 (load 0)
    CPU number 25 numa_node is 2 (load 0)
  Cache domain 6: numa_node is 2 cpu mask is 04000400 (load 0)
    CPU number 26 numa_node is 2 (load 0)
    CPU number 10 numa_node is 2 (load 0)
  Cache domain 9: numa_node is 2 cpu mask is 01000100 (load 0)
    CPU number 24 numa_node is 2 (load 0)
    CPU number 8 numa_node is 2 (load 0)
Package 0: numa_node 3 cpu mask is f000f000 (load 0)
  Cache domain 4: numa_node is 3 cpu mask is 10001000 (load 0)
    CPU number 28 numa_node is 3 (load 0)
    CPU number 12 numa_node is 3 (load 0)
  Cache domain 10: numa_node is 3 cpu mask is 40004000 (load 0)
    CPU number 14 numa_node is 3 (load 0)
    CPU number 30 numa_node is 3 (load 0)
  Cache domain 14: numa_node is 3 cpu mask is 20002000 (load 0)
    CPU number 29 numa_node is 3 (load 0)
    CPU number 13 numa_node is 3 (load 0)
  Cache domain 15: numa_node is 3 cpu mask is 80008000 (load 0)
    CPU number 15 numa_node is 3 (load 0)
    CPU number 31 numa_node is 3 (load 0)
Package 0: numa_node 0 cpu mask is 000f000f (load 0)
  Cache domain 5: numa_node is 0 cpu mask is 00040004 (load 0)
    CPU number 18 numa_node is 0 (load 0)
    CPU number 2 numa_node is 0 (load 0)
  Cache domain 7: numa_node is 0 cpu mask is 00010001 (load 0)
    CPU number 16 numa_node is 0 (load 0)
    CPU number 0 numa_node is 0 (load 0)
  Cache domain 8: numa_node is 0 cpu mask is 00080008 (load 0)
    CPU number 3 numa_node is 0 (load 0)
    CPU number 19 numa_node is 0 (load 0)
  Cache domain 11: numa_node is 0 cpu mask is 00020002 (load 0)
    CPU number 1 numa_node is 0 (load 0)
    CPU number 17 numa_node is 0 (load 0)
Adding IRQ 162 to database
Adding IRQ 160 to database
Adding IRQ 159 to database
Adding IRQ 157 to database
Adding IRQ 155 to database
Adding IRQ 161 to database
Adding IRQ 158 to database
Adding IRQ 156 to database
Adding IRQ 34 to database
Adding IRQ 42 to database
Adding IRQ 29 to database
Adding IRQ 47 to database
Adding IRQ 31 to database
Adding IRQ 214 to database
Adding IRQ 57 to database
Adding IRQ 56 to database
Adding IRQ 83 to database
Adding IRQ 73 to database
Adding IRQ 81 to database
Adding IRQ 71 to database
Adding IRQ 78 to database
Adding IRQ 76 to database
Adding IRQ 84 to database
Adding IRQ 74 to database
Adding IRQ 82 to database
Adding IRQ 72 to database
Adding IRQ 80 to database
Adding IRQ 70 to database
Adding IRQ 79 to database
Adding IRQ 77 to database
Adding IRQ 85 to database
Adding IRQ 75 to database
Adding IRQ 40 to database
Adding IRQ 98 to database
Adding IRQ 102 to database
Adding IRQ 68 to database
Adding IRQ 96 to database
Adding IRQ 100 to database
Adding IRQ 99 to database
Adding IRQ 97 to database
Adding IRQ 101 to database
Adding IRQ 95 to database
Adding IRQ 143 to database
Adding IRQ 61 to database
Adding IRQ 207 to database
Adding IRQ 206 to database
Adding IRQ 204 to database
Adding IRQ 45 to database
Adding IRQ 36 to database
Adding IRQ 116 to database
Adding IRQ 114 to database
Adding IRQ 112 to database
Adding IRQ 117 to database
Adding IRQ 115 to database
Adding IRQ 113 to database
Adding IRQ 111 to database
Adding IRQ 118 to database
Adding IRQ 65 to database
Adding IRQ 26 to database
Adding IRQ 139 to database
Adding IRQ 63 to database
Adding IRQ 91 to database
Adding IRQ 88 to database
Adding IRQ 86 to database
Adding IRQ 92 to database
Adding IRQ 90 to database
Adding IRQ 89 to database
Adding IRQ 87 to database
Adding IRQ 164 to database
Adding IRQ 170 to database
Adding IRQ 169 to database
Adding IRQ 167 to database
Adding IRQ 165 to database
Adding IRQ 171 to database
Adding IRQ 168 to database
Adding IRQ 166 to database
Adding IRQ 33 to database
Adding IRQ 54 to database
Adding IRQ 43 to database
Adding IRQ 28 to database
Adding IRQ 39 to database
Adding IRQ 203 to database
Adding IRQ 202 to database
Adding IRQ 108 to database
Adding IRQ 106 to database
Adding IRQ 104 to database
Adding IRQ 110 to database
Adding IRQ 109 to database
Adding IRQ 107 to database
Adding IRQ 105 to database
Adding IRQ 103 to database
Adding IRQ 67 to database
Adding IRQ 182 to database
Adding IRQ 187 to database
Adding IRQ 185 to database
Adding IRQ 183 to database
Adding IRQ 181 to database
Adding IRQ 188 to database
Adding IRQ 186 to database
Adding IRQ 184 to database
Adding IRQ 41 to database
Adding IRQ 126 to database
Adding IRQ 134 to database
Adding IRQ 124 to database
Adding IRQ 132 to database
Adding IRQ 122 to database
Adding IRQ 130 to database
Adding IRQ 120 to database
Adding IRQ 129 to database
Adding IRQ 127 to database
Adding IRQ 135 to database
Adding IRQ 125 to database
Adding IRQ 133 to database
Adding IRQ 123 to database
Adding IRQ 131 to database
Adding IRQ 121 to database
Adding IRQ 128 to database
Adding IRQ 51 to database
Adding IRQ 30 to database
Adding IRQ 52 to database
Adding IRQ 154 to database
Adding IRQ 152 to database
Adding IRQ 150 to database
Adding IRQ 153 to database
Adding IRQ 151 to database
Adding IRQ 174 to database
Adding IRQ 180 to database
Adding IRQ 179 to database
Adding IRQ 177 to database
Adding IRQ 175 to database
Adding IRQ 173 to database
Adding IRQ 178 to database
Adding IRQ 176 to database
Adding IRQ 209 to database
Adding IRQ 211 to database
Adding IRQ 212 to database
Adding IRQ 141 to database
Adding IRQ 201 to database
Adding IRQ 37 to database
Adding IRQ 144 to database
Adding IRQ 147 to database
Adding IRQ 145 to database
Adding IRQ 148 to database
Adding IRQ 146 to database
Adding IRQ 59 to database
Adding IRQ 49 to database
Adding IRQ 35 to database
Adding IRQ 44 to database
Adding IRQ 137 to database
Adding IRQ 27 to database
Adding IRQ 199 to database
Adding IRQ 198 to database
Adding IRQ 192 to database
Adding IRQ 190 to database
Adding IRQ 197 to database
Adding IRQ 195 to database
Adding IRQ 193 to database
Adding IRQ 191 to database
Adding IRQ 196 to database
Adding IRQ 194 to database
Adding IRQ 32 to database
Adding IRQ 94 to database
Adding IRQ 0 to database
Adding IRQ 8 to database
Adding IRQ 9 to database
NUMA NODE NUMBER: -1 LOCAL CPU MASK:
ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff

NUMA NODE NUMBER: 2 LOCAL CPU MASK: 0f000f00

NUMA NODE NUMBER: 0 LOCAL CPU MASK: 000f000f

NUMA NODE NUMBER: 3 LOCAL CPU MASK: f000f000

NUMA NODE NUMBER: 1 LOCAL CPU MASK: 00f000f0

IRQ 214 is removed from interrupts_db.
IRQ 57 is removed from interrupts_db.
IRQ 83 is removed from interrupts_db.
IRQ 81 is removed from interrupts_db.
IRQ 78 is removed from interrupts_db.
IRQ 84 is removed from interrupts_db.
IRQ 82 is removed from interrupts_db.
IRQ 80 is removed from interrupts_db.
IRQ 79 is removed from interrupts_db.
IRQ 85 is removed from interrupts_db.
IRQ 207 is removed from interrupts_db.
IRQ 204 is removed from interrupts_db.
IRQ 203 is removed from interrupts_db.
IRQ 134 is removed from interrupts_db.
IRQ 132 is removed from interrupts_db.
IRQ 130 is removed from interrupts_db.
IRQ 129 is removed from interrupts_db.
IRQ 135 is removed from interrupts_db.
IRQ 133 is removed from interrupts_db.
IRQ 131 is removed from interrupts_db.
IRQ 128 is removed from interrupts_db.
IRQ 209 is removed from interrupts_db.
IRQ 212 is removed from interrupts_db.
IRQ 201 is removed from interrupts_db.
IRQ 199 is removed from interrupts_db.
Daemon couldn't be bound to the file-based socket.


cannot change irq 98's affinity, add it to banned listIRQ 98 was BANNED. IRQ 98 was removed from db.
cannot change irq 102's affinity, add it to banned listIRQ 102 was BANNED. IRQ 102 was removed from db.
cannot change irq 96's affinity, add it to banned listIRQ 96 was BANNED. IRQ 96 was removed from db.
cannot change irq 100's affinity, add it to banned listIRQ 100 was BANNED. IRQ 100 was removed from db.
cannot change irq 99's affinity, add it to banned listIRQ 99 was BANNED. IRQ 99 was removed from db.
cannot change irq 97's affinity, add it to banned listIRQ 97 was BANNED. IRQ 97 was removed from db.
cannot change irq 101's affinity, add it to banned listIRQ 101 was BANNED. IRQ 101 was removed from db.
cannot change irq 95's affinity, add it to banned listIRQ 95 was BANNED. IRQ 95 was removed from db.
cannot change irq 116's affinity, add it to banned listIRQ 116 was BANNED. IRQ 116 was removed from db.
cannot change irq 114's affinity, add it to banned listIRQ 114 was BANNED. IRQ 114 was removed from db.
cannot change irq 112's affinity, add it to banned listIRQ 112 was BANNED. IRQ 112 was removed from db.
cannot change irq 117's affinity, add it to banned listIRQ 117 was BANNED. IRQ 117 was removed from db.
cannot change irq 115's affinity, add it to banned listIRQ 115 was BANNED. IRQ 115 was removed from db.
cannot change irq 113's affinity, add it to banned listIRQ 113 was BANNED. IRQ 113 was removed from db.
cannot change irq 111's affinity, add it to banned listIRQ 111 was BANNED. IRQ 111 was removed from db.
cannot change irq 118's affinity, add it to banned listIRQ 118 was BANNED. IRQ 118 was removed from db.
cannot change irq 91's affinity, add it to banned listIRQ 91 was BANNED. IRQ 91 was removed from db.
cannot change irq 88's affinity, add it to banned listIRQ 88 was BANNED. IRQ 88 was removed from db.
cannot change irq 86's affinity, add it to banned listIRQ 86 was BANNED. IRQ 86 was removed from db.
cannot change irq 92's affinity, add it to banned listIRQ 92 was BANNED. IRQ 92 was removed from db.
cannot change irq 90's affinity, add it to banned listIRQ 90 was BANNED. IRQ 90 was removed from db.
cannot change irq 89's affinity, add it to banned listIRQ 89 was BANNED. IRQ 89 was removed from db.
cannot change irq 87's affinity, add it to banned listIRQ 87 was BANNED. IRQ 87 was removed from db.
cannot change irq 108's affinity, add it to banned listIRQ 108 was BANNED. IRQ 108 was removed from db.
cannot change irq 106's affinity, add it to banned listIRQ 106 was BANNED. IRQ 106 was removed from db.
cannot change irq 104's affinity, add it to banned listIRQ 104 was BANNED. IRQ 104 was removed from db.
cannot change irq 110's affinity, add it to banned listIRQ 110 was BANNED. IRQ 110 was removed from db.
cannot change irq 109's affinity, add it to banned listIRQ 109 was BANNED. IRQ 109 was removed from db.
cannot change irq 107's affinity, add it to banned listIRQ 107 was BANNED. IRQ 107 was removed from db.
cannot change irq 105's affinity, add it to banned listIRQ 105 was BANNED. IRQ 105 was removed from db.
cannot change irq 103's affinity, add it to banned listIRQ 103 was BANNED. IRQ 103 was removed from db.
cannot change irq 59's affinity, add it to banned listIRQ 59 was BANNED. IRQ 59 was removed from db.
cannot change irq 0's affinity, add it to banned listIRQ 0 was BANNED. IRQ 0 was removed from db.
Package 0: numa_node 1 cpu mask is 00f000f0 (load 0)
Cache domain 0: numa_node is 1 cpu mask is 00200020 (load 0)
CPU number 21 numa_node is 1 (load 0)
Interrupt 137 node_num is 1 (ethernet/0:1249)

Segmentation fault (core dumped)
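
The CPU masks in the debug output above are hexadecimal bitmaps in which bit N stands for CPU N. A small Python sketch for decoding them is shown here. `cpus_in_mask` is a hypothetical helper name, and stripping the commas assumes the comma-separated 32-bit groups are listed most significant first, as in the long `ffffffff,...` mask printed for NUMA node -1.

```python
def cpus_in_mask(mask_hex):
    """Decode a hex CPU mask such as '00f000f0' into a sorted list of CPU numbers."""
    # Commas only separate 32-bit groups, so dropping them yields one big hex number.
    value = int(mask_hex.replace(",", ""), 16)
    # Bit N set means CPU N belongs to the mask.
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    for mask in ("00f000f0", "0f000f00", "f000f000", "000f000f"):
        print(mask, cpus_in_mask(mask))
```

Decoding `00f000f0` gives CPUs 4 to 7 and 20 to 23, which matches the eight CPUs listed under the numa_node 1 package in the output.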

nhorman commented 1 year ago

provide a backtrace from the core please, and note the git hash that this was built from

Uglymotha commented 1 year ago

Core was generated by `./irqbalance --foreground'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000556392a43e9d in place_irq_in_node ()
[Current thread is 1 (LWP 4012791)]
(gdb) bt
#0  0x0000556392a43e9d in place_irq_in_node ()
#1  0x0000556392a3d5a2 in for_each_irq ()
#2  0x0000556392a44330 in calculate_placement ()
#3  0x0000556392a40cc6 in scan ()
#4  0x00007fe618735d3b in ?? ()
#5  0x0000556393004fe0 in ?? ()
#6  0x00005563930059b0 in ?? ()
#7  0x0000000000000000 in ?? ()

It was built with git master yesterday

nhorman commented 1 year ago

Can you please rebuild from commit 184c95029ebff84d499fc8ea88a906ff9460bf15 and try again? If it fixes the problem I'll revert the offending commit

Uglymotha commented 1 year ago

That looks to be the cause; irqbalance has now been running for 24 hours without a coredump.

Thanks.

nhorman commented 1 year ago

reverted the offending commit

rjarry commented 1 year ago

Hmm, the root cause is probably from commit 55c5c321c73e4. fflush() only reveals it. The write error was hidden previously. I'll have a look.
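
The point about `fflush()` can be illustrated outside irqbalance. With buffered I/O, a write call can report success because the data has only reached a userspace buffer, and the real error surfaces when the buffer is flushed. The sketch below shows that behavior in Python rather than C, using a pipe whose read end is closed as a stand-in for a file that rejects the write. It is an analogy under those assumptions, with a hypothetical function name, and not a reproduction of the irqbalance code or of the commits mentioned here.

```python
import os


def write_then_flush():
    """Show a buffered write that 'succeeds' and a flush that reports the real error."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # no reader left: any real write to the pipe fails with EPIPE

    stream = os.fdopen(write_fd, "wb")  # buffered writer, comparable to a C FILE*
    written = stream.write(b"0f\n")     # only fills the userspace buffer, returns 3

    flush_error = None
    try:
        stream.flush()                  # the actual write(2) happens here and fails
    except OSError as exc:
        flush_error = exc

    try:
        stream.close()                  # close retries the flush; ignore the repeat error
    except OSError:
        pass
    return written, flush_error


if __name__ == "__main__":
    written, flush_error = write_then_flush()
    print("write returned:", written)
    print("flush raised:", repr(flush_error))
```

Code that checks only the return value of the write would treat this as a success; checking the flush is what exposes the failure.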