Irqbalance / irqbalance

The irqbalance source tree - The new official site for irqbalance
http://irqbalance.github.io/irqbalance/
GNU General Public License v2.0

irqbalance doesn't seem to work (on RT3200) #245

Closed Imwithstupid007 closed 2 years ago

Imwithstupid007 commented 2 years ago

Irqbalance doesn't seem to be working on RT3200. See 2 cases on 2 different versions (snapshot r21070 and 22.03.1):

https://forum.openwrt.org/t/belkin-rt3200-linksys-e8450-wifi-ax-discussion/94302/3084
https://forum.openwrt.org/t/belkin-rt3200-linksys-e8450-wifi-ax-discussion/94302/3087

In my case (r21070) I reinstalled irqbalance but nothing changed. Irqbalance is enabled and running.

config irqbalance 'irqbalance'
	option enabled '1'

# The default value is 10 seconds
#option interval '10'

# List of IRQ's to ignore
#list banirq '36'
#list banirq '69'


/etc/config$ cat /proc/interrupts
            CPU0       CPU1
  10:   16050554   10986302  GICv2 30 Level arch_timer
  15:          1          0  MT_SYSIRQ 163 Level mt-pmic-pwrap
  22:          0          0  mt-eint 0 Edge gpio-keys
  75:          6          0  mt-eint 53 Level mt7530
 124:          0          0  mt-eint 102 Edge gpio-keys
 125:         14          0  MT_SYSIRQ 91 Level ttyS0
 128:          0          0  MT_SYSIRQ 118 Level 1100a000.spi
 131:      11569          0  MT_SYSIRQ 96 Level mtk-snand
 132:      41769          0  MT_SYSIRQ 95 Level mtk-ecc
 133:          0          0  MT_SYSIRQ 122 Level 11016000.spi
 134:   50962650          0  MT_SYSIRQ 211 Level mt7615e
 135:          0          0  MT_SYSIRQ 232 Level xhci-hcd:usb1
 138:          0          0  MT_SYSIRQ 219 Level 1b007000.dma-controller
 142:   31708071          0  MT_SYSIRQ 224 Level 1b100000.ethernet
 143:   34515013          0  MT_SYSIRQ 225 Level 1b100000.ethernet
 146:          0          0  dummy 0 Edge PCIe PME
 147:          0          3  mt7530 0 Edge mt7530-0:00
 148:          0          0  mt7530 1 Edge mt7530-0:01
 149:          0          1  mt7530 2 Edge mt7530-0:02
 150:          0          1  mt7530 3 Edge mt7530-0:03
 151:          0          1  mt7530 4 Edge mt7530-0:04
 152:   21131956          0  MTK PCIe MSI 524288 Edge mt7915e
IPI0:    2446447    2590160  Rescheduling interrupts
IPI1:   31955999   84975613  Function call interrupts
IPI2:          0          0  CPU stop interrupts
IPI3:          0          0  CPU stop (for crash dump) interrupts
IPI4:          0          0  Timer broadcast interrupts
IPI5:   16104803    3809938  IRQ work interrupts
IPI6:          0          0  CPU wake-up interrupts
 Err:          0

liuchao173 commented 2 years ago

Please try this:

cat /proc/irq/134/affinity_hint
cat /proc/irq/142/affinity_hint
cat /proc/irq/143/affinity_hint

and run irqbalance with '-d'. For example:

/usr/sbin/irqbalance -f -c 2 -t 10 -d
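For reading what those commands print: the files under /proc/irq/<n>/ (affinity_hint, smp_affinity) hold a hexadecimal CPU bitmask in which bit 0 stands for CPU0, bit 1 for CPU1, and so on. Below is a minimal sketch for decoding such a mask, assuming the usual layout of comma-separated 32-bit hex groups on larger systems. The helper name `mask_to_cpus` is made up for illustration and is not part of irqbalance.

```python
def mask_to_cpus(mask):
    """Decode a /proc/irq/<n>/smp_affinity-style hex mask into CPU numbers."""
    # Larger systems split the mask into comma-separated 32-bit groups,
    # most significant group first; joining them yields one hex number.
    value = int(mask.strip().replace(",", ""), 16)
    cpus = []
    cpu = 0
    while value:
        if value & 1:
            cpus.append(cpu)
        value >>= 1
        cpu += 1
    return cpus


print(mask_to_cpus("00000003"))  # mask covering both CPUs of a dual-core device
print(mask_to_cpus("1"))         # CPU0 only
print(mask_to_cpus("0"))         # empty mask: no CPU selected
```

An affinity_hint of 0 decodes to an empty list, which means the driver has not published a hint for that IRQ.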

Imwithstupid007 commented 2 years ago

Let's see if I did this correctly:

  1. Ran the 3 "affinity_hint" lines in Putty
  2. Stopped irqbalance in Luci
  3. Started /usr/sbin/irqbalance -f -c 2 -t 10 -d in Putty

I now get this output in Putty:

Package 0: numa_node -1 cpu mask is 00000003 (load 400000000)
Cache domain 0: numa_node is -1 cpu mask is 00000003 (load 400000000)
CPU number 1 numa_node is -1 (load 150000000)
CPU number 0 numa_node is -1 (load 250000000)
Interrupt 146 node_num is -1 (legacy/1:0)
Interrupt 124 node_num is -1 (other/1:0)
Interrupt 75 node_num is -1 (other/1:0)
Interrupt 22 node_num is -1 (other/1:0)
Interrupt 15 node_num is -1 (other/1:0)
Interrupt 10 node_num is -1 (other/51899502:1574)
Interrupt 152 node_num is -1 (other/9858927:299)
Interrupt 134 node_num is -1 (other/159358509:4833)
Interrupt 133 node_num is -1 (other/1:0)
Interrupt 132 node_num is -1 (other/1:0)
Interrupt 131 node_num is -1 (other/1:0)
Interrupt 128 node_num is -1 (other/1:0)
Interrupt 125 node_num is -1 (other/1:0)
Interrupt 148 node_num is -1 (other/1:0)
Interrupt 147 node_num is -1 (other/1:0)
Interrupt 143 node_num is -1 (other/84048177:2549)
Interrupt 142 node_num is -1 (other/94830348:2876)
Interrupt 138 node_num is -1 (other/1:0)
Interrupt 135 node_num is -1 (other/1:0)
Interrupt 151 node_num is -1 (other/1:0)
Interrupt 150 node_num is -1 (other/1:0)
Interrupt 149 node_num is -1 (other/1:0)
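The per-interrupt lines in this debug output can be ranked to see which IRQs irqbalance currently treats as expensive. This is a rough sketch, assuming the parenthesised field reads class/load:count (a load estimate followed by the number of interrupts seen in the last interval). That reading is inferred from the output above rather than from a documented format guarantee, and `rank_by_load` is a hypothetical helper.

```python
import re

# Matches e.g. "Interrupt 134 node_num is -1 (other/159358509:4833)"
LINE_RE = re.compile(r"Interrupt (\d+) node_num is (-?\d+) \((\w+)/(\d+):(\d+)\)")


def rank_by_load(debug_text):
    """Return (irq, irq_class, load, count) tuples, highest load first."""
    rows = [
        (int(irq), irq_class, int(load), int(count))
        for irq, _node, irq_class, load, count in LINE_RE.findall(debug_text)
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# A few lines copied from the debug output in this thread.
sample = """\
Interrupt 152 node_num is -1 (other/9858927:299)
Interrupt 134 node_num is -1 (other/159358509:4833)
Interrupt 143 node_num is -1 (other/84048177:2549)
Interrupt 142 node_num is -1 (other/94830348:2876)
Interrupt 138 node_num is -1 (other/1:0)
"""

for irq, irq_class, load, count in rank_by_load(sample):
    print(f"IRQ {irq:>3} class={irq_class:<6} load={load:>10} count={count}")
```

With these sample lines, the wifi IRQ 134 ranks first, followed by the two ethernet IRQs 142 and 143.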

Still get:

/proc/irq/142$ cat /proc/interrupts
            CPU0       CPU1
  10:   16873344   11577097  GICv2 30 Level arch_timer
  15:          1          0  MT_SYSIRQ 163 Level mt-pmic-pwrap
  22:          0          0  mt-eint 0 Edge gpio-keys
  75:          6          0  mt-eint 53 Level mt7530
 124:          0          0  mt-eint 102 Edge gpio-keys
 125:         14          0  MT_SYSIRQ 91 Level ttyS0
 128:          0          0  MT_SYSIRQ 118 Level 1100a000.spi
 131:      11604          0  MT_SYSIRQ 96 Level mtk-snand
 132:      41909          0  MT_SYSIRQ 95 Level mtk-ecc
 133:          0          0  MT_SYSIRQ 122 Level 11016000.spi
 134:   54480658          0  MT_SYSIRQ 211 Level mt7615e
 135:          0          0  MT_SYSIRQ 232 Level xhci-hcd:usb1
 138:          0          0  MT_SYSIRQ 219 Level 1b007000.dma-controller
 142:   34782516          0  MT_SYSIRQ 224 Level 1b100000.ethernet
 143:   37382131          0  MT_SYSIRQ 225 Level 1b100000.ethernet
 146:          0          0  dummy 0 Edge PCIe PME
 147:          0          3  mt7530 0 Edge mt7530-0:00
 148:          0          0  mt7530 1 Edge mt7530-0:01
 149:          0          1  mt7530 2 Edge mt7530-0:02
 150:          0          1  mt7530 3 Edge mt7530-0:03
 151:          0          1  mt7530 4 Edge mt7530-0:04
 152:   21413702          0  MTK PCIe MSI 524288 Edge mt7915e
IPI0:    2693026    2715427  Rescheduling interrupts
IPI1:   34566316   90756277  Function call interrupts
IPI2:          0          0  CPU stop interrupts
IPI3:          0          0  CPU stop (for crash dump) interrupts
IPI4:          0          0  Timer broadcast interrupts
IPI5:   16699684    3989710  IRQ work interrupts
IPI6:          0          0  CPU wake-up interrupts
 Err:          0

Imwithstupid007 commented 2 years ago

All 3 "affinity_hint" lines resulted in 0

nhorman commented 2 years ago

@Imwithstupid007 looking at your /proc/interrupts output, irqbalance seems to be running as expected. For each affine-able interrupt (i.e. everything except the IPI and timer irqs), you're seeing irq handling happen on only one of your two cpus. That is expected: irqbalance balances interrupt load by measuring the softirq time spent for each interrupt and balancing that time across all your cpus.

You may have been expecting irqbalance to place an equal number of irq events on each cpu, but that's not how any of this works. There are several good articles online about how irqbalance measures and manages interrupts. There is also this, which focuses on use in openwrt, which I believe you are using.
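To check the point above on a given device, the per-CPU counters in /proc/interrupts can be summarised per IRQ. The sketch below is not part of irqbalance; it works on text in the /proc/interrupts layout (on a live system, the content of that file), and the function names `per_cpu_counts` and `describe` are hypothetical. The sample rows are copied from the output posted earlier in this thread.

```python
def per_cpu_counts(interrupts_text):
    """Map each numbered IRQ to its list of per-CPU event counts."""
    lines = interrupts_text.strip().splitlines()
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip IPI*, Err and other non-numbered rows
        counts[int(label)] = [int(x) for x in rest.split()[:n_cpus]]
    return counts


def describe(counts):
    """Yield (irq, total, busiest_cpu, share_on_busiest_cpu) for active IRQs."""
    for irq, per_cpu in sorted(counts.items()):
        total = sum(per_cpu)
        if total == 0:
            continue  # IRQ never fired
        busiest = max(range(len(per_cpu)), key=per_cpu.__getitem__)
        yield irq, total, busiest, per_cpu[busiest] / total


sample = """\
CPU0 CPU1
10: 16050554 10986302 GICv2 30 Level arch_timer
134: 50962650 0 MT_SYSIRQ 211 Level mt7615e
142: 31708071 0 MT_SYSIRQ 224 Level 1b100000.ethernet
143: 34515013 0 MT_SYSIRQ 225 Level 1b100000.ethernet
152: 21131956 0 MTK PCIe MSI 524288 Edge mt7915e
IPI0: 2446447 2590160 Rescheduling interrupts
"""

for irq, total, cpu, share in describe(per_cpu_counts(sample)):
    print(f"IRQ {irq}: {share:.0%} of {total} events on CPU{cpu}")
```

For these rows, each device IRQ (134, 142, 143, 152) has all of its events on a single CPU, while the per-CPU arch_timer is split between both. Event counts alone do not show how much softirq time each interrupt costs, which is the quantity described above as the basis for balancing.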

Imwithstupid007 commented 2 years ago

@nhorman You're correct that I'm using Openwrt. Your link led me to this page. Apparently something changed very recently. In one of the earlier snapshots, interrupts 142 and 143 were spread over the 2 cores.

nhorman commented 2 years ago

That would have been a bug. High volume interrupts like ethernet devices should be isolated to a single core. What version were you running previously?

Imwithstupid007 commented 2 years ago

I came from a Netgear R7800 and then moved to a Belkin RT3200. On both devices I ran several stable and snapshot Openwrt releases. Both devices spread the interrupts across both cores, as you can see in this link (for the R7800): https://openwrt.org/docs/guide-user/services/irqbalance I know for certain that the RT3200 also did this up until one of the recent snapshots from about 3 weeks ago (I think r209xx). Something changed in one of the latest snapshots and in stable (for the RT3200?) which caused it to restrict the interrupts to one core.

nhorman commented 2 years ago

I'm asking what version of irqbalance they ran

Imwithstupid007 commented 2 years ago

Now I'm running [image not preserved]

nhorman commented 2 years ago

I thought I was clear - what versions of irqbalance were the prior openwrt images running when they were spreading irqs across cpus?

Imwithstupid007 commented 2 years ago

As far as I can see and know, probably 1.9.0-4. But mind you, I'm a noob.

nhorman commented 2 years ago

There are only 23 changes between those two versions: 8 commits are thermal event features that don't get built for non-x86_64 platforms, and another 8 are fixes to the UI application. There's one fix to interrupt parsing (0a82dddbaf5702caded0d0d83a6eafaca743254d) that may be relevant here, but regardless, the way it's working now is the way it's supposed to be working.

Imwithstupid007 commented 2 years ago

OK, thanks for your trouble.