Irqbalance / irqbalance

The irqbalance source tree - The new official site for irqbalance
http://irqbalance.github.io/irqbalance/
GNU General Public License v2.0
576 stars 139 forks source link

irqbalance updates smp_affinity for ignored IRQs #297

Closed schuettecarsten closed 6 months ago

schuettecarsten commented 6 months ago

I use -i parameter to exclude (ban) IRQs from irqbalance. Platform is an rpi4 wit arm64. On startup, irqbalance updates the smp_affinity for these banned IRQs also. I would expect that irqbalance will not touch banned IRQs at all.

Before irqbalance start:

/proc/irq/27/smp_affinity:f
/proc/irq/28/smp_affinity:f

irqbalance start using /usr/sbin/irqbalance -f -c 2 -t 10 -i 14 -i 27 -i 28 -i 30 shows:

root@rpi4:~# /usr/sbin/irqbalance -f -c 2 -t 10  -i 14 -i 27 -i 28 -i 30
IRQ 14 was BANNED.
IRQ 27 was BANNED.
IRQ 28 was BANNED.
IRQ 30 was BANNED.

But after some seconds, smp_affinity is modified for two of the banned IRQs:

/proc/irq/27/smp_affinity:2
/proc/irq/28/smp_affinity:8

There is no useful output about these IRQs from irqbalance, not even with -d flag.

nhorman commented 6 months ago

can you post the debug output here please?

schuettecarsten commented 6 months ago

It looks like this only happens when irqbalance runs in a procd "jail". If I try to redirect console output from irqbalance into a file, the issue is not reproduceable. If I remove the file, the issue comes back. Is it possible that there is a timing or caching issue in irqbalance startup code when checking for banned IRQs?

nhorman commented 6 months ago

That....doesn't make any sense. When you say proc jail, I assume you are referring to a file namespace that bind mounts /proc? Are you by any chance running a second copy of irqbalance in the init namespace without the banning settings? If you need to stop other irqbalance services and run it once in a screen session with logging enabled, so you don't have to do the file redirection. But what you're describing definately sounds like you're running irqbalance twice with different settings for each instance, and they're competing with each other.

Also, what version of irqbalance are you running?

schuettecarsten commented 6 months ago

Yes, I know that this does not make sense, I have no idea what's going on, I can only see that the value of /proc/irq/27/smp_affinity and /proc/irq/28/smp_affinity sometimes switches vom f to 2 to 8 after irqbalance was started. It is reproduceable when I start irqbalance using the default script that is provided with OpenWRT.

schuettecarsten commented 6 months ago

What I can see is:

This command line works:

/usr/sbin/irqbalance -f -d -c 2 -t 10 -i 14 -i 27 -i 28 -i 30

Log output from startup:

Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 14 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 27 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 28 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 30 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: This machine seems not NUMA capable.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Prevent irq assignment to these isolated CPUs: 00000000
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Prevent irq assignment to these adaptive-ticks CPUs: 00000000
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Banned CPUs: 00000000
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Package 0:  numa_node -1 cpu mask is 0000000f (load 0)
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]:         Cache domain 0:  numa_node is -1 cpu mask is 0000000f  (load 0)
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]:                 CPU number 3  numa_node is -1 (load 0)
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]:                 CPU number 1  numa_node is -1 (load 0)
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]:                 CPU number 2  numa_node is -1 (load 0)
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]:                 CPU number 0  numa_node is -1 (load 0)
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ arch_timer(11) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ fe00b880.mailbox(14) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ DMA IRQ(15) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ PCIe PME, aerdrv(26) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ eth0(27) guessed as class 5
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ eth0(28) guessed as class 5
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ xhci_hcd(29) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ mmc1, mmc0(30) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ VCHIQ doorbell(31) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ uart-pl011(32) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ arch_timer(11) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ fe00b880.mailbox(14) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ DMA IRQ(15) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ PCIe PME, aerdrv(26) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ eth0(27) guessed as class 5
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ eth0(28) guessed as class 5
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ xhci_hcd(29) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ mmc1, mmc0(30) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ VCHIQ doorbell(31) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ uart-pl011(32) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Adding IRQ 29 to database
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Adding IRQ 11 to database
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 14 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Adding IRQ 15 to database
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Adding IRQ 26 to database
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 27 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 28 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 30 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Adding IRQ 31 to database
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Adding IRQ 32 to database
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: NUMA NODE NUMBER: -1
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: LOCAL CPU MASK: ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,fffff
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]:

This command line causes IRQ bans to fail (two spaces after -t 10 and before first -i argument):

/usr/sbin/irqbalance -f -d -c 2 -t 10  -i 14 -i 27 -i 28 -i 30

Log output showing that IRQs are not banned:

Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: This machine seems not NUMA capable.
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Prevent irq assignment to these isolated CPUs: 00000000
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Prevent irq assignment to these adaptive-ticks CPUs: 00000000
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Banned CPUs: 00000000
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Package 0:  numa_node -1 cpu mask is 0000000f (load 0)
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]:         Cache domain 0:  numa_node is -1 cpu mask is 0000000f  (load 0)
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]:                 CPU number 3  numa_node is -1 (load 0)
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]:                 CPU number 1  numa_node is -1 (load 0)
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]:                 CPU number 2  numa_node is -1 (load 0)
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]:                 CPU number 0  numa_node is -1 (load 0)
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ arch_timer(11) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ fe00b880.mailbox(14) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ DMA IRQ(15) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ PCIe PME, aerdrv(26) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ eth0(27) guessed as class 5
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ eth0(28) guessed as class 5
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ xhci_hcd(29) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ mmc1, mmc0(30) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ VCHIQ doorbell(31) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ uart-pl011(32) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ arch_timer(11) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ fe00b880.mailbox(14) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ DMA IRQ(15) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ PCIe PME, aerdrv(26) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ eth0(27) guessed as class 5
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ eth0(28) guessed as class 5
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ xhci_hcd(29) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ mmc1, mmc0(30) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ VCHIQ doorbell(31) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ uart-pl011(32) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 29 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 11 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 14 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 15 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 26 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 27 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 28 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 30 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 31 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 32 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: NUMA NODE NUMBER: -1
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: LOCAL CPU MASK: ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,fffff
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]:
nhorman commented 6 months ago

so you're saying the difference is that the problem triggers when you add an extra space after a single command line argument, ok.

If I build the latest commit from the head of the tree, stop the system irqbalance and run:

./irqbalance -f -d -t 10        -i 27

Noting the 10 spaces after the -t 10 for good measure on my x86_64 system I get this output:

IRQ 27 was BANNED.
Prevent irq assignment to these isolated CPUs: 00000000
Prevent irq assignment to these adaptive-ticks CPUs: 00000000
Prevent irq assignment to these thermal-banned CPUs: 00000000
Banned CPUs: 00000000
Package 0:  numa_node 0 cpu mask is 3ff003ff (load 0)
        Cache domain 0:  numa_node is 0 cpu mask is 00200002  (load 0) 
                CPU number 21  numa_node is 0 (load 0)
                CPU number 1  numa_node is 0 (load 0)
        Cache domain 2:  numa_node is 0 cpu mask is 20000200  (load 0) 
                CPU number 9  numa_node is 0 (load 0)
                CPU number 29  numa_node is 0 (load 0)
        Cache domain 3:  numa_node is 0 cpu mask is 08000080  (load 0) 
                CPU number 7  numa_node is 0 (load 0)
                CPU number 27  numa_node is 0 (load 0)
        Cache domain 5:  numa_node is 0 cpu mask is 10000100  (load 0) 
                CPU number 28  numa_node is 0 (load 0)
                CPU number 8  numa_node is 0 (load 0)
        Cache domain 6:  numa_node is 0 cpu mask is 02000020  (load 0) 
                CPU number 5  numa_node is 0 (load 0)
                CPU number 25  numa_node is 0 (load 0)
        Cache domain 8:  numa_node is 0 cpu mask is 04000040  (load 0) 
                CPU number 26  numa_node is 0 (load 0)
                CPU number 6  numa_node is 0 (load 0)
        Cache domain 9:  numa_node is 0 cpu mask is 00800008  (load 0) 
                CPU number 3  numa_node is 0 (load 0)
                CPU number 23  numa_node is 0 (load 0)
        Cache domain 11:  numa_node is 0 cpu mask is 01000010  (load 0) 
                CPU number 24  numa_node is 0 (load 0)
                CPU number 4  numa_node is 0 (load 0)
        Cache domain 13:  numa_node is 0 cpu mask is 00400004  (load 0) 
                CPU number 22  numa_node is 0 (load 0)
                CPU number 2  numa_node is 0 (load 0)
        Cache domain 15:  numa_node is 0 cpu mask is 00100001  (load 0) 
                CPU number 20  numa_node is 0 (load 0)
                CPU number 0  numa_node is 0 (load 0)
Package 1:  numa_node 1 cpu mask is 000000ff,c00ffc00 (load 0)
        Cache domain 1:  numa_node is 1 cpu mask is 80000800  (load 0) 
                CPU number 11  numa_node is 1 (load 0)
                CPU number 31  numa_node is 1 (load 0)
        Cache domain 4:  numa_node is 1 cpu mask is 00000040,00040000  (load 0) 
                CPU number 38  numa_node is 1 (load 0)
                CPU number 18  numa_node is 1 (load 0)
        Cache domain 7:  numa_node is 1 cpu mask is 00000010,00010000  (load 0) 
                CPU number 36  numa_node is 1 (load 0)
                CPU number 16  numa_node is 1 (load 0)
        Cache domain 10:  numa_node is 1 cpu mask is 00000004,00004000  (load 0) 
                CPU number 34  numa_node is 1 (load 0)
                CPU number 14  numa_node is 1 (load 0)
        Cache domain 12:  numa_node is 1 cpu mask is 00000001,00001000  (load 0) 
                CPU number 32  numa_node is 1 (load 0)
                CPU number 12  numa_node is 1 (load 0)
        Cache domain 14:  numa_node is 1 cpu mask is 40000400  (load 0) 
                CPU number 30  numa_node is 1 (load 0)
                CPU number 10  numa_node is 1 (load 0)
        Cache domain 16:  numa_node is 1 cpu mask is 00000080,00080000  (load 0) 
                CPU number 39  numa_node is 1 (load 0)
                CPU number 19  numa_node is 1 (load 0)
        Cache domain 17:  numa_node is 1 cpu mask is 00000020,00020000  (load 0) 
                CPU number 37  numa_node is 1 (load 0)
                CPU number 17  numa_node is 1 (load 0)
        Cache domain 18:  numa_node is 1 cpu mask is 00000008,00008000  (load 0) 
                CPU number 35  numa_node is 1 (load 0)
                CPU number 15  numa_node is 1 (load 0)
        Cache domain 19:  numa_node is 1 cpu mask is 00000002,00002000  (load 0) 
                CPU number 33  numa_node is 1 (load 0)
                CPU number 13  numa_node is 1 (load 0)
Adding IRQ 39 to database
Adding IRQ 29 to database
Adding IRQ 65 to database
Adding IRQ 35 to database
Adding IRQ 25 to database
Adding IRQ 63 to database
Adding IRQ 55 to database
Adding IRQ 53 to database
Adding IRQ 61 to database
Adding IRQ 51 to database
Adding IRQ 58 to database
Adding IRQ 48 to database
Adding IRQ 56 to database
Adding IRQ 54 to database
Adding IRQ 62 to database
Adding IRQ 52 to database
Adding IRQ 60 to database
Adding IRQ 50 to database
Adding IRQ 59 to database
Adding IRQ 49 to database
Adding IRQ 57 to database
Adding IRQ 47 to database
Adding IRQ 64 to database
Adding IRQ 30 to database
Adding IRQ 33 to database
Adding IRQ 45 to database
Adding IRQ 18 to database
Adding IRQ 17 to database
Adding IRQ 28 to database
Adding IRQ 26 to database
Adding IRQ 38 to database
Adding IRQ 37 to database
Adding IRQ 43 to database
Adding IRQ 41 to database
Adding IRQ 44 to database
Adding IRQ 42 to database
Adding IRQ 40 to database
Adding IRQ 32 to database
Adding IRQ 36 to database
Adding IRQ 31 to database
Adding IRQ 34 to database
IRQ 27 was BANNED.
Adding IRQ 19 to database
...

That suggests to me that there is something wrong with either the source that the openwrt irqbalance is built from, or something wrong with the startup script. It appears that openwrt ships irqbalcne 1.9.2, is that what you're running? Thats fairly recent, so I would suspect theres no problem with the source, or it would likely have reproduced here.

I'd suggest running the daemon under a debugger, breaking in main to see if the command line argument itself is getting properly parsed.

schuettecarsten commented 6 months ago

I've created a PR at OpenWRT to fix the init script, hopefully it will get merged. Thank you for you quick help and sorry for bothering you with this strange issue.

schuettecarsten commented 6 months ago

P.S.: OpenWRT uses 1.9.3

nhorman commented 6 months ago

No worries, what was the problem with their script?

schuettecarsten commented 6 months ago

This change fixes the issue for me. No idea why... https://github.com/openwrt/packages/commit/40cc7931d56cb0620c5b38304d3b4ff2b8bb75cd