Closed schuettecarsten closed 6 months ago
can you post the debug output here please?
It looks like this only happens when irqbalance runs in a procd "jail". If I try to redirect console output from irqbalance into a file, the issue is not reproduceable. If I remove the file, the issue comes back. Is it possible that there is a timing or caching issue in irqbalance startup code when checking for banned IRQs?
That....doesn't make any sense. When you say proc jail, I assume you are referring to a file namespace that bind mounts /proc? Are you by any chance running a second copy of irqbalance in the init namespace without the banning settings? If you need to stop other irqbalance services and run it once in a screen session with logging enabled, so you don't have to do the file redirection. But what you're describing definately sounds like you're running irqbalance twice with different settings for each instance, and they're competing with each other.
Also, what version of irqbalance are you running?
Yes, I know that this does not make sense, I have no idea what's going on, I can only see that the value of /proc/irq/27/smp_affinity and /proc/irq/28/smp_affinity sometimes switches vom f to 2 to 8 after irqbalance was started. It is reproduceable when I start irqbalance using the default script that is provided with OpenWRT.
What I can see is:
This command line works:
/usr/sbin/irqbalance -f -d -c 2 -t 10 -i 14 -i 27 -i 28 -i 30
Log output from startup:
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 14 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 27 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 28 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 30 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: This machine seems not NUMA capable.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Prevent irq assignment to these isolated CPUs: 00000000
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Prevent irq assignment to these adaptive-ticks CPUs: 00000000
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Banned CPUs: 00000000
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Package 0: numa_node -1 cpu mask is 0000000f (load 0)
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Cache domain 0: numa_node is -1 cpu mask is 0000000f (load 0)
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: CPU number 3 numa_node is -1 (load 0)
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: CPU number 1 numa_node is -1 (load 0)
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: CPU number 2 numa_node is -1 (load 0)
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: CPU number 0 numa_node is -1 (load 0)
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ arch_timer(11) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ fe00b880.mailbox(14) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ DMA IRQ(15) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ PCIe PME, aerdrv(26) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ eth0(27) guessed as class 5
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ eth0(28) guessed as class 5
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ xhci_hcd(29) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ mmc1, mmc0(30) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ VCHIQ doorbell(31) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.debug /usr/sbin/irqbalance: IRQ uart-pl011(32) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ arch_timer(11) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ fe00b880.mailbox(14) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ DMA IRQ(15) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ PCIe PME, aerdrv(26) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ eth0(27) guessed as class 5
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ eth0(28) guessed as class 5
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ xhci_hcd(29) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ mmc1, mmc0(30) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ VCHIQ doorbell(31) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ uart-pl011(32) guessed as class 0
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Adding IRQ 29 to database
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Adding IRQ 11 to database
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 14 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Adding IRQ 15 to database
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Adding IRQ 26 to database
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 27 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 28 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: IRQ 30 was BANNED.
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Adding IRQ 31 to database
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: Adding IRQ 32 to database
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: NUMA NODE NUMBER: -1
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]: LOCAL CPU MASK: ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,fffff
Wed Feb 28 18:21:33 2024 daemon.info irqbalance[25277]:
This command line causes IRQ bans to fail (two spaces after -t 10 and before first -i argument):
/usr/sbin/irqbalance -f -d -c 2 -t 10 -i 14 -i 27 -i 28 -i 30
Log output showing that IRQs are not banned:
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: This machine seems not NUMA capable.
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Prevent irq assignment to these isolated CPUs: 00000000
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Prevent irq assignment to these adaptive-ticks CPUs: 00000000
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Banned CPUs: 00000000
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Package 0: numa_node -1 cpu mask is 0000000f (load 0)
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Cache domain 0: numa_node is -1 cpu mask is 0000000f (load 0)
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: CPU number 3 numa_node is -1 (load 0)
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: CPU number 1 numa_node is -1 (load 0)
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: CPU number 2 numa_node is -1 (load 0)
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: CPU number 0 numa_node is -1 (load 0)
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ arch_timer(11) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ fe00b880.mailbox(14) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ DMA IRQ(15) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ PCIe PME, aerdrv(26) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ eth0(27) guessed as class 5
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ eth0(28) guessed as class 5
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ xhci_hcd(29) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ mmc1, mmc0(30) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ VCHIQ doorbell(31) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.debug /usr/sbin/irqbalance: IRQ uart-pl011(32) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ arch_timer(11) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ fe00b880.mailbox(14) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ DMA IRQ(15) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ PCIe PME, aerdrv(26) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ eth0(27) guessed as class 5
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ eth0(28) guessed as class 5
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ xhci_hcd(29) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ mmc1, mmc0(30) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ VCHIQ doorbell(31) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: IRQ uart-pl011(32) guessed as class 0
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 29 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 11 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 14 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 15 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 26 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 27 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 28 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 30 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 31 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: Adding IRQ 32 to database
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: NUMA NODE NUMBER: -1
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]: LOCAL CPU MASK: ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,fffff
Wed Feb 28 18:22:57 2024 daemon.info irqbalance[25726]:
so you're saying the difference is that the problem triggers when you add an extra space after a single command line argument, ok.
If I build the latest commit from the head of the tree, stop the system irqbalance and run:
./irqbalance -f -d -t 10 -i 27
Noting the 10 spaces after the -t 10 for good measure on my x86_64 system I get this output:
IRQ 27 was BANNED.
Prevent irq assignment to these isolated CPUs: 00000000
Prevent irq assignment to these adaptive-ticks CPUs: 00000000
Prevent irq assignment to these thermal-banned CPUs: 00000000
Banned CPUs: 00000000
Package 0: numa_node 0 cpu mask is 3ff003ff (load 0)
Cache domain 0: numa_node is 0 cpu mask is 00200002 (load 0)
CPU number 21 numa_node is 0 (load 0)
CPU number 1 numa_node is 0 (load 0)
Cache domain 2: numa_node is 0 cpu mask is 20000200 (load 0)
CPU number 9 numa_node is 0 (load 0)
CPU number 29 numa_node is 0 (load 0)
Cache domain 3: numa_node is 0 cpu mask is 08000080 (load 0)
CPU number 7 numa_node is 0 (load 0)
CPU number 27 numa_node is 0 (load 0)
Cache domain 5: numa_node is 0 cpu mask is 10000100 (load 0)
CPU number 28 numa_node is 0 (load 0)
CPU number 8 numa_node is 0 (load 0)
Cache domain 6: numa_node is 0 cpu mask is 02000020 (load 0)
CPU number 5 numa_node is 0 (load 0)
CPU number 25 numa_node is 0 (load 0)
Cache domain 8: numa_node is 0 cpu mask is 04000040 (load 0)
CPU number 26 numa_node is 0 (load 0)
CPU number 6 numa_node is 0 (load 0)
Cache domain 9: numa_node is 0 cpu mask is 00800008 (load 0)
CPU number 3 numa_node is 0 (load 0)
CPU number 23 numa_node is 0 (load 0)
Cache domain 11: numa_node is 0 cpu mask is 01000010 (load 0)
CPU number 24 numa_node is 0 (load 0)
CPU number 4 numa_node is 0 (load 0)
Cache domain 13: numa_node is 0 cpu mask is 00400004 (load 0)
CPU number 22 numa_node is 0 (load 0)
CPU number 2 numa_node is 0 (load 0)
Cache domain 15: numa_node is 0 cpu mask is 00100001 (load 0)
CPU number 20 numa_node is 0 (load 0)
CPU number 0 numa_node is 0 (load 0)
Package 1: numa_node 1 cpu mask is 000000ff,c00ffc00 (load 0)
Cache domain 1: numa_node is 1 cpu mask is 80000800 (load 0)
CPU number 11 numa_node is 1 (load 0)
CPU number 31 numa_node is 1 (load 0)
Cache domain 4: numa_node is 1 cpu mask is 00000040,00040000 (load 0)
CPU number 38 numa_node is 1 (load 0)
CPU number 18 numa_node is 1 (load 0)
Cache domain 7: numa_node is 1 cpu mask is 00000010,00010000 (load 0)
CPU number 36 numa_node is 1 (load 0)
CPU number 16 numa_node is 1 (load 0)
Cache domain 10: numa_node is 1 cpu mask is 00000004,00004000 (load 0)
CPU number 34 numa_node is 1 (load 0)
CPU number 14 numa_node is 1 (load 0)
Cache domain 12: numa_node is 1 cpu mask is 00000001,00001000 (load 0)
CPU number 32 numa_node is 1 (load 0)
CPU number 12 numa_node is 1 (load 0)
Cache domain 14: numa_node is 1 cpu mask is 40000400 (load 0)
CPU number 30 numa_node is 1 (load 0)
CPU number 10 numa_node is 1 (load 0)
Cache domain 16: numa_node is 1 cpu mask is 00000080,00080000 (load 0)
CPU number 39 numa_node is 1 (load 0)
CPU number 19 numa_node is 1 (load 0)
Cache domain 17: numa_node is 1 cpu mask is 00000020,00020000 (load 0)
CPU number 37 numa_node is 1 (load 0)
CPU number 17 numa_node is 1 (load 0)
Cache domain 18: numa_node is 1 cpu mask is 00000008,00008000 (load 0)
CPU number 35 numa_node is 1 (load 0)
CPU number 15 numa_node is 1 (load 0)
Cache domain 19: numa_node is 1 cpu mask is 00000002,00002000 (load 0)
CPU number 33 numa_node is 1 (load 0)
CPU number 13 numa_node is 1 (load 0)
Adding IRQ 39 to database
Adding IRQ 29 to database
Adding IRQ 65 to database
Adding IRQ 35 to database
Adding IRQ 25 to database
Adding IRQ 63 to database
Adding IRQ 55 to database
Adding IRQ 53 to database
Adding IRQ 61 to database
Adding IRQ 51 to database
Adding IRQ 58 to database
Adding IRQ 48 to database
Adding IRQ 56 to database
Adding IRQ 54 to database
Adding IRQ 62 to database
Adding IRQ 52 to database
Adding IRQ 60 to database
Adding IRQ 50 to database
Adding IRQ 59 to database
Adding IRQ 49 to database
Adding IRQ 57 to database
Adding IRQ 47 to database
Adding IRQ 64 to database
Adding IRQ 30 to database
Adding IRQ 33 to database
Adding IRQ 45 to database
Adding IRQ 18 to database
Adding IRQ 17 to database
Adding IRQ 28 to database
Adding IRQ 26 to database
Adding IRQ 38 to database
Adding IRQ 37 to database
Adding IRQ 43 to database
Adding IRQ 41 to database
Adding IRQ 44 to database
Adding IRQ 42 to database
Adding IRQ 40 to database
Adding IRQ 32 to database
Adding IRQ 36 to database
Adding IRQ 31 to database
Adding IRQ 34 to database
IRQ 27 was BANNED.
Adding IRQ 19 to database
...
That suggests to me that there is something wrong with either the source that the openwrt irqbalance is built from, or something wrong with the startup script. It appears that openwrt ships irqbalcne 1.9.2, is that what you're running? Thats fairly recent, so I would suspect theres no problem with the source, or it would likely have reproduced here.
I'd suggest running the daemon under a debugger, breaking in main to see if the command line argument itself is getting properly parsed.
I've created a PR at OpenWRT to fix the init script, hopefully it will get merged. Thank you for you quick help and sorry for bothering you with this strange issue.
P.S.: OpenWRT uses 1.9.3
No worries, what was the problem with their script?
This change fixes the issue for me. No idea why... https://github.com/openwrt/packages/commit/40cc7931d56cb0620c5b38304d3b4ff2b8bb75cd
I use -i parameter to exclude (ban) IRQs from irqbalance. Platform is an rpi4 wit arm64. On startup, irqbalance updates the smp_affinity for these banned IRQs also. I would expect that irqbalance will not touch banned IRQs at all.
Before irqbalance start:
irqbalance start using
/usr/sbin/irqbalance -f -c 2 -t 10 -i 14 -i 27 -i 28 -i 30
shows:But after some seconds, smp_affinity is modified for two of the banned IRQs:
There is no useful output about these IRQs from irqbalance, not even with -d flag.