Irqbalance / irqbalance

The irqbalance source tree - The new official site for irqbalance
http://irqbalance.github.io/irqbalance/
GNU General Public License v2.0

Try fix https://github.com/Irqbalance/irqbalance/issues/303 #312

Closed balrog-kun closed 4 months ago

balrog-kun commented 4 months ago

This is a proposed fix for https://github.com/Irqbalance/irqbalance/issues/303 up for discussion. Here's the relevant commit msg:


There are situations where irqbalance may try to migrate a large number of IRQs to a single topo_obj. There is no upper bound on that number because the placement logic is based mainly on load. The kernel's IRQ bitmasks limit the number of IRQs on each CPU, and if irqbalance tries to migrate more than that, the write to smp_affinity fails with -ENOSPC. This confuses irqbalance's logic: the topo_obj.interrupts list no longer matches the IRQs actually on that CPU or cache domain, which results in floods of error messages. See https://github.com/Irqbalance/irqbalance/issues/303 for details.

For an easy fix, track the number of IRQ slots still free on each CPU. We start with INT_MAX, meaning "unknown", and when we first get an -ENOSPC we know there are no slots left. From then on, update the slot count each time we migrate IRQs to or from the CPU core topo_obj. We may never see an -ENOSPC, and in that case the current logic is unchanged because we never start tracking.
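The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `struct cpu_obj` and the `cpu_obj_*` helpers are hypothetical stand-ins for irqbalance's real topo_obj handling, and `err` is assumed to be 0 or a negative errno value obtained from the smp_affinity write.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical stand-in for a CPU-level topo_obj; only the fields needed here. */
struct cpu_obj {
    int irq_count;   /* IRQs we believe are assigned to this CPU */
    int slots_left;  /* INT_MAX means "unknown": no -ENOSPC seen yet */
};

static void cpu_obj_init(struct cpu_obj *cpu)
{
    cpu->irq_count = 0;
    cpu->slots_left = INT_MAX;
}

/* Record the outcome of trying to move one IRQ onto this CPU. */
static void cpu_obj_irq_added(struct cpu_obj *cpu, int err)
{
    if (err == -ENOSPC) {
        /* The kernel has no free slot: start (or reset) tracking at zero. */
        cpu->slots_left = 0;
        return;
    }
    if (err != 0)
        return; /* other errors, e.g. -EIO, say nothing about free slots */

    cpu->irq_count++;
    /* Only count down once tracking has started, and never below zero. */
    if (cpu->slots_left != INT_MAX && cpu->slots_left > 0)
        cpu->slots_left--;
}

/* Record that one IRQ was migrated away from this CPU. */
static void cpu_obj_irq_removed(struct cpu_obj *cpu)
{
    assert(cpu->irq_count > 0);
    cpu->irq_count--;
    /* A slot is freed only if we are tracking; "unknown" stays unknown. */
    if (cpu->slots_left != INT_MAX)
        cpu->slots_left++;
}

/* Non-zero if the CPU may still accept an IRQ as far as we know. */
static int cpu_obj_has_room(const struct cpu_obj *cpu)
{
    return cpu->slots_left > 0;
}
```

Because INT_MAX is treated as a sentinel rather than a real count, a CPU that never returns -ENOSPC always reports room, which matches the "no change in current logic" case.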

This way we don't need to know ahead of time how many slots the kernel has for each CPU. The number may be arch-specific (it is about 200 on x86-64) and depends on the number of managed IRQs the kernel has registered, so we don't want to guess. This approach is also more tolerant of the topo_obj.interrupts lists not exactly matching the kernel's idea of each IRQ's current affinity, e.g. due to -EIO errors in the smp_affinity writes.

For now only do the tracking at OBJ_TYPE_CPU level so we don't have to update slots_left for all parent objs.

The commit doesn't try to stop an ongoing activation of all the IRQs already scheduled for moving to one CPU when that CPU starts returning -ENOSPC. We'll still see a bunch of those errors in that iteration. But in subsequent calculate_placement() iterations we avoid assigning more IRQs to that CPU than we were able to move successfully before.
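As a rough illustration of how a later placement pass could respect the tracked count, the sketch below picks the least-loaded CPU that still has a free slot. It is not the actual calculate_placement() logic: `struct cpu_slot_info` and `pick_cpu()` are hypothetical names, and the real placement code walks irqbalance's topology tree rather than a flat array.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-CPU summary used only for this sketch. */
struct cpu_slot_info {
    unsigned long long load; /* load currently attributed to the CPU */
    int slots_left;          /* INT_MAX = unknown, 0 = known to be full */
};

/*
 * Return the index of the least-loaded CPU that can still take an IRQ,
 * or -1 if every CPU is known to be full.
 */
static int pick_cpu(const struct cpu_slot_info *cpus, size_t n)
{
    int best = -1;

    assert(cpus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].slots_left <= 0)
            continue; /* skip CPUs that returned -ENOSPC and have not freed a slot */
        if (best < 0 || cpus[i].load < cpus[best].load)
            best = (int)i;
    }
    return best;
}
```

A CPU whose slots_left is still INT_MAX is always eligible, so the selection degenerates to the purely load-based choice until some CPU has reported -ENOSPC.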