Irqbalance / irqbalance

The irqbalance source tree - The new official site for irqbalance
http://irqbalance.github.io/irqbalance/
GNU General Public License v2.0

Avoid repeated affinity checks when no change is necessary #286

Closed: StefanBruens closed this 8 months ago

StefanBruens commented 9 months ago

An IRQ may be migrated several times during one loop iteration and end up with the same CPU affinity mask as before; the "moved" flag is merely a hint that an affinity change may be necessary. This notably happens during initial placement, but may also happen at any later time.

To avoid visiting each IRQ again and again, unset the "moved" flag when the current affinity already matches the assigned one. This avoids the open/stat/read/close syscalls for unchanged interrupts.
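The fast path described above can be sketched as follows. This is a simplified model, not the real irqbalance code: a 64-bit integer stands in for `cpumask_t`, the struct fields and the `activate_irq` helper are illustrative names, and a counter simulates the proc-file syscalls that the patch avoids.

```c
#include <stdint.h>

/* Illustrative stand-in for irqbalance's per-IRQ bookkeeping. */
struct irq_info {
    int irq;
    int moved;              /* hint: affinity *may* need rewriting */
    uint64_t assigned_mask; /* mask chosen by the balancer this pass */
    uint64_t applied_mask;  /* mask currently in /proc/irq/N/smp_affinity */
};

static int proc_reads; /* counts simulated reads of the proc file */

static void activate_irq(struct irq_info *info)
{
    if (!info->moved)
        return; /* hint already cleared: no syscalls at all */

    proc_reads++; /* simulate open/read of /proc/irq/N/smp_affinity */

    if (info->applied_mask == info->assigned_mask) {
        /* The fix: clear the hint even though nothing was written,
         * so subsequent passes skip this IRQ entirely. */
        info->moved = 0;
        return;
    }

    info->applied_mask = info->assigned_mask; /* simulate the write */
    info->moved = 0;
}
```

Without the marked `info->moved = 0` in the equal-mask branch, every later pass would re-read the proc file for an IRQ whose affinity never changes.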

Fixes: #285

nhorman commented 9 months ago

I don't see how this helps your description.

Every iteration, each info node should be visited only once in activate_mapping, and after it's done, info->moved is already set back to 0. The only thing you've changed is that it gets set back to zero on a check for an invalid mask (which occurs in the event CPUs get taken offline unexpectedly). By setting moved = 0 there you're preventing IRQs from being migrated when that CPU comes back online.

StefanBruens commented 9 months ago

> I don't see how this helps your description.
>
> Every iteration, each info node should be visited only once in activate_mapping, and after it's done, info->moved is already set back to 0. The only thing you've changed is that it gets set back to zero on a check for an invalid mask (which occurs in the event CPUs get taken offline unexpectedly). By setting moved = 0 there you're preventing IRQs from being migrated when that CPU comes back online.

info->moved is only unset if a value is actually written to smp_affinity; when the current affinity as read from the proc file is already the same as the wanted one, moved is never reset to 0.

Maybe the comment there is the problem: check_affinity does not verify mask validity.

As far as I can see, as soon as a CPU comes online moved will be set again, by force_rebalance_irq: https://github.com/Irqbalance/irqbalance/blob/b0152bc4e8c8464294f089351cf7a19afe6987e2/irqbalance.c#L317

Without the change, the proc file for every IRQ is read again and again; with the change, only the proc files for IRQs which actually need migration are read and eventually written. This can be trivially verified with `strace -eopen,read ...`.

StefanBruens commented 9 months ago

Actually, there seems to be another bug here:

https://github.com/Irqbalance/irqbalance/blob/b0152bc4e8c8464294f089351cf7a19afe6987e2/activate.c#L66

`cpu_and(dst, src1, src2)` may set the applied_mask to zero. This should be checked with `if (cpus_empty(applied_mask)) { return; }`.

But check_affinity does not cover this case: the mask read from the proc file is always non-zero, so cpus_equal, and thus check_affinity, returns 0, and the code below will try to write a zero value to the proc file.
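The suspected bug can be illustrated with a small self-contained model. This is a sketch under assumptions, not the real irqbalance code: 64-bit integers stand in for `cpumask_t`, `cpus_and`/`cpus_empty` mimic the kernel-style mask helpers, and `try_apply` is a hypothetical condensation of the activate/check_affinity logic.

```c
#include <stdint.h>

/* Kernel-style mask helpers, modeled on 64-bit integers. */
static uint64_t cpus_and(uint64_t a, uint64_t b) { return a & b; }
static int cpus_empty(uint64_t m) { return m == 0; }

/* Returns 1 if a write to /proc/irq/N/smp_affinity would happen,
 * 0 if it is skipped. 'current' is the mask read back from proc,
 * which is always non-zero on a live system. */
static int try_apply(uint64_t assigned, uint64_t allowed, uint64_t current)
{
    uint64_t applied_mask = cpus_and(assigned, allowed);

    /* Proposed guard: the intersection may be empty, and an
     * all-zero affinity mask must never be written out. */
    if (cpus_empty(applied_mask))
        return 0;

    /* check_affinity-style comparison: an empty applied_mask can
     * never equal the non-zero current mask, so without the guard
     * above a zero value would always fall through to the write. */
    if (applied_mask == current)
        return 0;

    return 1; /* simulate writing the new mask */
}
```

With `assigned = 0x2` and `allowed = 0x1` the intersection is empty; the guard skips the write, whereas the comparison alone would not.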

StefanBruens commented 8 months ago

> Maybe the comment there is a problem, check_affinity does not verify mask validity.

The comment was "broken" by https://github.com/Irqbalance/irqbalance/commit/cfe3d10d37e44be4bb94cb715dbe6fde9e8a60ff, which removed the "valid_mask" condition from the following `if (...)`.