Closed StefanBruens closed 8 months ago
I don't see how this helps your description.
Every iteration, each info node should be visited only once in `activate_mapping`, and after it's done, `info->moved` is already set back to 0. The only thing you've changed is that it also gets set back to zero on a check for an invalid mask (which occurs in the event CPUs get taken offline unexpectedly). By setting `moved = 0` there you're preventing IRQs from being migrated when that CPU comes back online.
`info->moved` is only unset if a value is written to `smp_affinity`; if the current affinity as read from the proc file is the same as the wanted one, `moved` is not reset to 0.
Maybe the comment there is a problem; `check_affinity` does not verify mask validity.
As far as I can see, as soon as a CPU comes online, `moved` will be set again by `force_rebalance_irq`:
https://github.com/Irqbalance/irqbalance/blob/b0152bc4e8c8464294f089351cf7a19afe6987e2/irqbalance.c#L317
Without the change, the proc file for all IRQs is read again and again; with the change, only the proc files for IRQs which need migration are read and eventually written. This can be trivially verified with `strace -eopen,read ...`.
Actually, there seems to be another bug here: `cpu_and(dst, src1, src2)` may set the `applied_mask` to zero. This should be checked with `if (cpus_empty(applied_mask)) { return; }`.
But `check_affinity` does not cover this case: the mask read from the proc file is always non-zero, so `cpus_equal` and thus `check_affinity` will return 0, and the code below will try to write a zero value to the proc file.
> Maybe the comment there is a problem; `check_affinity` does not verify mask validity.
The comment was "broken" with https://github.com/Irqbalance/irqbalance/commit/cfe3d10d37e44be4bb94cb715dbe6fde9e8a60ff, which removed the `valid_mask` condition from the following `if (...)`.
An IRQ may be migrated several times during one loop iteration and end up with the same CPU affinity mask as before; the "moved" flag is merely a hint that an affinity change may be necessary. This notably happens during initial placement, but may also happen anytime later.
To avoid visiting each IRQ again and again, unset the "moved" flag. This avoids the open/stat/read/close syscalls for unchanged interrupts.
Fixes: #285