This PR adds explicit broadcasting of the alpha and beta values where necessary, so that the first two statically defined vector registers can be used to store matrix elements whenever possible. This affects the generators for the hsw, arm, and arm_sve architectures.
For the knl generator, we can use implicit broadcasting in the mul and fmla instructions while skipping the broadcast instruction completely.
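To illustrate the difference, here is a minimal sketch that only builds instruction strings. It assumes that "implicit broadcasting" refers to the AVX-512 embedded-broadcast memory operand (`{1to8}` for eight doubles), shown in Intel syntax. The helper names and the register choices are hypothetical and are not taken from the generator code.

```python
# Hypothetical helpers for illustration only; the real generators build
# instructions through their own operand and instruction classes.

def explicit_broadcast_mul(scalar_addr, src, dst, tmp):
    """Two instructions: broadcast the scalar into tmp, then multiply.

    This variant occupies an extra vector register (tmp).
    """
    return [
        f"vbroadcastsd {tmp}, qword ptr [{scalar_addr}]",
        f"vmulpd {dst}, {src}, {tmp}",
    ]


def embedded_broadcast_mul(scalar_addr, src, dst, lanes=8):
    """One instruction: the scalar is broadcast from memory by the multiply itself.

    No temporary vector register is needed (assumed AVX-512 embedded broadcast).
    """
    return [f"vmulpd {dst}, {src}, qword ptr [{scalar_addr}]{{1to{lanes}}}"]


for line in explicit_broadcast_mul("rax", "zmm1", "zmm0", "zmm2"):
    print(line)
for line in embedded_broadcast_mul("rax", "zmm1", "zmm0"):
    print(line)
```

The embedded form is why no vector register needs to be reserved for alpha or beta in this case.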
To keep the naming convention of the general-purpose registers consistent, I added a partial mapping from r(n) to the rax, rbx, etc. registers in hsw/operands.py and knl/operands.py. For now I only defined r(3) = rbx and r(4) = rcx as a quick mapping, because mapping e.g. rdi to another input value of r(n) should not be allowed: rdi is used to store a memory address in the hsw and knl generators.
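The partial mapping can be sketched roughly as follows. This is a simplified illustration, not the code in hsw/operands.py or knl/operands.py: it returns plain register names as strings, and the table name `_GENERAL_REGISTER_NAMES` and the error handling are assumptions. Only the two entries r(3) = rbx and r(4) = rcx come from this PR.

```python
# Hypothetical sketch of a partial r(n) -> register-name mapping.
# Only indices 3 and 4 are defined, as described in this PR.
_GENERAL_REGISTER_NAMES = {
    3: "rbx",
    4: "rcx",
}


def r(n):
    """Return the x86-64 register name for general register index n.

    Unmapped indices are rejected, so a register that already has a fixed
    role (such as rdi holding a memory address) cannot be reached by accident.
    """
    try:
        return _GENERAL_REGISTER_NAMES[n]
    except KeyError:
        defined = sorted(_GENERAL_REGISTER_NAMES)
        raise ValueError(f"r({n}) is not mapped; defined indices: {defined}") from None


print(r(3))
print(r(4))
```

Raising on unmapped indices keeps the mapping deliberately small until the remaining registers are assigned on purpose.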