Closed qiji2023 closed 7 months ago
Which mad instruction (GCN ISA / fma)? What are you trying to run? Can you provide a sample? What behavior were you expecting?
@cjatin
On RDNA: v_mad_u64_u32. Why doesn't it have a carry-in?
For example:
a1, c1 (carry out) = b1 + d1 + c0 (carry in)
a2, c2 (carry out) = b2 + d2 + c1 (carry in)
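The carry chain in the example above can be sketched in plain C++ as follows. This is a minimal host-side illustration, not GPU code: the helper name add_limbs and the choice of 32-bit little-endian limbs are assumptions made for the sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: a = b + d over n 32-bit limbs (least significant first).
// Each step consumes the previous carry-out as its carry-in, as in
//   a_i, c_i = b_i + d_i + c_(i-1)
// Returns the final carry-out.
inline uint32_t add_limbs(uint32_t* a, const uint32_t* b, const uint32_t* d,
                          std::size_t n) {
    uint32_t carry = 0;  // c0: no carry into the lowest limb
    for (std::size_t i = 0; i < n; ++i) {
        // Widen to 64 bits so the sum of two limbs plus a carry cannot overflow.
        uint64_t sum = uint64_t(b[i]) + d[i] + carry;
        a[i] = uint32_t(sum);          // low 32 bits are the result limb
        carry = uint32_t(sum >> 32);   // bit 32 is the carry-out (0 or 1)
    }
    return carry;
}
```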
not a hip issue
This issue is better raised against the AMDGPU backend on the llvm-project GitHub, https://github.com/llvm/llvm-project/issues, in the hope that the hardware limitation gets reported through the right channel.
Based on the ISA manual, the instruction does not have a carry-in. This may be a design decision balancing cost against performance, since supporting a carry-in on mad would increase its footprint on the chip. You can still use separate add/multiply instructions when you need a carry-in.
How much performance gain could be achieved if the mad instruction had a carry-in?
For big integer multiplication.
If I use the current instruction, I need mad -> add -> mad -> ..., which is 6 clk for a single operation.
But if the mad instruction had a carry-in, I would only need mad -> mad -> ..., which is only 4 clk for a single operation.
I could gain at least a 50% performance improvement.
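To make the mad -> add -> mad pattern concrete, here is a rough host-side C++ sketch of one row of a big integer multiply. The function mad_u64_u32 below is only a software model of a 32x32-bit multiply added to a 64-bit value; it is not the hardware instruction and does not model its carry-out. The helper name mul_add_row and the limb layout are assumptions for illustration, and the sketch says nothing about cycle counts.

```cpp
#include <cstddef>
#include <cstdint>

// Software model (assumption): 32x32-bit multiply plus a 64-bit addend,
// with no carry-in operand.
inline uint64_t mad_u64_u32(uint32_t a, uint32_t b, uint64_t c) {
    return uint64_t(a) * b + c;
}

// Hypothetical helper: acc[0..n) += x[0..n) * y, least significant limb first.
// Returns the carry limb that would go into acc[n].
inline uint32_t mul_add_row(uint32_t* acc, const uint32_t* x, std::size_t n,
                            uint32_t y) {
    uint32_t carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // The extra add: fold the incoming carry into the addend, because
        // the mad itself has no carry-in.
        uint64_t addend = uint64_t(acc[i]) + carry;
        // The mad: x[i] * y + addend. The maximum value is exactly 2^64 - 1,
        // so the 64-bit result cannot overflow.
        uint64_t t = mad_u64_u32(x[i], y, addend);
        acc[i] = uint32_t(t);         // low limb stays in place
        carry = uint32_t(t >> 32);    // high limb carries into the next step
    }
    return carry;
}
```

With a carry-in operand on the mad, the separate addition of carry inside the loop would be the step that disappears.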
This is very ridiculous!