In some situations, such as an int8 dot product, we want to accumulate into a higher-bitwidth accumulator. How do we support this in a sane and logical way? Currently the system is very simple: `T in == T out`. But if we want to start doing, say, accumulation into `u32`, this becomes considerably harder...
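One possible direction is to keep `T in == T out` as the default and let each input type name the wider type it accumulates into. This is a minimal sketch under that assumption. The `Widen` trait, its `Acc` associated type and the `dot` function are hypothetical names for illustration only, not an existing API of this project.

```rust
use std::ops::{Add, Mul};

/// Hypothetical trait: an input element type declares its wider accumulator type.
trait Widen: Copy {
    type Acc: Copy + Default + Add<Output = Self::Acc> + Mul<Output = Self::Acc>;
    /// Losslessly convert one input element into the accumulator type.
    fn widen(self) -> Self::Acc;
}

impl Widen for u8 {
    type Acc = u32;
    fn widen(self) -> u32 {
        self as u32
    }
}

impl Widen for i8 {
    type Acc = i32;
    fn widen(self) -> i32 {
        self as i32
    }
}

/// Dot product where each operand is widened before multiplying,
/// so products and the running sum live in `T::Acc`, not `T`.
fn dot<T: Widen>(a: &[T], b: &[T]) -> T::Acc {
    assert_eq!(a.len(), b.len(), "inputs must have the same length");
    let zero: T::Acc = Default::default();
    a.iter()
        .zip(b)
        .fold(zero, |acc, (&x, &y)| acc + x.widen() * y.widen())
}

fn main() {
    // 400 + 400 + 65025 + 1 = 65826, far beyond u8::MAX (255).
    let a: [u8; 4] = [200, 200, 255, 1];
    let b: [u8; 4] = [2, 2, 255, 1];
    println!("u8 -> u32: {}", dot(&a, &b));

    // Signed case: 16384 + 16129 = 32513, beyond i8::MAX (127).
    let c: [i8; 2] = [-128, 127];
    println!("i8 -> i32: {}", dot(&c, &c));
}
```

Because the accumulator is fixed per input type, existing same-type paths are untouched. The trade-off is that a caller wanting a different accumulator for the same input (e.g. `i8` into `i16`) would need a second trait or an explicit accumulator type parameter. Note that a `u32` accumulator can itself overflow for very long `u8` inputs.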