Replaces the rescaling algorithm for Complex division with one inspired by Doug Priest's "Efficient Scaling for Complex Division," with some further tweaks to:
- allow it to work for arbitrary FloatingPoint types, including Float16;
- get exactly the same rounding behavior as the un-rescaled path, so that z/w = tz/tw when tz and tw are computed exactly;
- allow future optimizations to hoist a rescaled reciprocal for more speedups.
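The rounding goal in the list above can be illustrated with a simplified sketch. This is not the code in this PR: `C`, `naiveDivide`, and `rescaledDivide` are hypothetical names, the type is a stand-in for the library's Complex, and the sketch scales both operands by one power of the radix without guarding against overflow or underflow of the scaled numerator, which a real implementation would have to handle.

```swift
// Hypothetical minimal complex type for illustration only.
struct C<T: FloatingPoint>: Equatable {
    var re: T
    var im: T
}

// Textbook division with no rescaling: z * conj(w) / |w|^2.
// |w|^2 overflows or underflows when w is very large or very small.
func naiveDivide<T: FloatingPoint>(_ z: C<T>, _ w: C<T>) -> C<T> {
    let lenSq = w.re * w.re + w.im * w.im
    return C(re: (z.re * w.re + z.im * w.im) / lenSq,
             im: (z.im * w.re - z.re * w.im) / lenSq)
}

// Simplified rescaled division (an assumption-laden sketch, not Priest's
// algorithm): multiply z and w by a power of the radix chosen so that the
// larger component of w has exponent zero. Scaling by a power of the radix
// is exact unless it overflows or underflows, so when nothing does, every
// intermediate is an exact multiple of the unscaled one and the final
// roundings are identical.
func rescaledDivide<T: FloatingPoint>(_ z: C<T>, _ w: C<T>) -> C<T> {
    let m = max(w.re.magnitude, w.im.magnitude)
    // Zero, subnormal, infinite, and NaN divisors are outside the scope
    // of this sketch; fall back to the unscaled formula for them.
    guard m.isNormal else { return naiveDivide(z, w) }
    let s = T(sign: .plus, exponent: -m.exponent, significand: 1)
    return naiveDivide(C(re: z.re * s, im: z.im * s),
                       C(re: w.re * s, im: w.im * s))
}
```

With `Double`, dividing `(1e300, 1e300)` by itself through `naiveDivide` produces NaN components because `|w|^2` overflows, while `rescaledDivide` returns `(1, 0)`. For well-scaled inputs such as `(1, 2) / (3, 4)`, both functions return the same value, and multiplying both operands by `2^100` before calling `rescaledDivide` does not change the result.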
Unlike Priest, we do not try to avoid spurious overflow in the final computation when the result is very near the overflow boundary but cancellation brings us just inside it. We do not believe that this is a good tradeoff, as complex multiplication overflows in exactly the same way. We will investigate providing an opt-in API to avoid this overflow case in a future PR.