Right now we swap the operands when necessary so that the LHS is always greater than the RHS. Could this swap be skipped by using references in xx_add_impl, as xx_sub_impl already does?
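To make the question concrete, here is a minimal sketch that contrasts the two approaches on a toy type. Everything in it is hypothetical: decimal_parts, add_ordered, add_with_swap and add_with_refs are illustrative names, not the library's actual xx_add_impl or xx_sub_impl. For illustration only, it also assumes that "greater" means "has the larger exponent" and ignores overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical simplified representation: value = significand * 10^exponent.
// Not the library's real layout; overflow is ignored for brevity.
struct decimal_parts {
    std::uint64_t significand;
    int exponent;
};

inline bool operator==(const decimal_parts& a, const decimal_parts& b) {
    return a.significand == b.significand && a.exponent == b.exponent;
}

// 10^n for small non-negative n.
inline std::uint64_t pow10_u64(int n) {
    std::uint64_t result = 1;
    for (int i = 0; i < n; ++i) {
        result *= 10;
    }
    return result;
}

// Core addition. Precondition: big.exponent >= small.exponent.
inline decimal_parts add_ordered(const decimal_parts& big, const decimal_parts& small) {
    assert(big.exponent >= small.exponent);
    const int shift = big.exponent - small.exponent;
    // Rescale big onto small's exponent, then add the significands.
    return {big.significand * pow10_u64(shift) + small.significand, small.exponent};
}

// Current approach (sketch): take operands by value and swap them when out of order.
inline decimal_parts add_with_swap(decimal_parts lhs, decimal_parts rhs) {
    if (lhs.exponent < rhs.exponent) {
        std::swap(lhs, rhs);
    }
    return add_ordered(lhs, rhs);
}

// Proposed approach (sketch): bind references in the required order.
// The operands themselves are never copied or exchanged.
inline decimal_parts add_with_refs(const decimal_parts& a, const decimal_parts& b) {
    const bool a_is_big = a.exponent >= b.exponent;
    const decimal_parts& big = a_is_big ? a : b;
    const decimal_parts& small = a_is_big ? b : a;
    return add_ordered(big, small);
}
```

Both functions give the same result for either argument order. For example, adding {12, 1} (120) and {3, 0} (3) yields {123, 0} either way. The only difference is whether the operands are copied and swapped before the core addition, or merely referenced in the right order.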
With this change, decimal32 shows a speedup of a couple of percent. I expect the gain to grow as the swap operation becomes more expensive. The new implementation also has two fewer branches.