mcroomp closed 10 months ago
Also, for 256-bit reductions, first reduce the top and bottom halves in parallel, which generates better code. Use wrapping_add instead of sum to avoid overflow panics in debug builds, since wrapping operations are assumed everywhere in this library.
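A minimal sketch of the idea, not the code from this change: it models a 256-bit vector as a plain array of eight `u32` lanes rather than using SIMD intrinsics, and the names `reduce_half` and `reduce_add_256` are hypothetical. It takes one reading of "in parallel": each 128-bit half is reduced through its own independent chain of adds, and the two partial results are combined at the end.

```rust
/// Hypothetical helper: wrapping sum of one 128-bit half (four u32 lanes).
/// The two inner additions do not depend on each other.
fn reduce_half(h: [u32; 4]) -> u32 {
    let a = h[0].wrapping_add(h[1]);
    let b = h[2].wrapping_add(h[3]);
    a.wrapping_add(b)
}

/// Hypothetical helper: wrapping sum of eight u32 lanes (256 bits).
/// The bottom and top halves are reduced independently, then combined.
fn reduce_add_256(v: [u32; 8]) -> u32 {
    let lo = reduce_half([v[0], v[1], v[2], v[3]]);
    let hi = reduce_half([v[4], v[5], v[6], v[7]]);
    // wrapping_add never panics on overflow, unlike `+` or `Iterator::sum`
    // when overflow checks are enabled (the default in debug builds).
    lo.wrapping_add(hi)
}
```

For example, `reduce_add_256([u32::MAX, 1, 2, 3, 4, 5, 6, 7])` returns 27 in both debug and release builds, whereas `iter().sum::<u32>()` over the same lanes panics when overflow checks are enabled.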