Open fredrik-johansson opened 2 months ago
Hmm, not all functions allow aliasing. Whenever a loop is involved, I believe aliasing is not supported (or it may only work up to some specific iteration).
Generally, I think it is a bad idea performance-wise to try to allow aliasing for low-level multiplication functions.
The point is that for small n, the hardcoded functions do support aliasing, so there is no performance hit on the `mpn` level. But functions like `fmpz_mul` and `arf_mul` currently waste time comparing pointers, allocating temp space and copying data since they don't know about this.
Similarly, functions like `fmpz_mul` waste memory handling aliasing for FFT-size operands.
I don't recall if the Arm semi-hardcoded multiplication routines allow for aliasing, but I'm pretty sure there are ranges there where aliasing is not allowed.
The hardcoded assembly versions do so, the FFT versions ought to do so automatically, and at intermediate sizes we could probably afford to stick in some pointer comparison and allocate temporary memory when needed.
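
For illustration, here is a rough sketch of what "pointer comparison plus temporary memory" could look like at intermediate sizes. Everything in it is hypothetical: `limb_t`, `mul_basecase_noalias`, `overlaps` and `mul_alias_safe` are made-up names, the limbs are 32-bit to keep the carry arithmetic portable, and the schoolbook loop merely stands in for a real routine that does not tolerate overlap. It is not how FLINT or GMP do it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limb type for this sketch; real mpn code uses mp_limb_t. */
typedef uint32_t limb_t;

/* Schoolbook product {r, an+bn} = {a, an} * {b, bn}, least significant
   limb first. r must NOT overlap a or b: r is zeroed before the inputs
   are read, so an aliased call would destroy its own operands. */
static void mul_basecase_noalias(limb_t *r, const limb_t *a, size_t an,
                                 const limb_t *b, size_t bn)
{
    memset(r, 0, (an + bn) * sizeof(limb_t));
    for (size_t i = 0; i < an; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < bn; j++) {
            /* Fits in 64 bits: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
            uint64_t t = (uint64_t) a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (limb_t) t;
            carry = t >> 32;
        }
        r[i + bn] = (limb_t) carry;
    }
}

/* Nonzero if the limb ranges {r, rn} and {x, xn} share any memory.
   Addresses are compared as integers to avoid relational comparison of
   pointers into unrelated objects. */
static int overlaps(const limb_t *r, size_t rn, const limb_t *x, size_t xn)
{
    uintptr_t r0 = (uintptr_t) r, r1 = r0 + rn * sizeof(limb_t);
    uintptr_t x0 = (uintptr_t) x, x1 = x0 + xn * sizeof(limb_t);
    return r0 < x1 && x0 < r1;
}

/* Alias-tolerant wrapper: multiply in place when the output is disjoint
   from both inputs, otherwise go through a temporary buffer.
   Returns 0 on success, -1 if the temporary could not be allocated. */
int mul_alias_safe(limb_t *r, const limb_t *a, size_t an,
                   const limb_t *b, size_t bn)
{
    size_t rn = an + bn;
    assert(an >= 1 && bn >= 1);

    if (!overlaps(r, rn, a, an) && !overlaps(r, rn, b, bn)) {
        mul_basecase_noalias(r, a, an, b, bn);   /* fast path, no copy */
        return 0;
    }

    limb_t *tmp = malloc(rn * sizeof(limb_t));
    if (tmp == NULL)
        return -1;
    mul_basecase_noalias(tmp, a, an, b, bn);
    memcpy(r, tmp, rn * sizeof(limb_t));
    free(tmp);
    return 0;
}
```

The extra cost in the non-aliased case is two range checks, which should be negligible next to the multiplication itself once n is no longer tiny. Callers such as `fmpz_mul` could then drop their own pointer comparisons and copies.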