Open Janmajayamall opened 1 year ago
Benchmarks of `elwise_fma_mod_2d` (a function that mimics `fma_poly_scale_slice_hexl`) in hexl-rs display the same behaviour. https://github.com/Janmajayamall/hexl-rs/issues/1 tracks this.
Note that this issue renders benchmarks of `optimised_range_fn_fma_hexl` useless, since all `optimised_range_fn_fma_hexl` does is call `fma_poly_scale_slice_hexl` 127*2 times and `mul_poly_scalar_slice_hexl` 2 times.
The running time of `fma_poly_scale_slice_hexl` does not increase linearly with mod_size (i.e. the number of moduli in the poly). Instead, as mod_size increases, the time blows up.
Poly stores its coefficients in row-major form, and all `fma_poly_scale_slice_hexl` does is call `hexl_rs::elwise_fma_mod` for each row in poly. This means `fma_poly_scale_slice_hexl` calls `elwise_fma_mod` mod_size times (row count equals mod_size, since there is a single row for each modulus). Hence, the time taken when mod_size = 15 should be 15x the time taken when mod_size = 1. But this isn't the case.
For example, on an r6i.8xlarge instance, the following are benchmarks for `fma_poly_scale_slice_hexl`:
Time taken clearly does not increase linearly with mod_size.
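To help separate the cost of the per-row loop from the cost of `hexl_rs::elwise_fma_mod` itself, here is a minimal pure-Rust sketch of the same structure that can be timed for different mod_size values. It is not the code from this repo: `fma_row`, `fma_poly_rows` and `time_fma` are hypothetical names, the operation `acc = (acc + a * scalar) mod q` with one scalar per row is an assumption about what the FMA computes, and the modulus value is arbitrary. If this stand-in scales roughly linearly on the same machine while the hexl version does not, that would point at the hexl call (or at memory effects) rather than the loop.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for one `elwise_fma_mod` call on a single row.
/// Assumed semantics: acc[i] = (acc[i] + a[i] * scalar) mod q.
fn fma_row(acc: &mut [u64], a: &[u64], scalar: u64, q: u64) {
    for (r, &x) in acc.iter_mut().zip(a) {
        // Widen to u128 so the product plus the accumulator cannot overflow.
        let prod = (x as u128) * (scalar as u128);
        *r = ((*r as u128 + prod) % (q as u128)) as u64;
    }
}

/// Row-major poly: `moduli.len()` rows of `degree` coefficients each,
/// one row per modulus. Calls `fma_row` once per row.
fn fma_poly_rows(acc: &mut [u64], a: &[u64], scalars: &[u64], moduli: &[u64], degree: usize) {
    for (((acc_row, a_row), &s), &q) in acc
        .chunks_exact_mut(degree)
        .zip(a.chunks_exact(degree))
        .zip(scalars)
        .zip(moduli)
    {
        fma_row(acc_row, a_row, s, q);
    }
}

/// Average wall-clock time of one `fma_poly_rows` call over `iters` runs.
/// `iters` must be non-zero.
fn time_fma(mod_size: usize, degree: usize, iters: u32) -> Duration {
    // Arbitrary 50-bit modulus value, used only for illustration.
    let moduli = vec![(1u64 << 50) - 27; mod_size];
    let scalars = vec![3u64; mod_size];
    let a = vec![5u64; mod_size * degree];
    let mut acc = vec![7u64; mod_size * degree];

    let start = Instant::now();
    for _ in 0..iters {
        fma_poly_rows(&mut acc, &a, &scalars, &moduli, degree);
    }
    // Keep the result observable so the loop is not optimised away.
    std::hint::black_box(&acc);
    start.elapsed() / iters
}
```

Calling `time_fma(m, degree, iters)` for `m` in `1..=15` with a fixed `degree` and comparing each result against `m` times the `m = 1` result gives a rough linearity check. A proper benchmark harness would give more reliable numbers than this `Instant`-based timing.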