Closed: fredrik-johansson closed this 4 weeks ago
sqrhigh_normalised
Performance for vectorized squaring is currently as follows:
$ build/nfloat/profile/p-vs_arf
          _gr_vec_add           _gr_vec_mul           _gr_vec_sqr
prec = 64
n = 10    3.030e-08 (4.290x)    1.750e-08 (7.086x)    1.130e-08 (10.885x)
n = 100   2.680e-07 (4.627x)    1.410e-07 (8.723x)    8.320e-08 (15.024x)
prec = 128
n = 10    4.120e-08 (4.806x)    2.530e-08 (5.375x)    1.920e-08 (7.031x)
n = 100   3.860e-07 (5.207x)    2.310e-07 (5.931x)    1.620e-07 (8.457x)
prec = 192
n = 10    4.970e-08 (4.950x)    5.380e-08 (2.937x)    4.420e-08 (3.371x)
n = 100   4.790e-07 (5.052x)    5.190e-07 (3.083x)    4.210e-07 (3.610x)
prec = 256
n = 10    6.170e-08 (3.987x)    7.030e-08 (2.518x)    5.490e-08 (2.987x)
n = 100   5.630e-07 (4.405x)    6.900e-07 (2.623x)    5.330e-07 (3.189x)
prec = 512
n = 10    1.260e-07 (2.087x)    1.580e-07 (1.981x)    1.000e-07 (3.050x)
n = 100   1.220e-06 (2.164x)    1.570e-06 (2.051x)    9.840e-07 (3.272x)
prec = 1024
n = 10    1.710e-07 (1.819x)    7.230e-07 (1.285x)    4.490e-07 (1.584x)
n = 100   1.660e-06 (1.855x)    7.460e-06 (1.323x)    4.910e-06 (1.670x)
prec = 2048
n = 10    2.260e-07 (1.739x)    2.320e-06 (1.263x)    1.340e-06 (1.619x)
n = 100   2.320e-06 (1.681x)    2.370e-05 (1.342x)    1.350e-05 (1.800x)
prec = 4096
n = 10    3.660e-07 (1.533x)    7.650e-06 (1.199x)    4.480e-06 (1.478x)
n = 100   3.900e-06 (1.503x)    7.960e-05 (1.256x)    4.500e-05 (1.633x)

$ build/nfloat/profile/p-vs_acf
          _gr_vec_add           _gr_vec_mul           _gr_vec_sqr
prec = 64
n = 10    5.900e-08 (4.085x)    1.380e-07 (3.580x)    9.040e-08 (4.347x)
n = 100   5.540e-07 (4.458x)    1.390e-06 (3.561x)    9.010e-07 (4.306x)
prec = 128
n = 10    8.240e-08 (5.024x)    2.100e-07 (3.324x)    1.430e-07 (3.294x)
n = 100   7.770e-07 (5.225x)    2.010e-06 (3.463x)    1.420e-06 (3.232x)
prec = 192
n = 10    1.030e-07 (4.893x)    3.720e-07 (2.129x)    2.180e-07 (2.486x)
n = 100   9.750e-07 (5.026x)    3.680e-06 (2.149x)    2.170e-06 (2.502x)
prec = 256
n = 10    1.220e-07 (4.082x)    4.390e-07 (2.009x)    2.600e-07 (2.338x)
n = 100   1.160e-06 (4.310x)    4.330e-06 (2.007x)    2.570e-06 (2.354x)
prec = 512
n = 10    2.650e-07 (2.038x)    8.490e-07 (1.802x)    5.120e-07 (2.129x)
n = 100   2.660e-06 (2.056x)    8.530e-06 (1.758x)    5.100e-06 (2.078x)
prec = 1024
n = 10    3.450e-07 (1.835x)    2.850e-06 (1.544x)    1.760e-06 (1.580x)
n = 100   3.460e-06 (1.855x)    2.920e-05 (1.462x)    1.810e-05 (1.530x)
prec = 2048
n = 10    4.570e-07 (1.670x)    8.150e-06 (1.914x)    4.660e-06 (1.820x)
n = 100   4.720e-06 (1.676x)    8.220e-05 (1.788x)    4.670e-05 (1.754x)
prec = 4096
n = 10    7.550e-07 (1.589x)    2.690e-05 (1.457x)    1.500e-05 (1.707x)
n = 100   8.410e-06 (1.463x)    2.730e-04 (1.407x)    1.500e-04 (1.660x)
Nice performance!
OT, but the CI appears to be working now.
Indeed.