Closed fredrik-johansson closed 1 month ago
Profile vs acf. Note that there is no dot product code yet.
$ build/nfloat/profile/p-vs_acf
_gr_vec_add _gr_vec_mul _gr_vec_mul_scalar _gr_vec_addmul_scalar _gr_vec_sum _gr_vec_product _gr_vec_dot
prec = 64
n = 10 6.420e-08 (3.941x) 1.470e-07 (3.544x) 8.000e-08 (3.888x) 1.770e-07 (3.475x) 9.570e-08 (2.497x) 1.330e-07 (3.541x) 2.270e-07 (0.877x)
n = 100 5.930e-07 (4.452x) 1.490e-06 (3.550x) 7.820e-07 (4.066x) 1.810e-06 (3.508x) 1.060e-06 (2.509x) 1.520e-06 (3.474x) 2.280e-06 (0.658x)
prec = 128
n = 10 8.910e-08 (4.804x) 2.220e-07 (3.230x) 1.010e-07 (3.396x) 2.140e-07 (3.631x) 1.070e-07 (3.654x) 1.900e-07 (3.505x) 3.150e-07 (0.968x)
n = 100 8.190e-07 (5.165x) 2.150e-06 (3.344x) 1.044e-06 (3.285x) 2.220e-06 (3.743x) 1.190e-06 (3.605x) 2.190e-06 (3.333x) 3.120e-06 (0.744x)
prec = 192
n = 10 1.090e-07 (4.651x) 4.290e-07 (1.886x) 1.350e-07 (2.889x) 2.550e-07 (3.580x) 1.260e-07 (3.587x) 3.970e-07 (1.824x) 5.740e-07 (1.263x)
n = 100 1.030e-06 (4.864x) 4.170e-06 (1.916x) 1.320e-06 (2.955x) 2.650e-06 (3.668x) 1.310e-06 (3.878x) 4.300e-06 (1.888x) 5.540e-06 (1.339x)
prec = 256
n = 10 1.280e-07 (3.906x) 4.880e-07 (1.887x) 1.730e-07 (2.468x) 3.070e-07 (3.140x) 1.430e-07 (3.217x) 4.740e-07 (1.705x) 6.720e-07 (1.231x)
n = 100 1.210e-06 (4.240x) 5.080e-06 (1.734x) 1.720e-06 (2.459x) 3.490e-06 (2.980x) 1.460e-06 (3.568x) 5.080e-06 (1.787x) 6.730e-06 (1.230x)
prec = 512
n = 10 2.800e-07 (1.921x) 8.940e-07 (1.812x) 3.420e-07 (2.164x) 5.690e-07 (2.302x) 2.420e-07 (2.017x) 8.100e-07 (1.802x) 1.140e-06 (1.342x)
n = 100 2.800e-06 (1.975x) 8.970e-06 (1.750x) 3.520e-06 (2.088x) 6.070e-06 (2.257x) 2.310e-06 (2.463x) 8.900e-06 (1.787x) 1.150e-05 (1.313x)
prec = 1024
n = 10 3.610e-07 (1.762x) 3.010e-06 (1.538x) 1.530e-06 (1.425x) 1.840e-06 (1.565x) 3.050e-07 (1.885x) 2.680e-06 (1.545x) 3.280e-06 (1.390x)
n = 100 3.630e-06 (1.815x) 3.090e-05 (1.460x) 1.560e-05 (1.429x) 1.860e-05 (1.575x) 3.220e-06 (2.084x) 3.050e-05 (1.485x) 3.410e-05 (1.328x)
prec = 2048
n = 10 4.950e-07 (1.608x) 8.320e-06 (1.935x) 4.780e-06 (1.529x) 5.230e-06 (1.518x) 4.160e-07 (1.721x) 7.470e-06 (1.941x) 8.750e-06 (1.451x)
n = 100 4.980e-06 (1.681x) 8.400e-05 (1.833x) 4.780e-05 (1.448x) 5.220e-05 (1.502x) 4.410e-06 (1.878x) 8.290e-05 (1.918x) 8.830e-05 (1.325x)
prec = 4096
n = 10 7.930e-07 (1.589x) 2.840e-05 (1.430x) 1.640e-05 (1.329x) 1.720e-05 (1.343x) 6.720e-07 (1.562x) 2.540e-05 (1.445x) 2.910e-05 (1.395x)
n = 100 8.700e-06 (1.517x) 2.860e-04 (1.381x) 1.670e-04 (1.281x) 1.740e-04 (1.305x) 7.310e-06 (1.683x) 2.820e-04 (1.418x) 2.940e-04 (1.286x)
Multiplications are quite fast at high precision thanks to combining Karatsuba and mulhigh!
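For readers unfamiliar with the two techniques named above, here is a toy sketch at a single 32-bit word, split into 16-bit halves. It is not the nfloat code and does not show how the two ideas are combined there. The function names `karatsuba_mul32` and `mulhigh32_approx` are made up for this illustration. The first function forms a full product with three multiplications instead of four. The second computes only the high half of the product by skipping the low partial product, at the cost of a result that may be one unit too small.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: full 32x32 -> 64-bit product
   using one level of Karatsuba on 16-bit halves (3 multiplications). */
uint64_t karatsuba_mul32(uint32_t a, uint32_t b)
{
    uint64_t a0 = a & 0xFFFFu, a1 = a >> 16;
    uint64_t b0 = b & 0xFFFFu, b1 = b >> 16;

    uint64_t z0 = a0 * b0;                          /* low  x low  */
    uint64_t z2 = a1 * b1;                          /* high x high */
    /* The cross terms a0*b1 + a1*b0, obtained from one extra product. */
    uint64_t z1 = (a0 + a1) * (b0 + b1) - z0 - z2;

    return (z2 << 32) + (z1 << 16) + z0;
}

/* Hypothetical helper, for illustration only: approximate high 32 bits of
   the 64-bit product. The low partial product a0*b0 is never computed and
   the low 16 bits of the cross terms are discarded, so the result is
   either exact or exactly one less than the true high word. */
uint32_t mulhigh32_approx(uint32_t a, uint32_t b)
{
    uint64_t a0 = a & 0xFFFFu, a1 = a >> 16;
    uint64_t b0 = b & 0xFFFFu, b1 = b >> 16;

    uint64_t z2 = a1 * b1;
    uint64_t z1 = a0 * b1 + a1 * b0;

    return (uint32_t) (z2 + (z1 >> 16));
}
```

In a floating-point setting, an error of a unit or so in the last place of the high part is usually acceptable, which is why a truncated product is attractive there. The savings from both ideas grow with the number of limbs.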
Why is the speedup greater for _gr_vec_mul with 2048 bits of precision than 1024 bits of precision?
Not sure, but there's no reason to expect the ratio to vary monotonically with the precision. For example, there might be more of an overhead advantage at lower precision and more of an algorithmic advantage at higher precision, with a minimum improvement where the two meet.
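To make that argument concrete, here is a toy cost model. Every constant in it is invented for illustration. It is not fitted to the timings above and does not describe how acf or nfloat actually work. The baseline pays a large fixed overhead plus schoolbook-like work, while the new code pays a small fixed overhead plus Karatsuba-like work with a larger constant factor. The function name `toy_speedup` is hypothetical.

```c
/* Toy model, for illustration only: modeled speedup at 2^k limbs.
   All constants below are made up and are not measurements. */
double toy_speedup(int k)
{
    double schoolbook = 1.0;  /* grows like 4^k: quadratic in the limb count */
    double karatsuba = 1.0;   /* grows like 3^k: Karatsuba recursion */
    int i;

    for (i = 0; i < k; i++)
    {
        schoolbook *= 4.0;
        karatsuba *= 3.0;
    }

    /* Baseline: large fixed overhead (40), unit cost per schoolbook step. */
    double t_baseline = 40.0 + schoolbook;
    /* New code: small fixed overhead (5), but 3x cost per Karatsuba step. */
    double t_new = 5.0 + 3.0 * karatsuba;

    return t_baseline / t_new;
}
```

With these made-up constants the modeled speedup starts above 5x at one limb, falls to about 1.2x around 16 limbs, and climbs again beyond that. The ratio has an interior minimum even though both cost curves are monotonic.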
Initial code for nfloat_complex. Also adds nfloat_sqr (todo: use the new sqrhigh_normalised) and doubles the allowed nfloat precision to 4224 bits.