flintlib / flint

FLINT (Fast Library for Number Theory)
http://www.flintlib.org
GNU Lesser General Public License v3.0

Complex nfloats #2004

Closed. fredrik-johansson closed this 1 month ago.

fredrik-johansson commented 1 month ago

Initial code for nfloat_complex.

Also adds nfloat_sqr (todo: use the new sqrhigh_normalised) and doubles the allowed nfloat precision to 4224 bits.

fredrik-johansson commented 1 month ago

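Profile vs acf (speedup over acf in parentheses). Note that there is no dot product code yet; this shows in the _gr_vec_dot column, which is below 1x at 64 and 128 bits.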

$ build/nfloat/profile/p-vs_acf 
                   _gr_vec_add          _gr_vec_mul       _gr_vec_mul_scalar  _gr_vec_addmul_scalar  _gr_vec_sum          _gr_vec_product      _gr_vec_dot
prec = 64
n =   10        6.420e-08 (3.941x)   1.470e-07 (3.544x)   8.000e-08 (3.888x)   1.770e-07 (3.475x)   9.570e-08 (2.497x)   1.330e-07 (3.541x)   2.270e-07 (0.877x)
n =  100        5.930e-07 (4.452x)   1.490e-06 (3.550x)   7.820e-07 (4.066x)   1.810e-06 (3.508x)   1.060e-06 (2.509x)   1.520e-06 (3.474x)   2.280e-06 (0.658x)
prec = 128
n =   10        8.910e-08 (4.804x)   2.220e-07 (3.230x)   1.010e-07 (3.396x)   2.140e-07 (3.631x)   1.070e-07 (3.654x)   1.900e-07 (3.505x)   3.150e-07 (0.968x)
n =  100        8.190e-07 (5.165x)   2.150e-06 (3.344x)   1.044e-06 (3.285x)   2.220e-06 (3.743x)   1.190e-06 (3.605x)   2.190e-06 (3.333x)   3.120e-06 (0.744x)
prec = 192
n =   10        1.090e-07 (4.651x)   4.290e-07 (1.886x)   1.350e-07 (2.889x)   2.550e-07 (3.580x)   1.260e-07 (3.587x)   3.970e-07 (1.824x)   5.740e-07 (1.263x)
n =  100        1.030e-06 (4.864x)   4.170e-06 (1.916x)   1.320e-06 (2.955x)   2.650e-06 (3.668x)   1.310e-06 (3.878x)   4.300e-06 (1.888x)   5.540e-06 (1.339x)
prec = 256
n =   10        1.280e-07 (3.906x)   4.880e-07 (1.887x)   1.730e-07 (2.468x)   3.070e-07 (3.140x)   1.430e-07 (3.217x)   4.740e-07 (1.705x)   6.720e-07 (1.231x)
n =  100        1.210e-06 (4.240x)   5.080e-06 (1.734x)   1.720e-06 (2.459x)   3.490e-06 (2.980x)   1.460e-06 (3.568x)   5.080e-06 (1.787x)   6.730e-06 (1.230x)
prec = 512
n =   10        2.800e-07 (1.921x)   8.940e-07 (1.812x)   3.420e-07 (2.164x)   5.690e-07 (2.302x)   2.420e-07 (2.017x)   8.100e-07 (1.802x)   1.140e-06 (1.342x)
n =  100        2.800e-06 (1.975x)   8.970e-06 (1.750x)   3.520e-06 (2.088x)   6.070e-06 (2.257x)   2.310e-06 (2.463x)   8.900e-06 (1.787x)   1.150e-05 (1.313x)
prec = 1024
n =   10        3.610e-07 (1.762x)   3.010e-06 (1.538x)   1.530e-06 (1.425x)   1.840e-06 (1.565x)   3.050e-07 (1.885x)   2.680e-06 (1.545x)   3.280e-06 (1.390x)
n =  100        3.630e-06 (1.815x)   3.090e-05 (1.460x)   1.560e-05 (1.429x)   1.860e-05 (1.575x)   3.220e-06 (2.084x)   3.050e-05 (1.485x)   3.410e-05 (1.328x)
prec = 2048
n =   10        4.950e-07 (1.608x)   8.320e-06 (1.935x)   4.780e-06 (1.529x)   5.230e-06 (1.518x)   4.160e-07 (1.721x)   7.470e-06 (1.941x)   8.750e-06 (1.451x)
n =  100        4.980e-06 (1.681x)   8.400e-05 (1.833x)   4.780e-05 (1.448x)   5.220e-05 (1.502x)   4.410e-06 (1.878x)   8.290e-05 (1.918x)   8.830e-05 (1.325x)
prec = 4096
n =   10        7.930e-07 (1.589x)   2.840e-05 (1.430x)   1.640e-05 (1.329x)   1.720e-05 (1.343x)   6.720e-07 (1.562x)   2.540e-05 (1.445x)   2.910e-05 (1.395x)
n =  100        8.700e-06 (1.517x)   2.860e-04 (1.381x)   1.670e-04 (1.281x)   1.740e-04 (1.305x)   7.310e-06 (1.683x)   2.820e-04 (1.418x)   2.940e-04 (1.286x)

Multiplications are quite fast at high precision thanks to combining Karatsuba and mulhigh!

albinahlback commented 1 month ago

Why is the speedup for _gr_vec_mul greater at 2048 bits of precision than at 1024 bits?

fredrik-johansson commented 1 month ago

Not sure, but there's no reason to expect the ratio to vary monotonically with the precision. For example, nfloat might have more of an overhead advantage at lower precision and more of an algorithmic advantage at higher precision, with the smallest improvement where the two effects meet.