Open albinahlback opened 9 months ago
Currently on my arm_assembly branch:
mpn_mul vs flint_mpn_mul
m = 1: 4.67
m = 2: 4.68 3.61
m = 3: 4.01 3.30 3.04
m = 4: 2.89 2.39 2.27 2.18
m = 5: 3.03 2.21 1.95 2.02 2.04
m = 6: 2.64 1.97 1.82 1.89 2.18 2.05
m = 7: 2.32 1.79 1.99 1.68 1.76 1.79 1.83
m = 8: 2.13 1.69 1.61 1.59 1.70 1.74 1.81 1.79
m = 9: 1.96 1.63 1.57 1.53 1.63 1.64 1.64 1.71 1.77
m = 10: 1.81 1.49 1.48 1.47 1.51 1.63 1.60 1.69 1.73 1.75
m = 11: 1.75 1.50 1.45 1.46 1.45 1.48 1.51 1.53 1.57 1.56 1.58
m = 12: 1.63 1.37 1.39 1.47 1.51 1.57 1.69 1.78 1.67 1.58 1.58 1.61
Tested on cfarm103 (Apple M1)
I'm sure everyone has noticed that Apple's M-series chips are roughly as fast as state-of-the-art x86 processors (see GMP's benchmark results). I therefore think we should implement assembly routines for them as well.
These are the current routines that should be implemented:
mpn_mul_basecase
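
For reference, here is a rough portable C sketch of what a basecase multiplication computes: the schoolbook product of an `an`-limb operand by a `bn`-limb operand. It is only meant as something to check an assembly routine against, and it is not GMP's or FLINT's actual code. The name `ref_mul_basecase`, the `limb_t` typedef and the 64-bit limb size are assumptions made for illustration, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-bit limbs, as on AArch64. */
typedef uint64_t limb_t;

/* Hypothetical reference routine (not part of GMP or FLINT):
   rp[0 .. an+bn-1] = ap[0 .. an-1] * bp[0 .. bn-1].
   Requires an >= 1 and bn >= 1; rp must not overlap ap or bp. */
static void ref_mul_basecase(limb_t *rp, const limb_t *ap, size_t an,
                             const limb_t *bp, size_t bn)
{
    size_t i, j;
    limb_t carry;

    /* First row: rp = ap * bp[0]. */
    carry = 0;
    for (i = 0; i < an; i++) {
        unsigned __int128 t = (unsigned __int128) ap[i] * bp[0] + carry;
        rp[i] = (limb_t) t;          /* low 64 bits */
        carry = (limb_t) (t >> 64);  /* high 64 bits */
    }
    rp[an] = carry;

    /* Remaining rows: rp[j ..] += ap * bp[j]. */
    for (j = 1; j < bn; j++) {
        carry = 0;
        for (i = 0; i < an; i++) {
            /* a*b + r + c cannot overflow 128 bits:
               (2^64-1)^2 + 2*(2^64-1) = 2^128 - 1. */
            unsigned __int128 t =
                (unsigned __int128) ap[i] * bp[j] + rp[i + j] + carry;
            rp[i + j] = (limb_t) t;
            carry = (limb_t) (t >> 64);
        }
        rp[an + j] = carry;
    }
}
```

The inner loop is a multiply-and-accumulate with carry, which is the part an assembly implementation would typically unroll and specialize for small fixed sizes.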
Useful links: