I have tweaked the core logic of the M3 code to use OpenMP SIMD reductions inside the OpenMP loops. Ad-hoc testing shows that this gives a good speedup for column renormalization, computing matrix elements, and the matvec as a whole. Moving from Cython to this C code seems like an easy win. The downside is that it requires OpenMP 4.0 or higher, and I am not sure how well that is supported on Windows.
Part of this will include switching the calibration data from double to float, since double precision is more than the calibrations need.
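
For reference, here is a minimal sketch of the pattern I mean, not the actual M3 kernel: a dense row-major matvec in float, with rows split across threads and the inner dot product written as a SIMD reduction. The function name `matvec_f32` and the dense layout are assumptions for illustration only.

```c
#include <stddef.h>

/* Illustrative only: y = A * x for a dense row-major matrix A
 * of shape nrows x ncols. Names and layout are hypothetical and
 * do not reflect the real M3 data structures. */
void matvec_f32(const float *A, const float *x, float *y,
                size_t nrows, size_t ncols)
{
    /* Outer loop: rows are distributed across threads.
     * A signed index keeps older OpenMP implementations happy. */
    #pragma omp parallel for schedule(static)
    for (ptrdiff_t i = 0; i < (ptrdiff_t)nrows; i++) {
        const float *row = A + (size_t)i * ncols;
        float acc = 0.0f;

        /* Inner loop: SIMD reduction into acc (needs OpenMP >= 4.0). */
        #pragma omp simd reduction(+:acc)
        for (size_t j = 0; j < ncols; j++) {
            acc += row[j] * x[j];
        }
        y[i] = acc;
    }
}
```

With GCC or Clang this builds with `-fopenmp`. Without that flag the pragmas are ignored and the code falls back to a plain serial loop, so the result is the same up to floating-point summation order.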