JeffersonLab / qphix

QCD for Intel Xeon Phi and Xeon processors
http://jeffersonlab.github.io/qphix/

Make FullCloverTerm multiplication faster by transposing #84

Open martin-ueding opened 7 years ago

martin-ueding commented 7 years ago

The FullCloverTerm stores its data in a suboptimal order. The data structure should be transposed so that the matrix multiplication can be done with adjacent memory accesses.
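To make the idea concrete, here is a minimal sketch of how transposing the storage turns a strided inner loop into a contiguous one. It is a generic illustration and does not reproduce the actual layout of FullCloverTerm: the flat 6x6 `Block` type and the names `multiply_row_major`, `transpose` and `multiply_transposed` are hypothetical, and the assumption is that the multiplication loops over the input component in the outer loop.

```cpp
#include <array>
#include <complex>
#include <cstddef>

// Hypothetical types for illustration only; not the QPhiX data structures.
constexpr std::size_t N = 6;  // spin-colour dimension of one chirality block
using Complex = std::complex<double>;
using Vec = std::array<Complex, N>;
using Block = std::array<Complex, N * N>;  // flat storage of an N x N matrix

// Assumed original ordering: element (row, col) lives at row * N + col.
// With the column loop outermost, the inner loop reads memory with stride N.
Vec multiply_row_major(const Block &m, const Vec &in) {
  Vec out{};
  for (std::size_t col = 0; col < N; ++col)
    for (std::size_t row = 0; row < N; ++row)
      out[row] += m[row * N + col] * in[col];
  return out;
}

// Reorder once so that element (row, col) lives at col * N + row.
Block transpose(const Block &m) {
  Block t{};
  for (std::size_t row = 0; row < N; ++row)
    for (std::size_t col = 0; col < N; ++col)
      t[col * N + row] = m[row * N + col];
  return t;
}

// Same loop nest on the transposed storage: the inner loop now reads
// adjacent memory locations (stride 1).
Vec multiply_transposed(const Block &mt, const Vec &in) {
  Vec out{};
  for (std::size_t col = 0; col < N; ++col)
    for (std::size_t row = 0; row < N; ++row)
      out[row] += mt[col * N + row] * in[col];
  return out;
}
```

Both functions compute the same product; only the order in which the matrix elements are read from memory differs. The transposition would be done once when the clover term is packed, not on every application.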

martin-ueding commented 6 years ago

Perhaps one can also use the twisted preconditioning mass to get the twisted mass more easily. Then, even for TM Clover, one could use the CloverBlock instead of the FullCloverBlock, and this optimization would be superseded.
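A rough sketch of what this could look like for a single 6x6 chirality block, assuming a chiral basis in which \gamma_5 acts as +1 or -1 on each block. The packed Hermitian layout below (6 real diagonal entries plus 15 complex off-diagonal entries) is only meant to be in the spirit of CloverBlock; the struct, the index ordering and the function name are hypothetical and not the QPhiX API.

```cpp
#include <array>
#include <complex>
#include <cstddef>

// Hypothetical packed layout for illustration only.
constexpr std::size_t kDim = 6;
constexpr std::size_t kOffDiag = kDim * (kDim - 1) / 2;  // 15 entries
using Cplx = std::complex<double>;
using HalfSpinor = std::array<Cplx, kDim>;

struct PackedHermitianBlock {
  std::array<double, kDim> diag;        // real diagonal of the Hermitian block
  std::array<Cplx, kOffDiag> off_diag;  // strictly lower triangle, row by row
};

// Position of element (row, col) with row > col in the packed lower triangle.
constexpr std::size_t lower_index(std::size_t row, std::size_t col) {
  return row * (row - 1) / 2 + col;
}

// Computes out = (A + i * chirality_sign * mu) * in, where A is the Hermitian
// clover block and the twisted mass is added as a separate diagonal term.
HalfSpinor apply_clover_plus_twist(const PackedHermitianBlock &a,
                                   const HalfSpinor &in, double mu,
                                   int chirality_sign) {
  HalfSpinor out{};
  for (std::size_t i = 0; i < kDim; ++i) {
    Cplx acc = a.diag[i] * in[i];
    for (std::size_t j = 0; j < i; ++j)  // stored lower-triangle entries
      acc += a.off_diag[lower_index(i, j)] * in[j];
    for (std::size_t j = i + 1; j < kDim; ++j)  // upper triangle by conjugation
      acc += std::conj(a.off_diag[lower_index(j, i)]) * in[j];
    out[i] = acc + Cplx(0.0, chirality_sign * mu) * in[i];
  }
  return out;
}
```

In this form the stored block stays Hermitian, so only 6 real and 15 complex numbers are needed per block instead of a full 6x6 complex matrix, and the twisted mass costs one extra multiply-add per component.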

kostrzewa commented 6 years ago

For the inverse this is still problematic, you have:

(\alpha + \sigma^{\mu\nu} F_{\mu\nu} - i\mu\gamma_5) / ((\alpha + \sigma^{\mu\nu} F_{\mu\nu})^2 + \mu^2)

Now if you store just the denominator, you need two clover multiplications to apply the inverse, because the numerator is of course also non-trivial in spin-colour. I'm afraid we need to keep the FullCloverBlock for the inverse. For the straight clover term, however, it would make more sense to apply the twisted mass separately and store just the clover term in CloverBlock format.
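For reference, the structure of the inverse can be checked in two lines. This is a sketch under the assumption that the twisted clover term has the form A + i\mu\gamma_5, where A = \alpha + \sigma^{\mu\nu} F_{\mu\nu} is a shorthand introduced here and the sign of the twisted mass term is inferred from the numerator above. It uses that \sigma^{\mu\nu} commutes with \gamma_5 and that \gamma_5^2 = 1:

```latex
\begin{align}
(A + i\mu\gamma_5)(A - i\mu\gamma_5)
  &= A^2 + i\mu\,(\gamma_5 A - A\gamma_5) + \mu^2\gamma_5^2
   = A^2 + \mu^2, \\
(A + i\mu\gamma_5)^{-1}
  &= (A - i\mu\gamma_5)\,(A^2 + \mu^2)^{-1}.
\end{align}
```

Applying the inverse from a stored (A^2 + \mu^2)^{-1} therefore still needs a second multiplication with A for the numerator, which is the cost described above.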