JeffersonLab / qphix

QCD for Intel Xeon Phi and Xeon processors
http://jeffersonlab.github.io/qphix/

FLOP counts are scattered through the codebase #65

Open martin-ueding opened 7 years ago

martin-ueding commented 7 years ago

In the clover test, there is the following line after the CG:

      unsigned long total_flops =
          (site_flops + (1320 + 504 + 1320 + 504 + 48) * mv_apps) * num_cb_sites;

The clover operator itself should know how many operations it performs. I propose adding a static constexpr int flops member to the various operators, and similar dslash_flops and achimbdpsi_flops members to the Dslash classes. The flop computation would then no longer involve these magic numbers.
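A minimal sketch of what this could look like. The class names are hypothetical stand-ins, not the actual QPhiX classes, and the way the constants from the test are split between the two Dslash functions is an assumption; only their sum (1320 + 504 + 1320 + 504 + 48) is taken from the line quoted above.

```cpp
#include <cassert>

// Hypothetical stand-in for a clover Dslash class. The split of the
// per-site constants between the two members is an assumption.
struct CloverDslashSketch {
  static constexpr unsigned long dslash_flops = 1320 + 504;
  static constexpr unsigned long achimbdpsi_flops = 1320 + 504 + 48;
};

// Hypothetical stand-in for the even-odd clover operator, assumed to
// apply each of the two Dslash functions once per application.
struct CloverOperatorSketch {
  static constexpr unsigned long flops =
      CloverDslashSketch::dslash_flops + CloverDslashSketch::achimbdpsi_flops;
};

// Same formula as in the test, with the magic numbers replaced by Op::flops.
template <typename Op>
constexpr unsigned long total_flops(unsigned long site_flops,
                                    unsigned long mv_apps,
                                    unsigned long num_cb_sites) {
  return (site_flops + Op::flops * mv_apps) * num_cb_sites;
}

// The sum matches the literal expression from the clover test.
static_assert(CloverOperatorSketch::flops == 1320 + 504 + 1320 + 504 + 48,
              "operator flops should match the constants used in the test");
```

The test would then call something like total_flops<CloverOperatorSketch>(site_flops, mv_apps, num_cb_sites) instead of spelling out the constants.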

Alternatively, one could use a type-traits construct if one does not want to add this to the classes, but I don't see any problem with adding it.
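For comparison, the type-traits variant might look roughly like this. FlopTraits, flops_per_site and the operator type are made-up names for illustration; the point is only that the operator class itself stays untouched.

```cpp
#include <cassert>

// Hypothetical operator type standing in for an existing class that
// should not be modified.
struct CloverOperatorSketch {};

// Primary template is declared but not defined, so asking for the flops
// of an unannotated type is a compile-time error.
template <typename T>
struct FlopTraits;

// Annotation added from the outside via specialization. The constant is
// the sum used in the clover test.
template <>
struct FlopTraits<CloverOperatorSketch> {
  static constexpr unsigned long flops_per_site = 1320 + 504 + 1320 + 504 + 48;
};

// Flop formula from the test, written against the trait.
template <typename Op>
constexpr unsigned long solver_flops(unsigned long site_flops,
                                     unsigned long mv_apps,
                                     unsigned long num_cb_sites) {
  return (site_flops + FlopTraits<Op>::flops_per_site * mv_apps) * num_cb_sites;
}
```

The trade-off is that the numbers live apart from the code they describe, so they are easier to forget when an operator changes.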

martin-ueding commented 7 years ago

For the operators, adding a virtual int get_flops() const member function is probably the easiest way; polymorphism then works as well. The Dslash classes should presumably have get_dslash_flops and get_achimbdpsi_flops functions. But what about the BLAS functions? They need some sort of annotation mechanism. In Python one could do the following with no changes to the existing code:

# An example BLAS function.
def copySpinor(dest, src):
    # Placeholder implementation: copy src into dest in place.
    dest[:] = src

# In Python, all functions are objects (much like functor instances), so we
# can just add an attribute to them.
copySpinor.flops = 0

def conjugate_gradient(source):
    flops = 0
    dest = [0] * len(source)
    # ...
    copySpinor(dest, source)

    flops += copySpinor.flops
    # ...
    return dest, flops

I do not know how functions can be annotated in C++, so I asked a question on StackOverflow. Perhaps one has to resort to converting the functions into instances of singleton functors, which can then carry the additional members.
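A rough sketch of the functor idea, assuming C++11. All names here are invented for illustration and std::vector<double> stands in for the real spinor types; this is not how the QPhiX BLAS routines are actually written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical functor replacing a free function copySpinor(dest, src).
struct CopySpinorFn {
  static constexpr unsigned long flops_per_element = 0;  // a copy does no arithmetic
  void operator()(std::vector<double>& dest, const std::vector<double>& src) const {
    dest = src;
  }
};

// Hypothetical functor for y += a * x: one multiply and one add per element.
struct AxpyFn {
  static constexpr unsigned long flops_per_element = 2;
  void operator()(double a, const std::vector<double>& x, std::vector<double>& y) const {
    for (std::size_t i = 0; i < y.size(); ++i) {
      y[i] += a * x[i];
    }
  }
};

// Single global instances, so call sites keep the function-call syntax.
constexpr CopySpinorFn copy_spinor{};
constexpr AxpyFn axpy{};

// Example caller that accumulates flops next to each call, mirroring the
// Python snippet above.
inline unsigned long example_update(std::vector<double>& y, const std::vector<double>& x) {
  unsigned long flops = 0;
  std::vector<double> tmp;
  copy_spinor(tmp, x);
  flops += copy_spinor.flops_per_element * x.size();
  axpy(2.0, tmp, y);
  flops += axpy.flops_per_element * y.size();
  return flops;
}
```

Existing callers would not need to change, since copy_spinor(dest, src) still reads like a plain function call.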

Another idea would be to add an int &flops argument to every BLAS function so that it can report the number of flops done. But that seems to violate the open-closed principle (OCP) and perhaps also the single responsibility principle (SRP).
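To make the concern concrete, here is roughly what the counter-argument variant would look like. The function name and the use of std::vector<double> are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical BLAS-like routine y += a * x with an extra counter argument.
// The signature changes, so every existing call site has to be touched, and
// the routine now does bookkeeping on top of the arithmetic.
inline void axpy_counted(double a, const std::vector<double>& x,
                         std::vector<double>& y, unsigned long& flops) {
  for (std::size_t i = 0; i < y.size(); ++i) {
    y[i] += a * x[i];
  }
  flops += 2 * y.size();  // one multiply and one add per element
}
```

A caller would keep a running unsigned long flops = 0; and pass it to each routine in turn.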

Do you have an idea for a nice way to annotate the BLAS operations with their useful flops?