JeffersonLab / qphix

QCD for Intel Xeon Phi and Xeon processors
http://jeffersonlab.github.io/qphix/
Other
13 stars 11 forks source link

correct two-flavour EvenOddLinearOperator for NDTM #77

Closed kostrzewa closed 6 years ago

kostrzewa commented 7 years ago

(a+imugamma_5)

kostrzewa commented 7 years ago

This is completely untested, could we test it together?

martin-ueding commented 7 years ago

I have written a test, one can run it like so:

mpirun -n 1 tests-gtest/tests_g -by 4 -bz 4 -pxy 1 -pxyz 0 -minct 1 -c 2 -sy 1 -sz 2 -x 16 -y 16 -z 16 -t 16 --gtest_filter=apimugamma5.\*

In double precision, using the function to multiply once and multiply with the inverse agrees to 1.10638302670353e-19 using a Gaussian input spinor. I'd say this works.

kostrzewa commented 7 years ago

Excellent, thanks for the test! I'll try to figure out why the travis build is failing, but maybe it was just because I pushed commits.

martin-ueding commented 7 years ago

It build used to fail to compile because of some minor typos, I have fixed those. Then the build seems to fail because of something in the header files. I am not really sure why this happens, perhaps one is missing some include. Or some intrinsic code is not properly guarded for the particular architecture.

martin-ueding commented 7 years ago

I have fixed all I have just merged the latest changes and fixed all compilation issues. Pending are test cases.

martin-ueding commented 7 years ago

In the two-flavor BLAS function, there is a need for a const in order to get the T ** to T const *const * conversion correct. And it also needs the Intel Compiler Workaround as well, so some more SFINAE. I'll add that.

martin-ueding commented 7 years ago

The use of FT looks dubious to me in non-intrinsic code. In half precision, FT = unsigned short and therefore the line res_up[0][s] = alpha[0] * x_up[0][s] - alpha[1] * x_up[1][s]; will not do what we expect. It will interpret the bits in all variables all unsigned short and do an integer multiplication. So we need to use the ArithType instead, I believe.

kostrzewa commented 7 years ago

The use of FT looks dubious to me in non-intrinsic code. In half precision, FT = unsigned short and therefore the line res_up[0][s] = alpha[0] x_up[0][s] - alpha[1] x_up[1][s]; will not do what we expect. It will interpret the bits in all variables all unsigned short and do an integer multiplication. So we need to use the ArithType instead, I believe.

Good catch, I had set this up already, but I forgot to do it correctly...

martin-ueding commented 7 years ago

There was a tiny typo in the new routine which I just fixed, perhaps the test will succeed now. For epsilon = 1 it worked before, so I will keep you posted.

kostrzewa commented 7 years ago

There was a tiny typo in the new routine which I just fixed, perhaps the test will succeed now ...

All the BLASUtils::func calls should explicitly use the <AT, S> specialisation

BLASUtils::cm -> BLASUtils::cm<AT, S>
kostrzewa commented 7 years ago

I would have preferred to only introduce the StreamSpinor class after this works...

kostrzewa commented 7 years ago

apimu and epsilon need to be repped to AT rather than FT

martin-ueding commented 7 years ago

StreamSpinor changes are reverted in the two-flavor case. The one-flavor case is already tested and still works, so I don't think that this is an issue.

martin-ueding commented 7 years ago

Damn, I have changed apimi and epsilon to AT in the member variables, but I missed in in the initializer in the ctor. I have changed that, tests are compiling ...

martin-ueding commented 7 years ago

No, that did not solve the issue. I am testing with double on my machine, so everything is double anyway. It would have broken with a different setup, but that was not the final bug, yet.

kostrzewa commented 7 years ago

I'm calling it a day for now, see you tomorrow!

kostrzewa commented 7 years ago

Ugh, we have a really annoying problem to fix or otherwise resolve. In two_flav_AChiMinusBDPsi, it is nice to re-use the single-flavour tmdslashAChiMinusBDPsi, but we need to change the sign of mu for the second flavour, which is currently impossible, of course...

kostrzewa commented 7 years ago

My first instinct would be to change the signature of

dslashAChiMinusBDPsi
TMDyzAChiMinusBDPsi

to add a default argument which we can use to change the sign of derived_mu in the call of the kernel.

kostrzewa commented 7 years ago

@martin-ueding if you have some time, I could use your help with one issue. When I try to instantiate the two flavour NDTM linear operator, gcc complains that operator() of the base class is pure virtual. As far as I can tell, however, it is properly overridden...

kostrzewa commented 7 years ago

Okay, I seem to have resolved that one, I had just forgotten to reinstall my changes..

kostrzewa commented 7 years ago

Unfortunately, the problem has now switched to the num_flavour solvers, which suffer from const'ness issues through the num_flavour operator() of ndtm_reuse_operator ...

martin-ueding commented 7 years ago

To make it easier, I'd add override to the functions that should override. Then the compiler will complain if some const is missing. I just pulled in your changes on the tm_functor branch, QPhiX itself compiles cleanly. That problem occurs with tmLQCD?

Perhaps this is relevant as well, taken from the solvers:

  // This class overrides the `operator()` from `AbstractSolver`. Due to “name
  // hiding”, the overloads of `operator()` in the base class are no longer
  // visible in this class. Therefore the single-flavor interface is not found
  // when trying to use the solver like it has worked before, namely with an
  // instance of this solver with automatic storage (i.e. no pointers). Here
  // we do want the overload for a single spinor pointer to delegate back to
  // the multi-flavor variant. The overloads need to be included explicitly
  // here. See http://stackoverflow.com/a/42588534/653152 for the full answer.
  using AbstractSolver<FT, veclen, soalen, compress12, num_flav>::operator();
kostrzewa commented 7 years ago

Okay, so I removed the const'ness of the two-flavour operator() again, as you had originally, but when invcg() calls this, it sill attempts to pass a const first argument...

/home/bartek/local/qphix.devel/avx2/include/qphix/invcg.h: In instantiation of ‘void QPhiX::InvCG<FT, veclen, soalen, compress12, EvenOddLinearOperatorBase>::operator()(QPhiX::InvCG<FT, veclen, soalen, compress12, EvenOddLinearOperatorBase>::Spinor**, const Spinor* const*, double, int&, double&, long unsigned int&, long unsigned int&, int, bool, int) const [with FT = double; int veclen = 4; int soalen = 4; bool compress12 = false; EvenOddLinearOperatorBase = QPhiX::TwoFlavEvenOddLinearOperator<double, 4, 4, false>; QPhiX::InvCG<FT, veclen, soalen, compress12, EvenOddLinearOperatorBase>::Spinor = double [3][4][2][4]]’:
/home/bartek/code/tmLQCD.qphix2/qphix_interface.cpp:1665:1:   required from here
/home/bartek/local/qphix.devel/avx2/include/qphix/invcg.h:187:6: error: no match for call to ‘(QPhiX::TwoFlavEvenOddLinearOperator<double, 4, 4, false>) (double (* const [2])[3][4][2][4], double (**&)[3][4][2][4], int&, int&)’
     M(mp, x, isign, cb);
      ^
In file included from /home/bartek/local/qphix.devel/avx2/include/qphix/clover.h:3:0,
                 from /home/bartek/code/tmLQCD.qphix2/qphix_interface.cpp:60:
/home/bartek/local/qphix.devel/avx2/include/qphix/linearOp.h:62:16: note: candidate: void QPhiX::TwoFlavEvenOddLinearOperator<FT, veclen, soalen, compress>::operator()(QPhiX::TwoFlavEvenOddLinearOperator<FT, veclen, soalen, compress>::FourSpinorBlock**, const FourSpinorBlock* const*, int, int) [with FT = double; int veclen = 4; int soalen = 4; bool compress = false; QPhiX::TwoFlavEvenOddLinearOperator<FT, veclen, soalen, compress>::FourSpinorBlock = double [3][4][2][4]] <near match>
   virtual void operator()(FourSpinorBlock *res[2],
                ^
/home/bartek/local/qphix.devel/avx2/include/qphix/linearOp.h:62:16: note:   conversion of argument 1 would be ill-formed:
In file included from /home/bartek/code/tmLQCD.qphix2/qphix_interface.cpp:62:0:
/home/bartek/local/qphix.devel/avx2/include/qphix/invcg.h:187:6: error: invalid conversion from ‘double (* const*)[3][4][2][4]’ to ‘double (**)[3][4][2][4]’ [-fpermissive]
     M(mp, x, isign, cb);
      ^
/home/bartek/local/qphix.devel/avx2/include/qphix/invcg.h:195:6: error: no match for call to ‘(QPhiX::TwoFlavEvenOddLinearOperator<double, 4, 4, false>) (double (* const [2])[3][4][2][4], double (* const [2])[3][4][2][4], int, int&)’
     M(mmp, mp, -isign, cb);
kostrzewa commented 7 years ago

Oh wouldn't it be nice if gcc put some newlines in there..

martin-ueding commented 7 years ago

I think you need to change FourSpinorBlock *res[2] to FourSpinorBlock *const res[2]. Then the conversion should be fine.

One could keep the member functions const by making the temporary fields mutable. C++11 const means “thread safe”. Since we do not really have concurrency, we would not really need to do a synchronization. But then it is more honest to just keep the members non-const.

kostrzewa commented 7 years ago

I think you need to change FourSpinorBlock res[2] to FourSpinorBlock const res[2]. Then the conversion should be fine.

Yes, you're right. I thought about making the temporaries mutable. I'll try to get it to run first and then I'll see if we can do something. These functions are almost certainly not thread-safe anyway, on account of what happens deep within the kernels.

kostrzewa commented 7 years ago

Okay, I got it to compile :D

kostrzewa commented 7 years ago

Yay, it works now.

kostrzewa commented 7 years ago

@martin-ueding If you have a moment, could you advise me whether I can solve this in a nicer way? https://github.com/JeffersonLab/qphix/pull/77/commits/b16daad2b3dfadec2c9bb99a46c33e95e3e1a9f3 This was the only way I found to get the code to compile on Marconi using ICC...

martin-ueding commented 7 years ago

@kostrzewa, I think you can use the typename Spinor2, typename Spinor2 with enable_if and is_same like in the BLAS routines to delegate the const-issue to the template system. Especially since you have multiple parameters that may be const or not.

kostrzewa commented 7 years ago

@martin-ueding but these are constructors and member functions respectively, can the type system deduce the class type from the arguments of its member functions?

kostrzewa commented 7 years ago

well, for the functor it will probably work, since it is instantiated in a free function

kostrzewa commented 7 years ago

In general it seems that this will only work for members since C++14, which is nice but we clearly can't depend on.... (yet)

martin-ueding commented 7 years ago

Right, template arguments can only be deduced for free functions, that is why there is a make_pair and so on. I vaguely recall something about this being fixed in C++17.

Perhaps this could be solved with a free factory function like this:

template <typename Spinor1>
std::enable_if<
    std::is_same<Spinor1 const,
                 typename Geometry<...>::FourSpinorBlock const>::value,
    void>::type
make_foo(Spinor1 *const *arg) {
    Foo foo(const_cast<typename Geometry<...>::FourSpinorBlock const *const *>(
        arg));
    return foo;
}

And another free function to call the member:

template <typename Spinor1>
std::enable_if<
    std::is_same<Spinor1 const,
                 typename Geometry<...>::FourSpinorBlock const>::value,
    Foo>::type
call_member(Class &object, Spinor1 *const *arg) {
    object.member(
        const_cast<typename Geometry<...>::FourSpinorBlock const *const *>(
            arg));
}