Closed kostrzewa closed 6 years ago
This is completely untested, could we test it together?
I have written a test, one can run it like so:
mpirun -n 1 tests-gtest/tests_g -by 4 -bz 4 -pxy 1 -pxyz 0 -minct 1 -c 2 -sy 1 -sz 2 -x 16 -y 16 -z 16 -t 16 --gtest_filter=apimugamma5.\*
In double precision, using the function to multiply once and multiply with the inverse agrees to 1.10638302670353e-19 using a Gaussian input spinor. I'd say this works.
Excellent, thanks for the test! I'll try to figure out why the travis build is failing, but maybe it was just because I pushed commits.
It build used to fail to compile because of some minor typos, I have fixed those. Then the build seems to fail because of something in the header files. I am not really sure why this happens, perhaps one is missing some include. Or some intrinsic code is not properly guarded for the particular architecture.
I have fixed all I have just merged the latest changes and fixed all compilation issues. Pending are test cases.
In the two-flavor BLAS function, there is a need for a const
in order to get the T **
to T const *const *
conversion correct. And it also needs the Intel Compiler Workaround as well, so some more SFINAE. I'll add that.
The use of FT
looks dubious to me in non-intrinsic code. In half precision, FT = unsigned short
and therefore the line res_up[0][s] = alpha[0] * x_up[0][s] - alpha[1] * x_up[1][s];
will not do what we expect. It will interpret the bits in all variables all unsigned short
and do an integer multiplication. So we need to use the ArithType
instead, I believe.
The use of FT looks dubious to me in non-intrinsic code. In half precision, FT = unsigned short and therefore the line res_up[0][s] = alpha[0] x_up[0][s] - alpha[1] x_up[1][s]; will not do what we expect. It will interpret the bits in all variables all unsigned short and do an integer multiplication. So we need to use the ArithType instead, I believe.
Good catch, I had set this up already, but I forgot to do it correctly...
There was a tiny typo in the new routine which I just fixed, perhaps the test will succeed now. For epsilon = 1
it worked before, so I will keep you posted.
There was a tiny typo in the new routine which I just fixed, perhaps the test will succeed now ...
All the BLASUtils::func calls should explicitly use the <AT, S> specialisation
BLASUtils::cm -> BLASUtils::cm<AT, S>
I would have preferred to only introduce the StreamSpinor class after this works...
apimu
and epsilon
need to be repped to AT
rather than FT
StreamSpinor
changes are reverted in the two-flavor case. The one-flavor case is already tested and still works, so I don't think that this is an issue.
Damn, I have changed apimi
and epsilon
to AT
in the member variables, but I missed in in the initializer in the ctor. I have changed that, tests are compiling ...
No, that did not solve the issue. I am testing with double
on my machine, so everything is double
anyway. It would have broken with a different setup, but that was not the final bug, yet.
I'm calling it a day for now, see you tomorrow!
Ugh, we have a really annoying problem to fix or otherwise resolve. In two_flav_AChiMinusBDPsi
, it is nice to re-use the single-flavour tmdslashAChiMinusBDPsi
, but we need to change the sign of mu for the second flavour, which is currently impossible, of course...
My first instinct would be to change the signature of
dslashAChiMinusBDPsi
TMDyzAChiMinusBDPsi
to add a default argument which we can use to change the sign of derived_mu in the call of the kernel.
@martin-ueding if you have some time, I could use your help with one issue. When I try to instantiate the two flavour NDTM linear operator, gcc complains that operator()
of the base class is pure virtual. As far as I can tell, however, it is properly overridden...
Okay, I seem to have resolved that one, I had just forgotten to reinstall my changes..
Unfortunately, the problem has now switched to the num_flavour solvers, which suffer from const'ness issues through the num_flavour operator()
of ndtm_reuse_operator ...
To make it easier, I'd add override
to the functions that should override. Then the compiler will complain if some const
is missing. I just pulled in your changes on the tm_functor
branch, QPhiX itself compiles cleanly. That problem occurs with tmLQCD?
Perhaps this is relevant as well, taken from the solvers:
// This class overrides the `operator()` from `AbstractSolver`. Due to “name
// hiding”, the overloads of `operator()` in the base class are no longer
// visible in this class. Therefore the single-flavor interface is not found
// when trying to use the solver like it has worked before, namely with an
// instance of this solver with automatic storage (i.e. no pointers). Here
// we do want the overload for a single spinor pointer to delegate back to
// the multi-flavor variant. The overloads need to be included explicitly
// here. See http://stackoverflow.com/a/42588534/653152 for the full answer.
using AbstractSolver<FT, veclen, soalen, compress12, num_flav>::operator();
Okay, so I removed the const'ness of the two-flavour operator() again, as you had originally, but when invcg() calls this, it sill attempts to pass a const first argument...
/home/bartek/local/qphix.devel/avx2/include/qphix/invcg.h: In instantiation of ‘void QPhiX::InvCG<FT, veclen, soalen, compress12, EvenOddLinearOperatorBase>::operator()(QPhiX::InvCG<FT, veclen, soalen, compress12, EvenOddLinearOperatorBase>::Spinor**, const Spinor* const*, double, int&, double&, long unsigned int&, long unsigned int&, int, bool, int) const [with FT = double; int veclen = 4; int soalen = 4; bool compress12 = false; EvenOddLinearOperatorBase = QPhiX::TwoFlavEvenOddLinearOperator<double, 4, 4, false>; QPhiX::InvCG<FT, veclen, soalen, compress12, EvenOddLinearOperatorBase>::Spinor = double [3][4][2][4]]’:
/home/bartek/code/tmLQCD.qphix2/qphix_interface.cpp:1665:1: required from here
/home/bartek/local/qphix.devel/avx2/include/qphix/invcg.h:187:6: error: no match for call to ‘(QPhiX::TwoFlavEvenOddLinearOperator<double, 4, 4, false>) (double (* const [2])[3][4][2][4], double (**&)[3][4][2][4], int&, int&)’
M(mp, x, isign, cb);
^
In file included from /home/bartek/local/qphix.devel/avx2/include/qphix/clover.h:3:0,
from /home/bartek/code/tmLQCD.qphix2/qphix_interface.cpp:60:
/home/bartek/local/qphix.devel/avx2/include/qphix/linearOp.h:62:16: note: candidate: void QPhiX::TwoFlavEvenOddLinearOperator<FT, veclen, soalen, compress>::operator()(QPhiX::TwoFlavEvenOddLinearOperator<FT, veclen, soalen, compress>::FourSpinorBlock**, const FourSpinorBlock* const*, int, int) [with FT = double; int veclen = 4; int soalen = 4; bool compress = false; QPhiX::TwoFlavEvenOddLinearOperator<FT, veclen, soalen, compress>::FourSpinorBlock = double [3][4][2][4]] <near match>
virtual void operator()(FourSpinorBlock *res[2],
^
/home/bartek/local/qphix.devel/avx2/include/qphix/linearOp.h:62:16: note: conversion of argument 1 would be ill-formed:
In file included from /home/bartek/code/tmLQCD.qphix2/qphix_interface.cpp:62:0:
/home/bartek/local/qphix.devel/avx2/include/qphix/invcg.h:187:6: error: invalid conversion from ‘double (* const*)[3][4][2][4]’ to ‘double (**)[3][4][2][4]’ [-fpermissive]
M(mp, x, isign, cb);
^
/home/bartek/local/qphix.devel/avx2/include/qphix/invcg.h:195:6: error: no match for call to ‘(QPhiX::TwoFlavEvenOddLinearOperator<double, 4, 4, false>) (double (* const [2])[3][4][2][4], double (* const [2])[3][4][2][4], int, int&)’
M(mmp, mp, -isign, cb);
Oh wouldn't it be nice if gcc put some newlines in there..
I think you need to change FourSpinorBlock *res[2]
to FourSpinorBlock *const res[2]
. Then the conversion should be fine.
One could keep the member functions const
by making the temporary fields mutable
. C++11 const
means “thread safe”. Since we do not really have concurrency, we would not really need to do a synchronization. But then it is more honest to just keep the members non-const
.
I think you need to change FourSpinorBlock res[2] to FourSpinorBlock const res[2]. Then the conversion should be fine.
Yes, you're right. I thought about making the temporaries mutable. I'll try to get it to run first and then I'll see if we can do something. These functions are almost certainly not thread-safe anyway, on account of what happens deep within the kernels.
Okay, I got it to compile :D
Yay, it works now.
@martin-ueding If you have a moment, could you advise me whether I can solve this in a nicer way? https://github.com/JeffersonLab/qphix/pull/77/commits/b16daad2b3dfadec2c9bb99a46c33e95e3e1a9f3 This was the only way I found to get the code to compile on Marconi using ICC...
@kostrzewa, I think you can use the typename Spinor2, typename Spinor2
with enable_if
and is_same
like in the BLAS routines to delegate the const-issue to the template system. Especially since you have multiple parameters that may be const
or not.
@martin-ueding but these are constructors and member functions respectively, can the type system deduce the class type from the arguments of its member functions?
well, for the functor it will probably work, since it is instantiated in a free function
In general it seems that this will only work for members since C++14, which is nice but we clearly can't depend on.... (yet)
Right, template arguments can only be deduced for free functions, that is why there is a make_pair
and so on. I vaguely recall something about this being fixed in C++17.
Perhaps this could be solved with a free factory function like this:
template <typename Spinor1>
std::enable_if<
std::is_same<Spinor1 const,
typename Geometry<...>::FourSpinorBlock const>::value,
void>::type
make_foo(Spinor1 *const *arg) {
Foo foo(const_cast<typename Geometry<...>::FourSpinorBlock const *const *>(
arg));
return foo;
}
And another free function to call the member:
template <typename Spinor1>
std::enable_if<
std::is_same<Spinor1 const,
typename Geometry<...>::FourSpinorBlock const>::value,
Foo>::type
call_member(Class &object, Spinor1 *const *arg) {
object.member(
const_cast<typename Geometry<...>::FourSpinorBlock const *const *>(
arg));
}
(a+imugamma_5)