Closed by kostrzewa 6 years ago
I would like to propose that we move the main development branch to https://github.com/kostrzewa/tmLQCD/tree/qphix_devel, so that all pull requests going forward go there. I would not want to keep annoying @plabus with this stuff, now that he has other things to worry about ;)
@sunpho84 Would you be willing to help test and develop at this stage? We really need some more hands on this.
Hi! yeah! what should I do?
So, there's actually something that I find rather perplexing, if not surprising. I've set up a test version of tmLQCD+QPhiX for the JSC people. Compilation is described in some detail in the QPhiX section of this document: https://github.com/etmc/tmLQCD/wiki/JSC-test-setup. When I try to compile this test version using AVX2 kernels on Marconi (A1 or A2), the final residual of the test inversion is wrong, while the exact same test works perfectly on Jureca. Same code, almost exactly the same compiler, and a similar job script.
Some output files, a job script and an input file can be found here together with a test configuration: /marconi_work/INF17_lqcd123_0/bartek/tests/invert_test_cA211a.30.32
It would be nice if someone could confirm that there's an issue on Marconi beyond the fact that it doesn't scale AT ALL. (performance on two nodes is the same as on a single node..., whatever parallelisation I choose...)
I will make some tests in the next days and try to see whether the scaling works or not. Apparently some changes to the machine configuration have been made in the last few days.
It seems that your QPhiX branch is lacking commit https://github.com/JeffersonLab/qphix/commit/37d75749af35d058576f69966d9c8e51fccfd858, so you are missing one additional geometry check. If that condition is violated, QPhiX still runs but returns garbage. I hit this when trying to run the 24³×96 lattice with 128 MPI ranks and a block size of 8: the local geometry was not evenly divisible, but it still ran through, and Chroma aborted because the solver did not converge.
So perhaps the choice of MPI topology, lattice size, and blocking does not work out properly?
@martin-ueding no, this was on a single node, but I'll cherry-pick the commit
@sunpho84

> I will make some tests in the next days, and try to see if the scaling works or not. Apparently some change in the configuration has been done in the last few days.

Any progress on this?
I think I followed all the instructions, but I get this funny error when executing:

```
Attempting to use an MPI routine before initializing MPI
QMP IS INITIALIZED
```

This is my test path:

```
/marconi/home/userexternal/fsanfili/programs/tmLQCD_test_QPHIX/tests/A30.32
```
@sunpho84 you built QMP in serial mode. The default comms type is SINGLE, so you have to pass `--with-qmp-comms-type=MPI` explicitly, e.g.:

```
/homec/hbn28/hbn288/code/qmp/configure --prefix=/homec/hbn28/hbn288/local/jureca/libs/qmp CC=mpicc CFLAGS=-std=c99 --with-qmp-comms-type=MPI
```
I've updated the instructions accordingly to make this explicitly clear.
The resulting mismatch that I obtain is identical to that of your test, see
/marconi_work/INF17_lqcd123_0/sanfo/test_QPHIX/A30.32/test_Bartek.e151004
In all of this, I guess the most important result is that we fixed the building instructions :smile: But I am happy to do more tests, of course...
okay, thanks. So, @martin-ueding, does Chroma+qphix work on A1/A2?
I did an HMC run on A2 with Chroma and QPhiX, reproducing a result from JURECA. You can use my compilation script for Marconi A2. In line 321 you should only list a single SoA length, though. Otherwise it will install all of them, each overwriting the last, so Chroma ends up built with an SoA length of 16, which is probably not desired. However, in the build directory there will be compiled QPhiX variants for each SoA length, which is great for testing QPhiX.
I have not tried the Broadwell partition, but since it works on JURECA with Haswell, I guess it should be just fine on Marconi A1; perhaps use the JURECA script, or adapt the Marconi A2 script so that the architecture is AVX2 instead of AVX512.
Thanks Martin. @sunpho84 would you have time to help figure out what's going on? This means trying to understand the interface and qphix, I guess.
The point is, the residual computed with tmLQCD is 4.584444e-04, so the error is rather subtle; otherwise the difference should be much larger. Or do you think that should not be the case?
Ah, I know what the problem is... we don't have support for twisted boundary conditions in qphix yet...
my fault, I set up the input file on Marconi incorrectly...
Right, I had in mind precisely something like that as a "subtle" effect.
Yes, it obviously gives a compatible result now.
@martin-ueding In your bootstrap for Chroma on Marconi A2, it seems that Chroma instantiates only AVX2 QPhiX kernels:
```bash
if ! [[ -f Makefile ]]; then
    $sourcedir/$repo/configure $base_configure \
        --enable-openmp \
        --enable-parallel-arch=parscalar \
        --enable-parallel-io \
        --enable-precision=double \
        --enable-qdp-alignment=128 \
        --enable-sse2 \
        --enable-qphix-solver-arch=avx2 \
        --with-gmp="$prefix" \
        --with-libxml2="$prefix/bin/xml2-config" \
        --with-qdp="$prefix" \
        --with-qphix-solver="$prefix" \
        CFLAGS="$cflags" CXXFLAGS="$cxxflags"
fi
```
correct?
I have made changes to this script on Marconi A2 and have not uploaded them. The tests were run with avx512 in the options, though. Once git is working on Marconi again, I will merge those and update the code.
I just saw that a function in the interface is defined like this:

```cpp
template <typename FT, int VECLEN, int SOALEN, bool compress12>
void reorder_clover_to_QPhiX(Geometry<FT, VECLEN, SOALEN, compress12> &geom, FT *qphix_clover)
```

Why not use `typename Geometry<FT, VECLEN, SOALEN, compress12>::CloverBlock` or `typename Geometry<FT, VECLEN, SOALEN, compress12>::FullCloverBlock` as the type of `qphix_clover`? That would make it more type-safe and would also allow overloads for the two distinct cases. Actually, I do not even know whether that `FT *qphix_clover` is supposed to contain the twisted mass on the QPhiX side or not.
I think the reason for the signature is that it was a first attempt; the function clearly needs to be rewritten. In particular, I think one would probably loop the other way around, over the QPhiX indices.
From Peter's notes it seems that this function is doing just Wilson clover (\mu = 0). I can rewrite that function to iterate over the QPhiX indices. After that I can probably also do a \mu \neq 0 variant. Or should I rather focus on something else?
Does the current implementation work? Do we have a test for it? I cannot find the function name anywhere else in the interface CPP file, so I guess it has not been written. What kind of test do we want?
Pack the clover term with the packing function and then run …
Yes, it's doing only Wilson. There are no tests, and I think it won't work, because I didn't transpose the colour matrices w.r.t. QPhiX (viewed as memory-contiguous arrays, the QPhiX and tmLQCD colour matrices are mutually transposed). Also keep in mind that turning on \mu will change the layout on the QPhiX side, and additionally you will have to add the twisted mass contribution, as it is not stored in the tmLQCD clover data structure (although it is in the inverse clover data structure).
@plabus @martin-ueding @urbach This issue is to serve as a forum-like place, in addition to the Kanban board, to track progress on the QPhiX interface. Mainly, this is to have e-mail notifications for important developments.