etmc / tmLQCD

tmLQCD is a freely available software suite providing a set of tools for lattice QCD simulations. It is mainly an HMC implementation (including PHMC and RHMC) for Wilson, Wilson clover and Wilson twisted mass fermions, and an inverter for different versions of the Dirac operator. The code is fully parallelised and ships with optimisations for various modern architectures, such as commodity PC clusters and the Blue Gene family.
http://www.itkp.uni-bonn.de/~urbach/software.html
GNU General Public License v3.0

qphix interface progress #369

Closed kostrzewa closed 6 years ago

kostrzewa commented 7 years ago

@plabus @martin-ueding @urbach This issue is to serve as a forum-like place, in addition to the Kanban, to track progress on the qphix interface. Mainly, this is to have e-mail notifications for important stuff.

kostrzewa commented 7 years ago

I would like to propose that we move the main devel branch to https://github.com/kostrzewa/tmLQCD/tree/qphix_devel such that all pull-requests going forward should go there. I would not want to keep annoying @plabus with this stuff, now that he has other things to worry about ;)

kostrzewa commented 7 years ago

@sunpho84 Would you be willing to help test and develop at this stage? We really need some more hands on this.

sunpho84 commented 7 years ago

Hi! yeah! what should I do?

kostrzewa commented 7 years ago

So, there's actually something that I find rather perplexing, if not surprising. I've set up a test version of tmLQCD+qphix for the JSC people. In the QPHIX section of this document: https://github.com/etmc/tmLQCD/wiki/JSC-test-setup compilation is described in some detail. When I compile this test version with AVX2 kernels on Marconi (A1 or A2), the final residual of the test inversion is wrong, although the same build works perfectly on Jureca. Exact same code, almost exactly the same compiler and a similar job script.

Some output files, a job script and an input file can be found here together with a test configuration: /marconi_work/INF17_lqcd123_0/bartek/tests/invert_test_cA211a.30.32

It would be nice if someone could confirm that there's an issue on Marconi beyond the fact that it doesn't scale AT ALL. (Performance on two nodes is the same as on a single node, whatever parallelisation I choose.)

sunpho84 commented 7 years ago

I will make some tests in the next days, and try to see if the scaling works or not. Apparently some change in the configuration has been done in the last few days.

martin-ueding commented 7 years ago

It seems that your QPhiX branch is lacking commit https://github.com/JeffersonLab/qphix/commit/37d75749af35d058576f69966d9c8e51fccfd858. Therefore you lack one additional geometry check. If that condition is violated, QPhiX still runs but returns garbage. I had this when trying to run the 24³×96 lattice with 128 MPI ranks and a block size of 8. The local geometry was not properly divisible, but it still ran through. Chroma aborted because the solver did not converge.

So perhaps the choice of MPI topology, lattice size, and blocking does not work out properly?
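
For reference, a minimal sketch of the kind of divisibility check being discussed, useful for validating a layout before submitting a job. This is not the check from the QPhiX commit linked above: the struct `LatticeLayout`, the function `layout_is_divisible` and the per-direction blocking factors are hypothetical names chosen for illustration, and the real QPhiX constraints may include further conditions (for example on the SoA length).

```cpp
// Hypothetical helper, not part of QPhiX or tmLQCD.
struct LatticeLayout {
    int global[4];  // global lattice extents in X, Y, Z, T
    int ranks[4];   // number of MPI ranks per direction
    int block[4];   // blocking factor per direction (1 means no blocking)
};

// Returns true if every global extent splits evenly across the ranks
// and every resulting local extent splits evenly into blocks.
constexpr bool layout_is_divisible(const LatticeLayout &l) {
    for (int d = 0; d < 4; ++d) {
        if (l.ranks[d] <= 0 || l.block[d] <= 0) return false;
        if (l.global[d] % l.ranks[d] != 0) return false;
        const int local = l.global[d] / l.ranks[d];
        if (local % l.block[d] != 0) return false;
    }
    return true;
}
```

As an example, a 24³×96 lattice on an assumed 2×2×4×8 rank grid (128 ranks) has a local volume of 12×12×6×12, so a blocking factor of 8 in Y or Z is rejected, while 4 in Y and 2 in Z passes. Failing early like this avoids a run that completes but returns garbage.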

kostrzewa commented 7 years ago

@martin-ueding no, this was on a single node, but I'll cherry-pick the commit

kostrzewa commented 7 years ago

@sunpho84

> I will make some tests in the next days, and try to see if the scaling works or not. Apparently some change in the configuration has been done in the last few days.

Any progress on this?

sunpho84 commented 7 years ago

I think I followed all the instructions, but I get this funny error when executing:

Attempting to use an MPI routine before initializing MPI
QMP IS INITIALIZED

this is my test path

/marconi/home/userexternal/fsanfili/programs/tmLQCD_test_QPHIX/tests/A30.32

kostrzewa commented 7 years ago

@sunpho84 you built QMP in serial mode

kostrzewa commented 7 years ago

/homec/hbn28/hbn288/code/qmp/configure --prefix=/homec/hbn28/hbn288/local/jureca/libs/qmp CC=mpicc CFLAGS=-std=c99 --with-qmp-comms-type=MPI

default comms type is SINGLE

kostrzewa commented 7 years ago

I've updated the instructions accordingly to make this explicitly clear.

sunpho84 commented 7 years ago

The resulting mismatch that I obtain is identical to that of your test, see

/marconi_work/INF17_lqcd123_0/sanfo/test_QPHIX/A30.32/test_Bartek.e151004

In all of this, I guess the most important result is that we fixed the build instructions :smile: But I am happy to do more tests, of course...

kostrzewa commented 7 years ago

okay, thanks. So, @martin-ueding, does Chroma+qphix work on A1/A2?

martin-ueding commented 7 years ago

I ran an HMC on A2 with Chroma and QPhiX, reproducing a result from JURECA. You can use my compilation script for Marconi A2. In line 321 you should list only a single SoA length, though. Otherwise it will build and install all of them, each overwriting the previous one, so Chroma gets built with an SoA length of 16, which is probably not desired. However, the build directory will then contain compiled QPhiX variants for each SoA length, which is great for testing QPhiX.

I have not tried on the Broadwell partition, but since it works on JURECA with Haswell, I guess it should be just fine on Marconi A1. Perhaps use the JURECA script, or adapt the Marconi A2 script such that the architecture is AVX2 instead of AVX512.

kostrzewa commented 7 years ago

Thanks Martin. @sunpho84 would you have time to help figure out what's going on? This means trying to understand the interface and qphix, I guess.

sunpho84 commented 7 years ago

The point is, the residual computed with tmLQCD is 4.584444e-04, so the error is rather subtle; otherwise the difference would be much larger. Or do you think that should not be the case?

kostrzewa commented 7 years ago

Ah, I know what the problem is... we don't have support for twisted boundary conditions in qphix yet...

kostrzewa commented 7 years ago

my fault, I set up the input file on Marconi incorrectly...

sunpho84 commented 7 years ago

There you go, I had in mind precisely something like that as a "subtle" effect.

kostrzewa commented 7 years ago

Yes, it obviously gives a compatible result now.
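
A toy illustration of why a mismatch like this shows up in the residual check: a solution that is exact for one operator has a non-zero residual when it is checked against a slightly different one (here a sign flip stands in for a different boundary condition). The thread does not show how tmLQCD computes its residual, so the function `relative_residual_sq` and the small dense real matrices are assumptions made purely for illustration.

```cpp
#include <cstddef>

// Squared relative residual |b - A x|^2 / |b|^2 for a small dense real
// system. Hypothetical helper, not taken from tmLQCD or QPhiX.
template <std::size_t N>
constexpr double relative_residual_sq(const double (&A)[N][N],
                                      const double (&x)[N],
                                      const double (&b)[N]) {
    double r2 = 0.0;  // accumulates |b - A x|^2
    double b2 = 0.0;  // accumulates |b|^2
    for (std::size_t i = 0; i < N; ++i) {
        double ri = b[i];
        for (std::size_t j = 0; j < N; ++j) ri -= A[i][j] * x[j];
        r2 += ri * ri;
        b2 += b[i] * b[i];
    }
    return r2 / b2;
}
```

With A = diag(2, 4), x = (1, 1) and b = (2, 4) the residual vanishes. Checking the same x against diag(2, -4) gives 64/20 = 3.2, even though the solver itself did nothing wrong for the operator it was given.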

kostrzewa commented 7 years ago

@martin-ueding In your bootstrap for Chroma on Marconi A2, it seems that Chroma instantiates only AVX2 QPhiX kernels.

if ! [[ -f Makefile ]]; then
    $sourcedir/$repo/configure $base_configure \
        --enable-openmp \
        --enable-parallel-arch=parscalar \
        --enable-parallel-io \
        --enable-precision=double \
        --enable-qdp-alignment=128 \
        --enable-sse2 \
        --enable-qphix-solver-arch=avx2 \
        --with-gmp="$prefix" \
        --with-libxml2="$prefix/bin/xml2-config" \
        --with-qdp="$prefix" \
        --with-qphix-solver="$prefix" \
        CFLAGS="$cflags" CXXFLAGS="$cxxflags"
fi

correct?

martin-ueding commented 7 years ago

I have made changes to this script on Marconi A2 and have not uploaded them. The tests were run with avx512 in the options, though. Once git is working on Marconi again, I will merge those and update the code.

martin-ueding commented 7 years ago

I just saw that a function in the interface is defined like this:

template <typename FT, int VECLEN, int SOALEN, bool compress12>
void reorder_clover_to_QPhiX(Geometry<FT, VECLEN, SOALEN, compress12> &geom, FT *qphix_clover)

Why not use typename Geometry<FT, VECLEN, SOALEN, compress12>::CloverBlock or typename Geometry<FT, VECLEN, SOALEN, compress12>::FullCloverBlock as the type for qphix_clover? That would make it more type-safe and would also allow overloads for the two distinct cases. Actually, I do not even know whether that FT *qphix_clover is meant to contain twisted mass or not on the QPhiX side.
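
A minimal sketch of what such a pair of overloads could look like. `MockGeometry` is a stand-in written for this example and is not the real QPhiX `Geometry` class; its two nested block types have placeholder members rather than the real QPhiX layouts, and the mapping of `CloverBlock` to the \mu = 0 case and `FullCloverBlock` to the twisted mass case is an assumption here. The function bodies are left empty on purpose.

```cpp
#include <type_traits>
#include <utility>

// Stand-in for QPhiX's Geometry, reduced to two distinct nested block types.
// The member layouts are placeholders, not the real QPhiX layouts.
template <typename FT, int VECLEN, int SOALEN, bool compress12>
struct MockGeometry {
    struct CloverBlock { FT data[VECLEN]; };          // assumed: Wilson clover
    struct FullCloverBlock { FT data[2 * VECLEN]; };  // assumed: with twisted mass
};

// Overload for the plain clover case. The template parameters are deduced
// from `geom`; the block pointer type then selects the overload.
template <typename FT, int VECLEN, int SOALEN, bool compress12>
void reorder_clover_to_QPhiX(
    MockGeometry<FT, VECLEN, SOALEN, compress12> &geom,
    typename MockGeometry<FT, VECLEN, SOALEN, compress12>::CloverBlock *qphix_clover) {
    (void)geom;
    (void)qphix_clover;  // packing omitted in this sketch
}

// Overload for the full block case.
template <typename FT, int VECLEN, int SOALEN, bool compress12>
void reorder_clover_to_QPhiX(
    MockGeometry<FT, VECLEN, SOALEN, compress12> &geom,
    typename MockGeometry<FT, VECLEN, SOALEN, compress12>::FullCloverBlock *qphix_clover) {
    (void)geom;
    (void)qphix_clover;  // packing omitted in this sketch
}
```

With this shape, passing a raw `FT *` no longer compiles, so the caller has to state which of the two layouts the buffer holds.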

kostrzewa commented 7 years ago

I think the reason for the signature is that it was a first attempt; the function clearly needs to be rewritten. In particular, I think that one would probably loop the other way around (over the qphix indices).

martin-ueding commented 7 years ago

From Peter's notes it seems that this function is doing just Wilson clover (\mu = 0). I can rewrite that function to iterate the QPhiX indices. After that I can probably also do a \mu \neq 0 variant. Or should I rather focus on something else?

Does the current implementation work? Do we have a test for it? I cannot find the function name anywhere else in the interface CPP file, so I guess it has not been written. What kind of test do we want?

Pack the clover term with the packing function and then run …

plabus commented 7 years ago

Yes, it's doing only Wilson. There are no tests, and I think it won't work, because I didn't transpose the colour matrices w.r.t. QPhiX (viewed as memory-contiguous arrays, QPhiX and tmLQCD colour matrices are mutually transposed). Also keep in mind that turning on \mu will change the layout on the QPhiX side, and additionally you will have to add the twisted mass contribution, as it is not stored in the tmLQCD clover data structure (although it is in the inverse clover data struct).
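
To make the transposition point concrete, here is a minimal sketch of transposing a 3×3 complex colour matrix stored as a contiguous array. The struct `ColourMatrix`, its `[row][col][re/im]` index order and the function `transpose_colour` are assumptions for illustration only; neither the tmLQCD nor the QPhiX data structures are reproduced here.

```cpp
// Hypothetical contiguous storage for a 3x3 complex colour matrix.
// Index order [row][col][re/im] is an assumption of this sketch.
struct ColourMatrix {
    double c[3][3][2];
};

// Returns the transpose: element (row, col) of the input becomes
// element (col, row) of the output, real and imaginary parts unchanged.
constexpr ColourMatrix transpose_colour(const ColourMatrix &in) {
    ColourMatrix out{};
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) {
            for (int reim = 0; reim < 2; ++reim) {
                out.c[col][row][reim] = in.c[row][col][reim];
            }
        }
    }
    return out;
}
```

A packing routine that copies the nine complex entries in memory order without such a swap would hand the other side the transposed matrix, which is the failure mode described above.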