JeffersonLab / qphix

QCD for Intel Xeon Phi and Xeon processors
http://jeffersonlab.github.io/qphix/

Multi-CG fails with antiperiodic temporal boundary conditions #111

Open martin-ueding opened 6 years ago

martin-ueding commented 6 years ago

I am currently getting Bartek's work in the tm_functor_merge_two-flav-mshift branch ready to merge into devel. The curious errors with undefined types __m128d have disappeared. The issue now is that the Multi-CG fails, but only with antiperiodic boundary conditions.

In this issue I want to describe my progress. Perhaps one of you has a hunch about what is actually going on.

The Multi-CG test in tests/testMInvCG.cc uses my RandomGauge class and the HybridSpinor:

  RandomGauge<T, V, S, compress, QdpGauge, QdpSpinor> gauge(geom, t_bc);

  HybridSpinor<T, V, S, compress, QdpSpinor> hs_source(geom);
  gaussian(hs_source.qdp());
  hs_source.pack();

For some reason, this works just fine for t_bc = 1. With t_bc = -1, however, the hs_source spinor contains only inf, in both the QDP++ data structure and the QPhiX data structure. The gaussian function acts on the QDP++ spinor, so I should not have broken anything there.

I put in some output and assertions that just check this:

  masterPrintf("hs_source.qdp()[site = ...][color = 0][spin = 0][reim = 0][soa = 0] = %g "
               "%g %g %g\n",
               hs_source.qdp().elem(QDP::rb[0].start() + 0).elem(0).elem(0).real(),
               hs_source.qdp().elem(QDP::rb[0].start() + 1).elem(0).elem(0).real(),
               hs_source.qdp().elem(QDP::rb[0].start() + 2).elem(0).elem(0).real(),
               hs_source.qdp().elem(QDP::rb[0].start() + 3).elem(0).elem(0).real());

  masterPrintf("hs_source[0][site = ...][color = 0][spin = 0][reim = 0][soa = 0] = "
               "%g %g %g %g\n",
               hs_source[0][0][0][0][0][0],
               hs_source[0][1][0][0][0][0],
               hs_source[0][2][0][0][0][0],
               hs_source[0][3][0][0][0][0]);

  assert(std::isfinite(
             hs_source.qdp().elem(QDP::rb[0].start() + 0).elem(0).elem(0).real()) &&
         "Random source (QDP structure) is not finite.");
  assert(std::isfinite(hs_source[0][0][0][0][0][0]) &&
         "Random source (QPhiX structure) is not finite.");

For the periodic boundary conditions I get sensible values:

hs_source.qdp()[site = ...][color = 0][spin = 0][reim = 0][soa = 0] = 0.375244 -0.573609 -1.44159 0.799821
hs_source[0][site = ...][color = 0][spin = 0][reim = 0][soa = 0] = 0.375244 -0.179034 1.50215 1.57254

For the antiperiodic boundary conditions I just get junk:

hs_source.qdp()[site = ...][color = 0][spin = 0][reim = 0][soa = 0] = inf inf inf inf
hs_source[0][site = ...][color = 0][spin = 0][reim = 0][soa = 0] = inf inf inf inf

To me it looks as if the random number generator produces only junk. However, since it works fine in the periodic case, I am rather confused by this. I will have to investigate further.
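
The assertions above only look at the first element of each structure. A scan over a whole buffer would show whether every entry is bad or only some of them. The following is a minimal sketch under assumptions: `first_non_finite` and `assert_all_finite` are hypothetical helpers that are not part of QPhiX or QDP++, and they operate on a plain flat buffer rather than on the actual spinor types.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of QPhiX or QDP++: returns the index of the
// first entry that is inf or NaN, or -1 if every entry is finite.
template <typename FT>
std::ptrdiff_t first_non_finite(FT const *data, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    if (!std::isfinite(data[i])) {
      return static_cast<std::ptrdiff_t>(i);
    }
  }
  return -1;
}

// Convenience overload for a std::vector holding the flattened data.
template <typename FT>
std::ptrdiff_t first_non_finite(std::vector<FT> const &values) {
  return first_non_finite(values.data(), values.size());
}

// Aborts (in builds without NDEBUG) if any entry is inf or NaN.
template <typename FT>
void assert_all_finite(std::vector<FT> const &values) {
  assert(first_non_finite(values) == -1 && "Buffer contains inf or NaN.");
}
```

Calling such a check directly after filling the source and again after packing would narrow down which step introduces the non-finite values.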

martin-ueding commented 6 years ago

Also, all four of the Dslash tests fail. It seems that either the actual Dslash operation returns zero or the QDP version always returns zero.

martin-ueding commented 6 years ago

I have now changed expect_near such that the first element is always the control. It is now easy to see that all the QDP++ versions return zero vectors. How did that happen? The QDP++ installation on my laptop did not change, and the build that I created for Travis CI should not have changed either, since it is always downloaded from my website.

Perhaps it is just some weird effect from ccache, though I would hope that this would not happen. I am rebuilding on Travis CI to check whether that is the issue.

So perhaps it is some build problem regarding linkage?

martin-ueding commented 6 years ago

Most of the tests are working again. The issue was simply that my expect_near function, when called with HybridSpinor instances, would always call HybridSpinor::unpack(), which overwrote the QDP++ part. This is now fixed.
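
To illustrate this kind of bug, here is a minimal sketch under assumptions. `TwoLayoutField` is a hypothetical stand-in for a container that keeps a reference copy and a packed copy of the same data; it is not the QPhiX HybridSpinor, and `reference_near` is not the actual expect_near. The point is only that a comparison taking const references cannot call a mutating unpack() and therefore cannot overwrite the control.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a field stored in two layouts at once.
class TwoLayoutField {
 public:
  explicit TwoLayoutField(std::size_t n) : reference_(n, 0.0), packed_(n, 0.0) {}

  std::vector<double> &reference() { return reference_; }
  std::vector<double> const &reference() const { return reference_; }
  std::vector<double> &packed() { return packed_; }
  std::vector<double> const &packed() const { return packed_; }

  // Copies reference -> packed.
  void pack() {
    assert(reference_.size() == packed_.size());
    packed_ = reference_;
  }

  // Copies packed -> reference and overwrites whatever the reference held.
  void unpack() {
    assert(reference_.size() == packed_.size());
    reference_ = packed_;
  }

 private:
  std::vector<double> reference_;
  std::vector<double> packed_;
};

// Read-only comparison of the reference copies. Both arguments are const,
// so this function cannot call unpack() and clobber the control.
bool reference_near(TwoLayoutField const &control,
                    TwoLayoutField const &candidate,
                    double tolerance) {
  std::vector<double> const &a = control.reference();
  std::vector<double> const &b = candidate.reference();
  if (a.size() != b.size()) {
    return false;
  }
  for (std::size_t i = 0; i < a.size(); ++i) {
    // Written so that NaN in either operand counts as a mismatch.
    if (!(std::abs(a[i] - b[i]) <= tolerance)) {
      return false;
    }
  }
  return true;
}
```

If the control was computed directly in the reference layout, its packed copy is still zero, so an unconditional unpack() before comparing would replace the result with zeros. With this design, the caller decides explicitly when unpack() is needed.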

Now there are issues with the TM clover linear operator, though Dslash and Achimbdpsi work.

martin-ueding commented 6 years ago

I have just switched the order of the two tests. It is still the second one that fails, so the test harness apparently does something that then makes gaussian fill the spinors with garbage.

Now there is an if-else such that only one of them gets run, preferably the one with antiperiodic boundary conditions in time. But we will have to look into this.