etmc / tmLQCD

tmLQCD is a freely available software suite providing a set of tools to be used in lattice QCD simulations. It is mainly an HMC implementation (including PHMC and RHMC) for Wilson, Wilson Clover and Wilson twisted mass fermions, together with inverters for different versions of the Dirac operator. The code is fully parallelised and ships with optimisations for various modern architectures, such as commodity PC clusters and the Blue Gene family.
http://www.itkp.uni-bonn.de/~urbach/software.html
GNU General Public License v3.0

testing NDPoly on BGQ ... help?! #156

Closed by kostrzewa 11 years ago

kostrzewa commented 11 years ago

Today I've begun looking at the HMC using the NDPOLY monomial. In particular, I've attempted to convert an input file for A40.24 to the new format.

The old input file: https://gist.github.com/3776270
The new input file: https://gist.github.com/3776251

During the computation of P and Ptilde the "relative squared accuracy in components" blows up both on Intel (pure MPI) and BGQ (hybrid run). As a result the heatbath for the polynomial generates NaNs.

Now, NDPOLY does seem to work fine because the hmc2 sample file runs well as long as the seed is picked "correctly", but maybe there's something subtle that broke in the last few months?

Could someone please:

  1. check whether I did the conversion correctly (and whether there are some subtle changes from the old to the new format which I didn't take into account)
  2. attempt to run this input file and confirm (or not) that the relative accuracy blows up

Here's the output that I see from the initialisation:

PHMC: chebyshev_polynomial
PHMC: n= 151 inv_n=6.622517e-03 
PHMC: allocation !!!
NDPOLY MD Polynomial: EVmin = 3.000000e-05  EVmax = 1.000000e+00  
NDPOLY MD Polynomial: the degree was set to: 151
NDPOLY MD Polynomial: relative squared accuracy in components:
 UP=1.321599e+200  DN=1.329142e+200 
PHMC: Delta_IR at s=0.000030:    | P s_low P - 1 |/2 = 1.653693e-01 
PHMC: interval of approximation [stilde_min, stilde_max] = [7.410000e-05, 2.470000e+00]
PHMC: degree for P = 150, epsilont = 3.000000e-05, normalisation = 6.362848e-01
PHMC: PTILDE-chebyshev_polynomial
PHMC: n= 2000 inv_n=5.000000e-04 
PHMC: allocation !!!
# NDPOLY Acceptance Polynomial: EVmin = 0.000030  EVmax = 1.000000
# NDPOLY ACceptance Polynomial: desired accuracy is 1.000000e-09 
# NDPOLY Acceptance Polynomial: Sum remaining | d_n | = 1.512285e-02 for degree=302
# NDPOLY Acceptance Polynomial: coef[degree] = 2.138193e-04
# NDPOLY Acceptance Polynomial: Sum remaining | d_n | = 6.848115e-03 for degree=362
# NDPOLY Acceptance Polynomial: coef[degree] = 8.311085e-05
# NDPOLY Acceptance Polynomial: Sum remaining | d_n | = 2.977518e-03 for degree=434
# NDPOLY Acceptance Polynomial: coef[degree] = 3.397398e-05
# NDPOLY Acceptance Polynomial: Sum remaining | d_n | = 1.070818e-03 for degree=520
# NDPOLY Acceptance Polynomial: coef[degree] = 1.256557e-05
# NDPOLY Acceptance Polynomial: Sum remaining | d_n | = 3.194748e-04 for degree=624
# NDPOLY Acceptance Polynomial: coef[degree] = 3.697736e-06
# NDPOLY Acceptance Polynomial: Sum remaining | d_n | = 7.600830e-05 for degree=748
# NDPOLY Acceptance Polynomial: coef[degree] = 8.699138e-07
# NDPOLY Acceptance Polynomial: Sum remaining | d_n | = 1.373834e-05 for degree=897
# NDPOLY Acceptance Polynomial: coef[degree] = -1.562465e-07
# NDPOLY Acceptance Polynomial: Sum remaining | d_n | = 1.784376e-06 for degree=1076
# NDPOLY Acceptance Polynomial: coef[degree] = 2.017121e-08
# NDPOLY Acceptance Polynomial: Sum remaining | d_n | = 1.558796e-07 for degree=1291
# NDPOLY Acceptance Polynomial: coef[degree] = -1.753604e-09
# NDPOLY Acceptance Polynomial: Sum remaining | d_n | = 8.394276e-09 for degree=1549
# NDPOLY Acceptance Polynomial: coef[degree] = -9.511640e-11
# NDPOLY Acceptance Polynomial: Sum remaining | d_n | = 1.683929e-10 for degree=1858
# NDPOLY Acceptance Polynomial: coef[degree] = 2.820614e-12
 sum 1.683929e-10, coef 2.820614e-12
# NDPOLY Acceptance Polynomial: relative squared accuracy in components:
 UP=nan  DN=nan 
# NDPOLY Acceptance Polynomial: Delta_IR at s=0.000030: | Ptilde P s_low P Ptilde - 1 |/2 = 2.255685e-10 
# NDPOLY Acceptance Polynomial degree set to 1858

kostrzewa commented 11 years ago

By the way: why are the results for this "relative squared accuracy" so dependent on the seed? By changing the seed (for the hmc2 sample input file) I can make the accuracies vary from 10e5 to 10e-12. That seems rather random and completely unexpected; I would have thought that the eigenvalue spectrum would be similar for two different "hot" configurations.

urbach commented 11 years ago

Can you start from a configuration from that ensemble? A hot start is really the worst starting point for the polynomial: you'll likely have very small eigenvalues and they'll fluctuate a lot.

The input file looks fine, as far as I can see.
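The point about eigenvalues can be illustrated with a toy calculation. The sketch below is not tmLQCD code and does not reproduce its "relative squared accuracy" check; it only assumes that the polynomial P(s) is meant to approximate 1/sqrt(s) on a fixed interval [EVmin, EVmax]. The interval, the degree, and the names `EV_MIN`, `EV_MAX`, `DEGREE` and `rel_error` are illustrative choices, not values from the input files above. It builds a Chebyshev interpolant with NumPy and compares the error inside the interval with the error at points outside it.

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

# Illustrative values only (assumptions, not the A40.24 parameters).
EV_MIN, EV_MAX = 1.0e-2, 1.0
DEGREE = 80


def inv_sqrt(s):
    """Target function that the polynomial is assumed to approximate."""
    return 1.0 / np.sqrt(s)


# Chebyshev interpolant of 1/sqrt(s), fitted on [EV_MIN, EV_MAX] only.
P = Chebyshev.interpolate(inv_sqrt, DEGREE, domain=[EV_MIN, EV_MAX])


def rel_error(s):
    """|P(s) * sqrt(s) - 1|, which is zero wherever P(s) equals 1/sqrt(s)."""
    s = np.asarray(s, dtype=float)
    return np.abs(P(s) * np.sqrt(s) - 1.0)


# Worst-case error on a logarithmic grid inside the fitting interval.
err_inside = float(rel_error(np.geomspace(EV_MIN, EV_MAX, 1000)).max())

# Errors at single "eigenvalues" that fall outside the fitting interval.
err_below = float(rel_error(1.0e-6))  # far below EV_MIN
err_above = float(rel_error(2.0))     # above EV_MAX

print(f"max error inside interval : {err_inside:.3e}")
print(f"error below the interval  : {err_below:.3e}")
print(f"error above the interval  : {err_above:.3e}")
```

In this toy setting the approximation is good inside the interval, loses essentially all accuracy for a value far below the lower bound, and grows by many orders of magnitude above the upper bound, because a high-degree polynomial is unconstrained outside the range it was fitted on. Whether the spectrum of a hot-start configuration leaves the approximation interval in the run above is not established by this sketch; it only shows why the accuracy can depend so strongly on the spectrum.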

kostrzewa commented 11 years ago

That's what I figured to be the problem; I just wasn't sure I had translated the timescales and integrator parameters correctly. I don't have grid access yet but I will ask someone who does for a configuration. I do have access to a few confs from the ensemble but they have the wrong hashes...

urbach commented 11 years ago

Might be one of the ensembles where all hashes are wrong due to a bug in the hmc version used for that run. You can still use it, as long as you check that the plaquette value is correct. And you need to set

DisableIOChecks = yes

in your input file...

urbach commented 11 years ago

That's solved by now, isn't it?

kostrzewa commented 11 years ago

Yes, absolutely, let's close this.