Closed: kostrzewa closed this issue 11 years ago.
I'm doing another 50 trajectories with pure OpenMP to confirm the behaviour of the m_PCAC measurement. Then I will switch back to pure MPI with reproducerandomnumbers=no to see whether that setting is what makes the pure MPI part so "good" in terms of low variance.
Finally, I will add another 100 or so trajectories using a hybrid version of the code.
I DON'T think this is due to the way that the RNG is reinitialized in source_generation_pion_only. The only way to confirm would be to run serially... that's very impractical for this lattice size, though (although I could let it run for a day or so and get around 30 trajectories).
I don't really know where to look at this point; the obvious culprits are all the functions using omp atomic, although I would expect that to show up in the plaquette too (I guess it does, in a way).
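To illustrate the kind of effect I mean, a purely schematic sketch (not the actual tmLQCD code): with omp atomic the order in which contributions are added depends on thread scheduling, so the result can differ from the serial sum at the level of round-off.

#include <omp.h>

/* schematic only: accumulate contributions with omp atomic; the addition
 * order depends on thread scheduling, so the result may differ from the
 * serial sum at the level of floating-point round-off */
double accumulate(const double *val, int n)
{
  double sum = 0.0;
#pragma omp parallel for
  for (int i = 0; i < n; ++i) {
#pragma omp atomic
    sum += val[i];
  }
  return sum;
}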
OK, so it's clearly not due to reproducerandomnumbers. From trajectory 494 onwards we're back to pure MPI (with reproducerandomnumbers=no) and this is what it looks like. I will try hybrid next.
Hi Bartek!
The first thing we should try to exclude is that something in this plot of m_pcac versus t_hmc goes wrong. That's unlikely, I agree.
Second is to check the part doing the contraction for the online measurements. Anything that could go wrong there in the openMP and/or MPI case?
Is there maybe some precision issue in the pure OpenMP inversions for the measurements? Does this fluctuation show up in the PP and the PA, or in only one of them? How can I reproduce this? Does it also show up on 4^4 lattices?
Using the reproduce rng feature, do you still get identical plaquette values after one trajectory?
I would find it highly surprising if there was a deeper problem that escaped all your high-statistics tests. Moreover, the online measurements have been cross-checked with offline measurements, confirming them. But Petros/Roberto observed somewhat too large fluctuations, as they wrote in their emails, which they attributed to too loose precision in the inversions.
I've tried 4^3x8 serial versus openMP, just 10 trajectories, with online measurements after each trajectory. Here is the analysis
> res$dpaopp
t mass dmass ddmass tauint dtauint
1 1 -0.18901415 0.1410771 0.05463891 0.6676308 0.3852338
2 2 0.11720897 0.0578029 0.02890145 0.4258077 0.3878537
3 3 0.05189274 0.1036935 0.05184675 1.1022011 0.8241631
> res1$dpaopp
t mass dmass ddmass tauint dtauint
1 1 -0.18901415 0.1410771 0.05463891 0.6676308 0.3852338
2 2 0.11720897 0.0578029 0.02890145 0.4258077 0.3878537
3 3 0.05189274 0.1036935 0.05184675 1.1022011 0.8241631
> res$MChist.dpaopp
[1] 0.288916018 0.190761577 -0.108701890 0.052108648 -0.057439508
[6] -0.248466599 -0.001712946 0.381267512 -0.185376527 -0.719809641
> res1$MChist.dpaopp
[1] 0.288916018 0.190761577 -0.108701890 0.052108648 -0.057439508
[6] -0.248466599 -0.001712946 0.381267512 -0.185376527 -0.719809641
res is the version with OMP_NUM_THREADS=1 and res1 the one with OMP_NUM_THREADS=4. They agree perfectly, I'd say. Also the plaquette agrees perfectly. The same if I compile without OMP support. MPI I'll try tomorrow...
With or without SSE and with or without halfspinor the results also agree. All with gcc, though...
Thanks for the cross-check, that's certainly very encouraging; it might turn out to be just a compiler problem. My hybrid run shows the same issue, although it is somewhat milder. All my executables are compiled with icc. I will continue testing today.
Second is to check the part doing the contraction for the online measurements. Anything that could go wrong there in the openMP and/or MPI case?
Could it just be round-off? There is a very large accumulation done in online_measurement.c
When MPI is used, the local volume is much smaller and hence these sums are relatively short. As a consequence, when the MPI sum is done, numbers of similar size are added together and the result will be more stable with respect to round-off.
This would also adequately explain why there is no difference between serial and OpenMP (which I just checked and can confirm, albeit only for 4^4 and a mass that is 10 times heavier than in my run). It would also help explain why the hybrid run is less affected.
When my current OpenMP run is done I will add a Kahan summation to the online measurement and we'll try again. I even have a model for this in my PP correlator test code (that I told you about), which parallelizes the computation of Cxx[t] over t with OpenMP and includes a Kahan summation. But one step at a time.
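For reference, this is the kind of compensated summation I have in mind (a schematic sketch, not the actual online_measurement.c code; contrib and nsites are placeholders for the per-site contributions and the local site count):

/* schematic Kahan (compensated) summation for one timeslice of the correlator */
double kahan_sum(const double *contrib, int nsites)
{
  double sum = 0.0, c = 0.0;        /* c carries the accumulated low-order error */
  for (int i = 0; i < nsites; ++i) {
    double y = contrib[i] - c;      /* corrected next term */
    double t = sum + y;             /* low-order bits of y may be lost here */
    c = (t - sum) - y;              /* recover the lost part */
    sum = t;                        /* the next iteration corrects for it */
  }
  return sum;
}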
Does this fluctuation show up in the PP and the PA, or in only one of them?
I have to admit I haven't checked explicitly on a per-correlator basis. I was only looking at the pcac mass history.
How can I reproduce this?
Run the same thing with pure MPI and pure OpenMP and compare the m_pcac histories; alternatively, do half of the run with one parallelization and half with the other.
Does it also show up on 4^4 lattices?
Not sure, I haven't tried yet; given my argument above it's unlikely, but possible.
If there is a problem you should see it by comparing after only one trajectory, shouldn't you?
If there is a problem you should see it by comparing after only one trajectory, shouldn't you?
Yes, absolutely, see below.
When my current openmp run is done I will add a Kahan summation into the online measurement and we'll try again.
I added a Kahan summation here but I see NO difference at all in onlinemeas.* for a test-run at 8^3x16 between Kahan and no Kahan:
11:54 kostrzew@blade8b ~/code/tmLQCD.kost/build_openmp_wgs (etmcmaster|✚3…) $ diff -y -W 100 onlinemeas.000000 onlinemeas.000000.omp16_kahan
1 1 0 2.493487e+01 0.000000e+00 1 1 0 2.493487e+01 0.000000e+00
1 1 1 3.632671e+00 2.436011e+00 1 1 1 3.632671e+00 2.436011e+00
1 1 2 5.727120e-01 2.451563e-01 1 1 2 5.727120e-01 2.451563e-01
1 1 3 7.388951e-02 4.784140e-02 1 1 3 7.388951e-02 4.784140e-02
1 1 4 9.862494e-03 5.357782e-03 1 1 4 9.862494e-03 5.357782e-03
1 1 5 1.953851e-03 5.278829e-04 1 1 5 1.953851e-03 5.278829e-04
1 1 6 1.696005e-04 9.031012e-05 1 1 6 1.696005e-04 9.031012e-05
1 1 7 3.336515e-05 1.689661e-05 1 1 7 3.336515e-05 1.689661e-05
1 1 8 6.778363e-06 0.000000e+00 1 1 8 6.778363e-06 0.000000e+00
2 1 0 7.781387e+00 0.000000e+00 2 1 0 7.781387e+00 0.000000e+00
2 1 1 1.845803e-01 7.916336e-01 2 1 1 1.845803e-01 7.916336e-01
2 1 2 -1.197738e-01 -8.355770e-03 2 1 2 -1.197738e-01 -8.355770e-03
2 1 3 1.815930e-02 4.052258e-03 2 1 3 1.815930e-02 4.052258e-03
2 1 4 3.664388e-03 -1.248370e-03 2 1 4 3.664388e-03 -1.248370e-03
2 1 5 7.197363e-04 -1.158727e-04 2 1 5 7.197363e-04 -1.158727e-04
2 1 6 6.479225e-05 -2.354105e-05 2 1 6 6.479225e-05 -2.354105e-05
2 1 7 1.337388e-06 -9.214752e-06 2 1 7 1.337388e-06 -9.214752e-06
2 1 8 -1.081962e-06 0.000000e+00 2 1 8 -1.081962e-06 0.000000e+00
6 1 0 6.358501e+00 0.000000e+00 6 1 0 6.358501e+00 0.000000e+00
6 1 1 2.536977e+00 -5.387693e-01 6 1 1 2.536977e+00 -5.387693e-01
6 1 2 4.336681e-01 -4.918158e-02 6 1 2 4.336681e-01 -4.918158e-02
6 1 3 5.356017e-02 -2.574008e-02 6 1 3 5.356017e-02 -2.574008e-02
6 1 4 6.028301e-03 -1.937373e-03 6 1 4 6.028301e-03 -1.937373e-03
6 1 5 8.539012e-04 -1.310331e-04 6 1 5 8.539012e-04 -1.310331e-04
6 1 6 8.683758e-05 -1.604958e-05 6 1 6 8.683758e-05 -1.604958e-05
6 1 7 2.742399e-05 -9.454013e-06 6 1 7 2.742399e-05 -9.454013e-06
6 1 8 3.934656e-07 0.000000e+00 6 1 8 3.934656e-07 0.000000e+00
Even with reproducerandomnumbers=yes, I do see a very significant difference between MPI / no MPI on the level of the correlators though:
11:51 kostrzew@blade8b ~/code/tmLQCD.kost (etmcmaster|✚3…) $ diff -y -W 100 build_3D_MPI_hs_pax/onlinemeas.000000 build_openmp_wgs/onlinemeas.000000.omp16_kahan
1 1 0 3.583142e+01 0.000000e+00 | 1 1 0 2.493487e+01 0.000000e+00
1 1 1 3.296734e+00 2.969451e+00 | 1 1 1 3.632671e+00 2.436011e+00
1 1 2 4.057276e-01 3.517819e-01 | 1 1 2 5.727120e-01 2.451563e-01
1 1 3 4.517120e-02 5.054174e-02 | 1 1 3 7.388951e-02 4.784140e-02
1 1 4 5.965195e-03 6.431571e-03 | 1 1 4 9.862494e-03 5.357782e-03
1 1 5 8.324182e-04 9.961091e-04 | 1 1 5 1.953851e-03 5.278829e-04
1 1 6 1.292595e-04 1.470315e-04 | 1 1 6 1.696005e-04 9.031012e-05
1 1 7 1.502888e-05 2.649542e-05 | 1 1 7 3.336515e-05 1.689661e-05
1 1 8 5.902200e-06 0.000000e+00 | 1 1 8 6.778363e-06 0.000000e+00
2 1 0 1.855315e+00 0.000000e+00 | 2 1 0 7.781387e+00 0.000000e+00
2 1 1 1.041653e+00 -1.095293e+00 | 2 1 1 1.845803e-01 7.916336e-01
2 1 2 1.065586e-01 -1.244458e-01 | 2 1 2 -1.197738e-01 -8.355770e-03
2 1 3 1.143322e-02 -1.645176e-02 | 2 1 3 1.815930e-02 4.052258e-03
2 1 4 3.802521e-04 -2.712032e-03 | 2 1 4 3.664388e-03 -1.248370e-03
2 1 5 1.804312e-04 -2.805560e-04 | 2 1 5 7.197363e-04 -1.158727e-04
2 1 6 3.510223e-05 -6.367013e-05 | 2 1 6 6.479225e-05 -2.354105e-05
2 1 7 5.462901e-06 -3.206055e-06 | 2 1 7 1.337388e-06 -9.214752e-06
2 1 8 1.426100e-06 0.000000e+00 | 2 1 8 -1.081962e-06 0.000000e+00
6 1 0 -2.878380e+00 0.000000e+00 | 6 1 0 6.358501e+00 0.000000e+00
6 1 1 2.249188e+00 -1.297361e+00 | 6 1 1 2.536977e+00 -5.387693e-01
6 1 2 2.453926e-01 -1.254185e-01 | 6 1 2 4.336681e-01 -4.918158e-02
6 1 3 1.056075e-02 -1.683040e-02 | 6 1 3 5.356017e-02 -2.574008e-02
6 1 4 3.291887e-03 -2.817410e-03 | 6 1 4 6.028301e-03 -1.937373e-03
6 1 5 4.616233e-04 -5.540282e-04 | 6 1 5 8.539012e-04 -1.310331e-04
6 1 6 7.452629e-05 -8.447372e-05 | 6 1 6 8.683758e-05 -1.604958e-05
6 1 7 8.840458e-06 -1.403312e-05 | 6 1 7 2.742399e-05 -9.454013e-06
6 1 8 -2.264645e-07 0.000000e+00 | 6 1 8 3.934656e-07 0.000000e+00
On the level of the plaquette these runs seem to be completely compatible:
12:51 kostrzew@blade8f ~/code/tmLQCD.kost (etmcmaster|✚3…) $ diff -y -W 118 build_openmp_wgs/omp16_kahan.data build_3D_MPI_hs_pax/mpi_8_3D.data
00000000 0.297755975545 -5.772757032268 3.214227e+02 2 | 00000000 0.297755975545 -5.772757032333 3.214227e+02 2
00000001 0.383681731184 -3.657660451776 3.877053e+01 2 | 00000001 0.383681731184 -3.657660451849 3.877053e+01 2
00000002 0.429726941943 -1.414715887855 4.115317e+00 2 | 00000002 0.429726941943 -1.414715887789 4.115317e+00 2
00000003 0.456816154200 -1.238757090774 3.451321e+00 2 | 00000003 0.456816154200 -1.238757090730 3.451321e+00 2
00000004 0.472229062306 -0.686094532699 1.985944e+00 2 | 00000004 0.472229062306 -0.686094532699 1.985944e+00 2
00000005 0.487670077287 -0.604563556219 1.830453e+00 2 | 00000005 0.487670077287 -0.604563556211 1.830453e+00 2
00000006 0.493795474300 0.128804694687 8.791457e-01 27 | 00000006 0.493795474300 0.128804694752 8.791457e-01 27
00000007 0.493795474300 0.136298800651 8.725819e-01 27 | 00000007 0.493795474300 0.136298800579 8.725819e-01 27
00000008 0.498644966981 -0.231961259276 1.261071e+00 2 | 00000008 0.498644966981 -0.231961259371 1.261071e+00 2
00000009 0.498644966981 0.234397840482 7.910470e-01 27 | 00000009 0.498644966981 0.234397840381 7.910470e-01 27
00000010 0.503439824602 0.233070208727 7.920980e-01 27 | 00000010 0.503439824602 0.233070208807 7.920980e-01 27
00000011 0.505482558884 0.069847415791 9.325361e-01 27 | 00000011 0.505482558884 0.069847415849 9.325361e-01 27
00000012 0.508547235495 0.112196140770 8.938689e-01 27 | 00000012 0.508547235495 0.112196140682 8.938689e-01 27
00000013 0.510466965313 -0.388601017701 1.474916e+00 2 | 00000013 0.510466965313 -0.388601017796 1.474916e+00 2
00000014 0.511048831401 0.168259862352 8.451342e-01 27 | 00000014 0.511048831401 0.168259862337 8.451342e-01 27
00000015 0.511049193730 0.180563802802 8.347994e-01 27 | 00000015 0.511049193730 0.180563802882 8.347994e-01 27
00000016 0.511152568268 0.078752239344 9.242689e-01 27 | 00000016 0.511152568267 0.078752239511 9.242689e-01 27
00000017 0.511152568268 0.323074745895 7.239197e-01 27 | 00000017 0.511152568267 0.323074745771 7.239197e-01 27
00000018 0.513700434563 -0.177689988130 1.194455e+00 2 | 00000018 0.513700434561 -0.177689987970 1.194455e+00 2
00000019 0.513642212879 -0.379668667214 1.461800e+00 2 | 00000019 0.513642212881 -0.379668667163 1.461800e+00 2
And just for good measure, another comparison after one trajectory of the MPI and non-MPI correlators, but with 16 MPI processes this time:
12:52 kostrzew@blade8f ~/code/tmLQCD.kost (etmcmaster|✚3…) $ diff -y -W 118 build_openmp_wgs/onlinemeas.000000 build_3D_MPI_hs_wgs/onlinemeas.000000
1 1 0 2.493487e+01 0.000000e+00 | 1 1 0 3.284327e+01 0.000000e+00
1 1 1 3.632671e+00 2.436011e+00 | 1 1 1 4.224729e+00 3.589291e+00
1 1 2 5.727120e-01 2.451563e-01 | 1 1 2 5.302579e-01 5.559800e-01
1 1 3 7.388951e-02 4.784140e-02 | 1 1 3 7.702127e-02 6.436265e-02
1 1 4 9.862494e-03 5.357782e-03 | 1 1 4 1.164042e-02 8.727955e-03
1 1 5 1.953851e-03 5.278829e-04 | 1 1 5 1.282810e-03 1.408820e-03
1 1 6 1.696005e-04 9.031012e-05 | 1 1 6 1.867620e-04 1.945576e-04
1 1 7 3.336515e-05 1.689661e-05 | 1 1 7 2.810947e-05 2.461970e-05
1 1 8 6.778363e-06 0.000000e+00 | 1 1 8 8.242407e-06 0.000000e+00
2 1 0 7.781387e+00 0.000000e+00 | 2 1 0 2.599950e+00 0.000000e+00
2 1 1 1.845803e-01 7.916336e-01 | 2 1 1 6.436937e-01 -9.233037e-01
2 1 2 -1.197738e-01 -8.355770e-03 | 2 1 2 5.892140e-02 -1.203017e-01
2 1 3 1.815930e-02 4.052258e-03 | 2 1 3 1.946809e-02 -1.400210e-02
2 1 4 3.664388e-03 -1.248370e-03 | 2 1 4 1.817279e-03 -3.063235e-03
2 1 5 7.197363e-04 -1.158727e-04 | 2 1 5 4.982813e-04 -3.622151e-04
2 1 6 6.479225e-05 -2.354105e-05 | 2 1 6 3.976636e-05 -5.470423e-05
2 1 7 1.337388e-06 -9.214752e-06 | 2 1 7 3.289593e-06 -5.885289e-06
2 1 8 -1.081962e-06 0.000000e+00 | 2 1 8 5.087794e-08 0.000000e+00
6 1 0 6.358501e+00 0.000000e+00 | 6 1 0 -9.992991e-01 0.000000e+00
6 1 1 2.536977e+00 -5.387693e-01 | 6 1 1 2.641958e+00 -2.032417e+00
6 1 2 4.336681e-01 -4.918158e-02 | 6 1 2 3.272099e-01 -3.789808e-01
6 1 3 5.356017e-02 -2.574008e-02 | 6 1 3 4.743740e-02 -4.163792e-02
6 1 4 6.028301e-03 -1.937373e-03 | 6 1 4 8.175692e-03 -5.811491e-03
6 1 5 8.539012e-04 -1.310331e-04 | 6 1 5 7.177018e-04 -7.013163e-04
6 1 6 8.683758e-05 -1.604958e-05 | 6 1 6 1.079254e-04 -1.139296e-04
6 1 7 2.742399e-05 -9.454013e-06 | 6 1 7 2.037146e-05 -1.326496e-05
6 1 8 3.934656e-07 0.000000e+00 | 6 1 8 -3.225093e-07 0.000000e+00
And a comparison between two different MPI parallelizations, which also don't match:
12:54 kostrzew@blade8f ~/code/tmLQCD.kost (etmcmaster|✚3…) $ diff -y -W 118 build_3D_MPI_hs_pax/onlinemeas.000000.3D_MPI_hs_pax build_3D_MPI_hs_wgs/onlinemeas.000000
1 1 0 3.583142e+01 0.000000e+00 | 1 1 0 3.284327e+01 0.000000e+00
1 1 1 3.296734e+00 2.969451e+00 | 1 1 1 4.224729e+00 3.589291e+00
1 1 2 4.057276e-01 3.517819e-01 | 1 1 2 5.302579e-01 5.559800e-01
1 1 3 4.517120e-02 5.054174e-02 | 1 1 3 7.702127e-02 6.436265e-02
1 1 4 5.965195e-03 6.431571e-03 | 1 1 4 1.164042e-02 8.727955e-03
1 1 5 8.324182e-04 9.961091e-04 | 1 1 5 1.282810e-03 1.408820e-03
1 1 6 1.292595e-04 1.470315e-04 | 1 1 6 1.867620e-04 1.945576e-04
1 1 7 1.502888e-05 2.649542e-05 | 1 1 7 2.810947e-05 2.461970e-05
1 1 8 5.902200e-06 0.000000e+00 | 1 1 8 8.242407e-06 0.000000e+00
2 1 0 1.855315e+00 0.000000e+00 | 2 1 0 2.599950e+00 0.000000e+00
2 1 1 1.041653e+00 -1.095293e+00 | 2 1 1 6.436937e-01 -9.233037e-01
2 1 2 1.065586e-01 -1.244458e-01 | 2 1 2 5.892140e-02 -1.203017e-01
2 1 3 1.143322e-02 -1.645176e-02 | 2 1 3 1.946809e-02 -1.400210e-02
2 1 4 3.802521e-04 -2.712032e-03 | 2 1 4 1.817279e-03 -3.063235e-03
2 1 5 1.804312e-04 -2.805560e-04 | 2 1 5 4.982813e-04 -3.622151e-04
2 1 6 3.510223e-05 -6.367013e-05 | 2 1 6 3.976636e-05 -5.470423e-05
2 1 7 5.462901e-06 -3.206055e-06 | 2 1 7 3.289593e-06 -5.885289e-06
2 1 8 1.426100e-06 0.000000e+00 | 2 1 8 5.087794e-08 0.000000e+00
6 1 0 -2.878380e+00 0.000000e+00 | 6 1 0 -9.992991e-01 0.000000e+00
6 1 1 2.249188e+00 -1.297361e+00 | 6 1 1 2.641958e+00 -2.032417e+00
6 1 2 2.453926e-01 -1.254185e-01 | 6 1 2 3.272099e-01 -3.789808e-01
6 1 3 1.056075e-02 -1.683040e-02 | 6 1 3 4.743740e-02 -4.163792e-02
6 1 4 3.291887e-03 -2.817410e-03 | 6 1 4 8.175692e-03 -5.811491e-03
6 1 5 4.616233e-04 -5.540282e-04 | 6 1 5 7.177018e-04 -7.013163e-04
6 1 6 7.452629e-05 -8.447372e-05 | 6 1 6 1.079254e-04 -1.139296e-04
6 1 7 8.840458e-06 -1.403312e-05 | 6 1 7 2.037146e-05 -1.326496e-05
6 1 8 -2.264645e-07 0.000000e+00 | 6 1 8 -3.225093e-07 0.000000e+00
I think, therefore (if someone could please cross-check), that we can conclude that the problem originates from the MPI sum and gather operations in the correlator computation.
I will attempt to reproduce this with icc and OpenMPI.
The differences seem too big to be a rounding issue to me. Since this summation is done at the very end of the calculation, there's nothing to enhance the differences.
I tried the same just now with gcc on my laptop and I get a clear difference between 4 and 2 processes... I even used tau=0.000001 and reproducerandomnumbers=yes.
13:14 bartek@artemis ~/code/tmLQCD.kost/build_mpi (etmcmaster|✚1…) $ diff -y -W 100 onlinemeas.000000.mpi_4_1D onlinemeas.000000.mpi_2_1D
1 1 0 2.395947e+01 0.000000e+00 | 1 1 0 2.525002e+01 0.000000e+00
1 1 1 3.709687e+00 2.424652e+00 | 1 1 1 2.803885e+00 2.309774e+00
1 1 2 4.172649e-01 3.800205e-01 | 1 1 2 2.411146e-01 2.188681e-01
1 1 3 5.927897e-02 3.897985e-02 | 1 1 3 2.163117e-02 2.591199e-02
1 1 4 5.108701e-03 4.516274e-03 | 1 1 4 2.383348e-03 3.268691e-03
1 1 5 1.403971e-03 8.718435e-04 | 1 1 5 4.598607e-04 5.547285e-04
1 1 6 2.138396e-04 8.323216e-05 | 1 1 6 5.640558e-05 5.988951e-05
1 1 7 2.243546e-05 9.381115e-06 | 1 1 7 7.864775e-06 8.542043e-06
1 1 8 3.694871e-06 0.000000e+00 | 1 1 8 2.528695e-06 0.000000e+00
2 1 0 -2.712960e+00 0.000000e+00 | 2 1 0 -6.881594e+00 0.000000e+00
2 1 1 9.025073e-01 -9.609200e-02 | 2 1 1 -9.285053e-01 -5.379556e-01
2 1 2 1.301342e-03 2.410228e-02 | 2 1 2 1.556907e-02 -7.776891e-02
2 1 3 2.316897e-03 -7.541878e-03 | 2 1 3 7.572548e-03 -7.184931e-03
2 1 4 8.655893e-04 -3.117261e-04 | 2 1 4 -5.365922e-04 -2.508335e-05
2 1 5 2.192285e-04 -1.870955e-04 | 2 1 5 -1.207693e-04 -1.630122e-05
2 1 6 5.546197e-05 -1.978795e-05 | 2 1 6 5.634186e-06 -8.677705e-07
2 1 7 3.643263e-06 -2.283807e-06 | 2 1 7 2.856764e-06 -1.348469e-06
2 1 8 1.599540e-07 0.000000e+00 | 2 1 8 -6.402346e-07 0.000000e+00
6 1 0 -5.872514e+00 0.000000e+00 | 6 1 0 1.228379e+00 0.000000e+00
6 1 1 1.127542e+00 -3.607082e-01 | 6 1 1 2.861621e-01 -1.159990e+00
6 1 2 1.450568e-01 -2.212542e-01 | 6 1 2 1.274718e-01 -1.419306e-01
6 1 3 2.650210e-02 -1.763487e-02 | 6 1 3 1.316996e-02 -3.497891e-03
6 1 4 4.032474e-03 -3.384757e-03 | 6 1 4 2.194532e-04 -1.582169e-03
6 1 5 1.016316e-03 -4.139272e-04 | 6 1 5 3.260346e-04 -1.254990e-04
6 1 6 1.000620e-04 -2.953921e-05 | 6 1 6 3.996471e-05 -2.192821e-05
6 1 7 4.957542e-06 -4.314978e-06 | 6 1 7 3.662338e-06 -4.128499e-06
6 1 8 7.337025e-07 0.000000e+00 | 6 1 8 -2.604950e-07 0.000000e+00
13:15 bartek@artemis ~/code/tmLQCD.kost/build_mpi (etmcmaster|✚1…) $ diff -y -W 118 mpi_4_1D.data mpi_2_1D.data
00000000 0.121247194262 0.000000000007 1.000000e+00 31 | 00000000 0.121247194262 0.000000000189 1.000000e+00 31
Is it worrying that dH is so vastly different (at this level of precision) even though I'm using reproducerandomnumbers=yes and tau=0.000001?
For these two runs the checksums for the gauge configuration at the end of the trajectory are even exactly the same.
@urbach Could this be the culprit? mpi_time_rank = 0 for all processes?
# Process 1 of 4 on artemis: cart_id 1, coordinates (1 0 0 0)
# Process 2 of 4 on artemis: cart_id 2, coordinates (2 0 0 0)
# Process 0 of 4 on artemis: cart_id 0, coordinates (0 0 0 0)
# Process 3 of 4 on artemis: cart_id 3, coordinates (3 0 0 0)
# My mpi_time_rank = 0, g_proc_coords = (2,0,0,0), g_cart_id = 2
# My mpi_time_rank = 0, g_proc_coords = (1,0,0,0), g_cart_id = 1
# My mpi_time_rank = 0, g_proc_coords = (0,0,0,0), g_cart_id = 0
# My mpi_time_rank = 0, g_proc_coords = (3,0,0,0), g_cart_id = 3
# My mpi_z_rank = 0, g_proc_coords = (0,0,0,0), g_cart_id = 0
# My mpi_z_rank = 2, g_proc_coords = (2,0,0,0), g_cart_id = 2
# My mpi_z_rank = 1, g_proc_coords = (1,0,0,0), g_cart_id = 1
# My mpi_z_rank = 3, g_proc_coords = (3,0,0,0), g_cart_id = 3
# My mpi_SV_rank = 1, g_proc_coords = (1,0,0,0), g_cart_id = 1
# My mpi_SV_rank = 0, g_proc_coords = (0,0,0,0), g_cart_id = 0
# My mpi_SV_rank = 2, g_proc_coords = (2,0,0,0), g_cart_id = 2
# My mpi_SV_rank = 3, g_proc_coords = (3,0,0,0), g_cart_id = 3
# My mpi_ST_rank = 0, g_proc_coords = (2,0,0,0), g_cart_id = 2
Well, I can reproduce the results with a parallelization just in the X and Y directions. Same plaquette value, completely different correlators for runs with 1, 2 and 4 MPI processes. But in that case, it couldn't be mpi_time_rank, could it?
Well, I can reproduce the results with a parallelization just in the X and Y directions. Same plaquette value, completely different correlators for runs with 1, 2 and 4 MPI processes. But in that case, it couldn't be mpi_time_rank, could it?
Can the computation, as it is currently written, even work with XY parallelization?
Can the computation, as it is currently written, even work with XY parallelization?
I may be missing the point, but wouldn't the reduction over g_mpi_time_slices take care of this? In this case, g_mpi_time_slices should just be the whole Cartesian communicator.
Hmm. If I just parallelize in the T direction, however, it seems I get consistent results.
Hmm. If I just parallelize in the T direction, however, it seems I get consistent results.
Interesting, I don't... can you run at debuglevel=5 and see the initial messages?
Just checking to make sure I didn't confuse myself... :)
Alright, this was already checked I guess, but I get agreement for a scalar build and an MPI build run with a single process.
diff -y -W 100 onlinemeas.000006.p1 onlinemeas.000006.s1
1 1 0 2.557430e+01 0.000000e+00 1 1 0 2.557430e+01 0.000000e+00
1 1 1 1.829296e+00 3.758436e+00 1 1 1 1.829296e+00 3.758436e+00
1 1 2 4.322338e-01 0.000000e+00 1 1 2 4.322338e-01 0.000000e+00
2 1 0 -1.204301e+01 0.000000e+00 2 1 0 -1.204301e+01 0.000000e+00
2 1 1 -6.435530e-01 -5.468982e-01 2 1 1 -6.435530e-01 -5.468982e-01
2 1 2 -3.410654e-02 0.000000e+00 2 1 2 -3.410654e-02 0.000000e+00
6 1 0 -1.249877e+01 0.000000e+00 6 1 0 -1.249877e+01 0.000000e+00
6 1 1 6.198682e-01 -1.978413e+00 6 1 1 6.198682e-01 -1.978413e+00
6 1 2 -1.097072e-01 0.000000e+00 6 1 2 -1.097072e-01 0.000000e+00
The same for a parallelization in the T direction.
diff -y -W 100 onlinemeas.000006.p1 onlinemeas.000006.p2t
1 1 0 2.557430e+01 0.000000e+00 1 1 0 2.557430e+01 0.000000e+00
1 1 1 1.829296e+00 3.758436e+00 1 1 1 1.829296e+00 3.758436e+00
1 1 2 4.322338e-01 0.000000e+00 1 1 2 4.322338e-01 0.000000e+00
2 1 0 -1.204301e+01 0.000000e+00 2 1 0 -1.204301e+01 0.000000e+00
2 1 1 -6.435530e-01 -5.468982e-01 2 1 1 -6.435530e-01 -5.468982e-01
2 1 2 -3.410654e-02 0.000000e+00 2 1 2 -3.410654e-02 0.000000e+00
6 1 0 -1.249877e+01 0.000000e+00 6 1 0 -1.249877e+01 0.000000e+00
6 1 1 6.198682e-01 -1.978413e+00 6 1 1 6.198682e-01 -1.978413e+00
6 1 2 -1.097072e-01 0.000000e+00 6 1 2 -1.097072e-01 0.000000e+00
But not for parallelization in the X direction!
diff -y -W 100 onlinemeas.000006.p2t onlinemeas.000006.p2x
1 1 0 2.557430e+01 0.000000e+00 | 1 1 0 2.026491e+01 0.000000e+00
1 1 1 1.829296e+00 3.758436e+00 | 1 1 1 3.489072e+00 3.597169e+00
1 1 2 4.322338e-01 0.000000e+00 | 1 1 2 8.587605e-01 0.000000e+00
2 1 0 -1.204301e+01 0.000000e+00 | 2 1 0 -1.032211e+01 0.000000e+00
2 1 1 -6.435530e-01 -5.468982e-01 | 2 1 1 -2.461695e+00 -1.852010e-01
2 1 2 -3.410654e-02 0.000000e+00 | 2 1 2 -2.805854e-01 0.000000e+00
6 1 0 -1.249877e+01 0.000000e+00 | 6 1 0 -1.925853e-01 0.000000e+00
6 1 1 6.198682e-01 -1.978413e+00 | 6 1 1 8.507125e-01 -1.482761e+00
6 1 2 -1.097072e-01 0.000000e+00 | 6 1 2 9.406330e-02 0.000000e+00
Perhaps the communicators aren't set up properly? At any rate, it's weird that time parallelization seems to cause me no trouble, but it trips you up... Did you compile your code with 4D parallelization?
Perhaps the communicators aren't set up properly? At any rate, it's weird that time parallelization seems to cause me no trouble, but it trips you up... Did you compile your code with 4D parallelization?
no, with 1D parallelization in T (the default)
Could you try slightly larger volumes? Say 8^3x16; since it's only one trajectory and tau can be very small, it is fast.
My guess is that there is an issue in the construction of the timeslice communicators. g_mpi_time_slices and g_mpi_SV_slices are used in two places: online_measurement.c and polyakov_loop.c. So nothing should be affected in the HMC, just these two measurements. And we just got a bug report on the Polyakov loop, too (#251).
Could you try slightly larger volumes?
Sure.
Interesting -- the larger volume matters. Now I also see issues with parallelization in the T direction.
diff -y -W 100 onlinemeas.000000.p1 onlinemeas.000000.p2
1 1 0 3.198275e+01 0.000000e+00 | 1 1 0 3.115061e+01 0.000000e+00
1 1 1 4.723553e+00 2.504199e+00 | 1 1 1 1.450164e+00 3.815449e+00
1 1 2 5.636580e-01 4.603009e-01 | 1 1 2 3.065834e-01 5.968957e-01
1 1 3 6.607852e-02 6.614074e-02 | 1 1 3 5.894926e-02 6.846152e-02
1 1 4 1.067107e-02 4.709285e-03 | 1 1 4 5.882510e-03 7.031022e-03
1 1 5 1.378402e-03 6.325933e-04 | 1 1 5 1.007201e-03 1.206607e-03
1 1 6 1.849708e-04 1.058444e-04 | 1 1 6 1.473279e-04 1.805708e-04
1 1 7 2.447191e-05 1.709129e-05 | 1 1 7 3.426484e-05 2.774201e-05
1 1 8 4.931717e-06 0.000000e+00 | 1 1 8 1.056044e-05 0.000000e+00
[snip]
Perhaps this is redundant, but I still get identical results for the scalar build and the MPI one run with a single process.
hmm, interesting...
One last test, since you already have those numbers: can you try an 8^4 volume (rather than T=16)?
I will do this, but I'm attending a seminar now. Will get back to you in an hour.
Oh okay, don't worry then, I just ran the test and also don't get agreement. (I was thinking that perhaps something was going wrong in the calculation of some side length.)
Perhaps this is redundant, but I still get identical results for the scalar build and the MPI one run with a single process.
I can confirm having tested this too.
From what I understand of the code, I cannot really see why all processes should get the same mpi_time_rank. In MPI_Comm_split they all have different "colors" (g_proc_coords[0]).
However, as a consequence of being assigned the same mpi_time_rank, the SV_slices will be wrongly attributed. Correct?
Hmm, okay, so going a bit further: with TX parallelization the logic seems to work correctly, but something in the reduction is a bit strange. See res and mp_res below:
t:0 res: 1.219950 mp_res: 0.000000 coords: 0 1 0 0
t:1 res: 0.228919 mp_res: 0.000000 coords: 0 1 0 0
t:2 res: 0.553401 mp_res: 0.000000 coords: 0 1 0 0
t:3 res: 5.602785 mp_res: 0.000000 coords: 0 1 0 0
t:0 res: 50.466172 mp_res: 0.000000 coords: 1 1 0 0
t:1 res: 531.968764 mp_res: 0.000000 coords: 1 1 0 0
t:2 res: 74.702616 mp_res: 0.000000 coords: 1 1 0 0
t:3 res: 13.107516 mp_res: 0.000000 coords: 1 1 0 0
t:0 res: 45.302721 mp_res: 95.768892 coords: 1 0 0 0
t:1 res: 469.846250 mp_res: 1001.815014 coords: 1 0 0 0
t:2 res: 54.647303 mp_res: 129.349919 coords: 1 0 0 0
t:3 res: 4.973098 mp_res: 18.080614 coords: 1 0 0 0
t:0 res: 0.732196 mp_res: 1.952146 coords: 0 0 0 0
t:1 res: 0.180954 mp_res: 0.409872 coords: 0 0 0 0
t:2 res: 0.899016 mp_res: 1.452417 coords: 0 0 0 0
t:3 res: 7.200939 mp_res: 12.803724 coords: 0 0 0 0
It seems like the processes at coordinates 1 1 and 0 1 end up with vanishing Cpp[t]? The processes at 1 0 and 0 0 seem to end up with the correct value after the reduction. I also checked the 1-dimensional parallelization and the local value matches the MPI-reduced value, as expected.
Now after the gather operation (-1 means "outside of the local timeslice"):
t:0 sCpp[t]: 2.985231 Cpp[t]: 0.000000 coords: 1 0 0 0
t:1 sCpp[t]: 31.227775 Cpp[t]: 0.000000 coords: 1 0 0 0
t:2 sCpp[t]: 4.031992 Cpp[t]: 0.000000 coords: 1 0 0 0
t:3 sCpp[t]: 0.563594 Cpp[t]: 0.000000 coords: 1 0 0 0
t:4 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 1 0 0 0
t:5 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 1 0 0 0
t:6 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 1 0 0 0
t:7 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 1 0 0 0
t:0 sCpp[t]: 0.060851 Cpp[t]: 0.060851 coords: 0 0 0 0
t:1 sCpp[t]: 0.012776 Cpp[t]: 0.012776 coords: 0 0 0 0
t:2 sCpp[t]: 0.045274 Cpp[t]: 0.045274 coords: 0 0 0 0
t:3 sCpp[t]: 0.399107 Cpp[t]: 0.399107 coords: 0 0 0 0
t:4 sCpp[t]: -1.000000 Cpp[t]: 2.985231 coords: 0 0 0 0
t:5 sCpp[t]: -1.000000 Cpp[t]: 31.227775 coords: 0 0 0 0
t:6 sCpp[t]: -1.000000 Cpp[t]: 4.031992 coords: 0 0 0 0
t:7 sCpp[t]: -1.000000 Cpp[t]: 0.563594 coords: 0 0 0 0
It certainly seems like g_cart_id=0 has all the correct information gathered together... Perhaps the problem is not with the communicators after all?
For 1D parallelization everything seems to work fine too:
t:0 res: 1.119806 mp_res: 1.119806 coords: 0 0 0 0
t:1 res: 0.456280 mp_res: 0.456280 coords: 0 0 0 0
t:0 res: 2.067068 mp_res: 2.067068 coords: 1 0 0 0
t:1 res: 20.173998 mp_res: 20.173998 coords: 1 0 0 0
t:0 res: 101.452179 mp_res: 101.452179 coords: 2 0 0 0
t:1 res: 918.865188 mp_res: 918.865188 coords: 2 0 0 0
t:0 res: 116.694084 mp_res: 116.694084 coords: 3 0 0 0
t:1 res: 12.207416 mp_res: 12.207416 coords: 3 0 0 0
t:0 sCpp[t]: 0.034906 Cpp[t]: 0.034906 coords: 0 0 0 0
t:1 sCpp[t]: 0.014223 Cpp[t]: 0.014223 coords: 0 0 0 0
t:2 sCpp[t]: -1.000000 Cpp[t]: 0.064433 coords: 0 0 0 0
t:3 sCpp[t]: -1.000000 Cpp[t]: 0.628848 coords: 0 0 0 0
t:4 sCpp[t]: -1.000000 Cpp[t]: 3.162386 coords: 0 0 0 0
t:5 sCpp[t]: -1.000000 Cpp[t]: 28.642130 coords: 0 0 0 0
t:6 sCpp[t]: -1.000000 Cpp[t]: 3.637495 coords: 0 0 0 0
t:7 sCpp[t]: -1.000000 Cpp[t]: 0.380520 coords: 0 0 0 0
t:0 sCpp[t]: 0.064433 Cpp[t]: 0.000000 coords: 1 0 0 0
t:1 sCpp[t]: 0.628848 Cpp[t]: 0.000000 coords: 1 0 0 0
t:2 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 1 0 0 0
t:3 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 1 0 0 0
t:4 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 1 0 0 0
t:5 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 1 0 0 0
t:6 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 1 0 0 0
t:7 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 1 0 0 0
t:0 sCpp[t]: 3.162386 Cpp[t]: 0.000000 coords: 2 0 0 0
t:1 sCpp[t]: 28.642130 Cpp[t]: 0.000000 coords: 2 0 0 0
t:2 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 2 0 0 0
t:3 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 2 0 0 0
t:4 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 2 0 0 0
t:5 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 2 0 0 0
t:6 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 2 0 0 0
t:7 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 2 0 0 0
t:0 sCpp[t]: 3.637495 Cpp[t]: 0.000000 coords: 3 0 0 0
t:1 sCpp[t]: 0.380520 Cpp[t]: 0.000000 coords: 3 0 0 0
t:2 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 3 0 0 0
t:3 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 3 0 0 0
t:4 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 3 0 0 0
t:5 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 3 0 0 0
t:6 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 3 0 0 0
t:7 sCpp[t]: -1.000000 Cpp[t]: 0.000000 coords: 3 0 0 0
From what I understand of the code, I cannot really see why all processes should get the same mpi_time_rank. In MPI_Comm_split they all have different "colors" (g_proc_coords[0]).
I also believe I now understand why this is the case and why this should be so.
Are we sure that the node with g_cart_id = 0 will always end up having rank 0 in one of the sub-communicators?
Just found the answer: since key is set to g_cart_id in the splitting, g_cart_id = 0 will always be the 0th process of some subgroup.
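For reference, a schematic of the splitting as I read it (color = g_proc_coords[0] and key = g_cart_id, as discussed above; everything else here is illustrative rather than the actual tmLQCD code):

#include <mpi.h>

/* schematic: split the Cartesian communicator on the time coordinate; with
 * key = cart_id, the process with g_cart_id = 0 is always rank 0 of its subgroup */
void make_time_slice_comm(MPI_Comm cart_comm, const int proc_coords[4], int cart_id,
                          MPI_Comm *time_slices, int *time_rank)
{
  int color = proc_coords[0];   /* group processes sharing the same time coordinate */
  int key   = cart_id;          /* ordering within each subgroup */
  MPI_Comm_split(cart_comm, color, key, time_slices);
  MPI_Comm_rank(*time_slices, time_rank);
}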
It seems there's no subtle issue with the rank assignments. I replaced the MPI_Reduce and MPI_Gather calls by MPI_Allreduce and MPI_Allgather. That should take care of any rank-assignment weirdness, but it doesn't matter: it doesn't change the (parallelization-dependent) value of the correlator whatsoever.
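For concreteness, roughly what that replacement looks like (a sketch only: the buffer names, the local extent T, and which communicator carries which operation reflect my reading of the code and the debug output, so treat them as assumptions):

#include <stdlib.h>
#include <mpi.h>

/* schematic: sum the local per-timeslice contributions over the spatial
 * subgroup, then collect the T local values from every time rank; Cpp must
 * hold T values per time rank */
void reduce_and_gather(double *sCpp, double *Cpp, int T,
                       MPI_Comm time_slices, MPI_Comm SV_slices)
{
  double *res = malloc(T * sizeof(double));

  /* spatial sum inside each timeslice subgroup (was MPI_Reduce) */
  MPI_Allreduce(sCpp, res, T, MPI_DOUBLE, MPI_SUM, time_slices);

  /* collect every time rank's T values into the full correlator (was MPI_Gather) */
  MPI_Allgather(res, T, MPI_DOUBLE, Cpp, T, MPI_DOUBLE, SV_slices);

  free(res);
}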
Actually, the correlator must be parallelization dependent... I just remembered that the RNG is reset in source_generation... It shouldn't have an effect on the variance though, unless in the case of MPI the Z4 wall-source is of better quality because we're dealing with many different RNGs?
Actually, the correlator must be parallelization dependent... I just remembered that the RNG is reset in source_generation... It shouldn't have an effect on the variance though, unless in the case of MPI the Z4 wall-source is of better quality because we're dealing with many different RNGs?
Speaking of which, that's wrong, isn't it? All those RNGs should be started with the same seed, as they are effectively in "reproducerandomnumbers" mode. All RNGs generate the same number of random numbers, but a number is only used when the corresponding global coordinate lies on the node.
All RNGs generate the same number of random numbers, but a number is only used when the corresponding global coordinate lies on the node.
I was just thinking the same thing. If we don't want that behavior in general, we should at least implement it for testing purposes...
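A minimal sketch of that behaviour, assuming a ranlxd-style generator (is_local(), local_index() and the explicit extent arguments are illustrative placeholders, not the actual source_generation code):

#include <complex.h>

/* schematic: identical seed on every process, every process draws the full
 * global stream, but a drawn number is only used if the corresponding global
 * site lives on this process */
extern void rlxd_init(int level, int seed);
extern void ranlxd(double *r, int n);
extern int  is_local(int t, int x, int y, int z);      /* placeholder */
extern int  local_index(int t, int x, int y, int z);   /* placeholder */

void z4_wall_source(double _Complex *src, int t0, int seed,
                    int LX, int LY, int LZ)             /* global spatial extents */
{
  double r[2];
  rlxd_init(2, seed);                                   /* same seed everywhere */
  for (int x = 0; x < LX; ++x)
    for (int y = 0; y < LY; ++y)
      for (int z = 0; z < LZ; ++z) {
        ranlxd(r, 2);                                   /* drawn on all processes ... */
        if (is_local(t0, x, y, z))                      /* ... used only on the owner */
          src[local_index(t0, x, y, z)] =
            (r[0] < 0.5 ? 1.0 : -1.0) + (r[1] < 0.5 ? 1.0 : -1.0) * I;
      }
}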
There we go, I'm glad that's dealt with :) :
16:37 bartek@artemis ~/code/tmLQCD.kost/build_mpi (etmcmaster|✚4…) $ diff -y -W 100 onlinemeas.000000.mpi_4_1D_repro onlinemeas.000000.mpi_2_1D_repro
1 1 0 3.461198e+01 0.000000e+00 1 1 0 3.461198e+01 0.000000e+00
1 1 1 2.365384e+00 2.891736e+00 1 1 1 2.365384e+00 2.891736e+00
1 1 2 3.562455e-01 4.134450e-01 1 1 2 3.562455e-01 4.134450e-01
1 1 3 6.168106e-02 4.594548e-02 1 1 3 6.168106e-02 4.594548e-02
1 1 4 1.191513e-02 0.000000e+00 1 1 4 1.191513e-02 0.000000e+00
2 1 0 1.431370e+00 0.000000e+00 2 1 0 1.431370e+00 0.000000e+00
2 1 1 1.462085e+00 -1.246998e+00 2 1 1 1.462085e+00 -1.246998e+00
2 1 2 1.237051e-01 -2.307649e-01 2 1 2 1.237051e-01 -2.307649e-01
2 1 3 5.317663e-03 -8.341291e-03 2 1 3 5.317663e-03 -8.341291e-03
2 1 4 8.496635e-03 0.000000e+00 2 1 4 8.496635e-03 0.000000e+00
6 1 0 4.289269e+00 0.000000e+00 6 1 0 4.289269e+00 0.000000e+00
6 1 1 5.386116e-01 -1.821203e+00 6 1 1 5.386116e-01 -1.821203e+00
6 1 2 1.647572e-01 -2.876740e-01 6 1 2 1.647572e-01 -2.876740e-01
6 1 3 4.147460e-02 -2.877868e-02 6 1 3 4.147460e-02 -2.877868e-02
6 1 4 1.312262e-03 0.000000e+00 6 1 4 1.312262e-03 0.000000e+00
I was just thinking the same thing. If we don't want that behavior in general, we should at least implement it for testing purposes...
We have that already, except that source_generation uses its own "repro" mode, but the seeds were initialized wrongly! I just think that the seed used for this purpose should also depend on the seed in the input file, so that from run to run we avoid using the same random numbers for the measurements in particular (which is the case right now) and for source generation in general, as this will have implications for correlated fits which use samples from various ensembles.
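Something along these lines is what I have in mind (purely a sketch; random_seed and nstore stand for the input-file seed and the trajectory counter, and the mixing constant is arbitrary):

/* schematic: derive the measurement seed from the input-file seed and the
 * trajectory counter, and use the SAME value on every process (no g_cart_id) */
unsigned int measurement_seed(unsigned int random_seed, unsigned int nstore)
{
  return random_seed + nstore * 0x9e3779b9u;   /* any fixed mixing would do */
}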
We have that already, except that source_generation uses its own "repro" mode, but the seeds were initialized wrongly!
Yes, I formulated that badly -- I meant we should just use a single seed on all nodes at least for now, even if for some obscure reason the different seeds had to be there.
There we go, I'm glad that's dealt with :) :
Awesome! It shouldn't really matter for the variance, as you said. But random numbers do weird things, as we've seen. Perhaps it's worth checking again?
Yes, I formulated that badly -- I meant we should just use a single seed on all nodes at least for now, even if for some obscure reason the different seeds had to be there.
I don't think there is a reason, I think it's a simple oversight.
Awesome! It shouldn't really matter for the variance, as you said. But random numbers do weird things, as we've seen. Perhaps it's worth checking again?
Yes, will do, absolutely.
Btw, all the source generators are affected in this way except for the nucleon one and the extended pion source (the former doesn't use g_cart_id in the seed computation, while the latter doesn't use random numbers at all).
And as a final note, serial and MPI agree now. I will fix this tomorrow.
I think we have another problem of increased variance due to parallelization; a possible culprit could be OpenMP, but I'm not 100% sure yet. It is not visible in the plaquette expectation value, but in the measurement of m_{PCAC} it is very evident that something fishy is going on...
I'm currently doing an Nf=2 run at 12^3x20 to learn about basic measurements from beginning to end, and I noticed that the online measurement of m_{PCAC} became much more stable (with respect to variance) when I started running with pure MPI on many nodes rather than doing pure OpenMP on one machine.
I had reproducerandomnumbers=yes during the MPI run, which I'm going to change later this afternoon before adding some trajectories. For now, see the plot below, where the first 31 trajectories are done using OpenMP only on one node. The calm section after that was run with pure MPI, while the end was run with pure OpenMP again. I haven't tested the hybrid code yet. Note that this is a tuning run and was restarted from a run with a different kappa, so the first few trajectories don't necessarily say much.
As you can see, nothing of the sort shows up in the plaquette, although perhaps the OpenMP tail is a bit too smooth, with some long-range (10-trajectory) oscillation?
This is very worrying. Could it be due to the way the RNG is initialized during source generation?