QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
http://www.qmcpack.org
Other
298 stars 139 forks source link

Incorrect ion-ion energy in some periodic systems #2105

Closed prckent closed 2 years ago

prckent commented 4 years ago

This is an important bug that exists in QMCPACK dating to at least mid-2015, likely earlier.

First noticed for strained phosphene and also reproduced in A-A stacked graphene, for some configurations the ion-ion coulomb energy is incorrectly computed in these periodic systems. The errors can be in eV. To date dense solids appear to be correct - and these are already checked nightly. We are investigating the exact cases when the problem occurs and will update this issue. The bug could be in "plumbing", implementation, or simply unexpected behavior of the implemented method (Natoli/Ceperley).

The ion-ion term should of course be exactly the same as reported by the code that generated the trial wavefunction e.g. Quantum Espresso. For the time being we recommend making cross-checks.

prckent commented 4 years ago

Initial report with files https://groups.google.com/forum/#!topic/qmcpack/3Hkt5sWY_e4

prckent commented 4 years ago

Fast to run tests were added for this bug in #2116 - bug-deterministic-grapheneC_1x1_pp-vmc_sdj_z10-1-1-ionion and bug-deterministic-grapheneC_1x1_pp-vmc_sdj_z30-1-1-ionion . Both fail. The reference value is currently the QE value.

rcclay commented 4 years ago

I'm just going to use this as the dumping ground for ongoing tests: Test 1: Convergence of Ewald ion-ion Interaction in QMCPACK vs. QE:

Ewald in QE converges really quickly. I'm doing config #3 from the forum--QE is initialized for a 100 atom cell corresponding to the 5x5x1 tiling... this is to keep the comparison between the supercell QMCPACK calculation and QE's ewald summation apples-to-apples.

Ecut (Ry) | Ewald Energy (Ry) | Ewald Energy (eV/cell) 5 | 3497.85849399 | 1903.63148311 10 | 3497.85849326 | 1903.63148271 20 | 3497.85849589 | 1903.63148414 40 | 3497.85849589 | 1903.63148414

This is in contrast to QMCPACK's Ewald3D and Optimized Breakup handlers:

r_c*k_c | Ecut (Ry) | Ewald 3D (eV/cell) | OptBreakup (eV/cell) 15 | 0.904562182 | 1903.61535656 | 1903.71679048 30 | 3.618248729 | 1903.79687315 | 1903.48366227 60 | 14.47299492 | 1903.84280423 | 1903.42547671 120 | 57.89197967 | 1903.86393463 | CRASH

To get Ecut(Ry), I used k_c^2/2 (energy cutoff in Hartree) and converted to Rydberg (x2).

rcclay commented 4 years ago

Test 2: Comparison of Short Range, Long Range, and Background Ewald Terms between QE and QMCPACK

So I think the bug is in the computation of the long-range piece of the potential only. Here's the evidence to back this up.

I hacked QE to spit out the following.

ecutwfc (Ry) k_c (1/bohr) lr_dim_cutoff Sigma (1/bohr) E_TOT (Ha) E_SR (Ha) E_LR (Ha) E_BG (Ha)
5 4.47203656755 70.53 0.31622776602 1748.92924699333 57.92548014569 2215.37865766054 -524.37489081291
10 6.32441608475 99.75 0.54772255751 1748.92924662991 0.93006487691 2546.66220639094 -798.66302463795
20 8.94426075853 141.06 0.83666002653 1748.92924794664 0.00049399514 2940.20791405832 -1191.27916010682
40 12.64906945652 199.49 1.22474487139 1748.92924794570 0.00000000000 3481.62291942069 -1732.69367147499

Here are the columns:

  1. ecutwfc in Ry. Ewald summation is done on the density grid, so the energy cutoff will be 4x what's listed here.
  2. k_c: maximum G-vector magnitude used in Ewald summation.
  3. lr_dim_cutoff: the value of lr_dim_cutoff in QMCPACK required to setup the QE equivalent G vector grid.
  4. Sigma: This is the Ewald breakup parameter. Using the convention that the real space term is erfc(Sigma*r)/r.
  5. E_TOT: Total ion-ion Ewald energy for the 5x5x1 supercell in Ha.
  6. E_SR: The contribution to the Ewald energy from just the short-range piece.
  7. E_LR: The contribution to the Ewald energy from just the long-range piece.
  8. E_BG: The contribution to the Ewald energy from just the background/constant terms.

For the QMCPACK comparison, I looked at just the first row in the above table. I set lr_dim_cutoff to what's listed in the first row, and I hard-coded the Sigma in the EwaldHandler3D to be that given in the table. Note that QMCPACK chooses this differently from QE, hence why this was necessary.

Running this in the same geometry as the QE run, I get the following for the energy: E_TOT = 1749.1148682 E_SR = 57.9254710988 E_LR = 2215.56428788 <------Big discrepancy from QE.
E_BG = -524.374890799

To make this explicit, I'm showing the difference between QMCPACK and QE for the various terms:

ΔE_TOT 0.18562120667
ΔE_SR -0.00000904689
ΔE_LR 0.18563021946
ΔE_BG 0.00000001391

Basically, I suspect the bug is just in the long-range computation of the coulomb potential, and probably not in the actual breakup procedure. Could be the G-vector initialization/manipulation, which wouldn't surprise me given PR #2087. Could be improper initialization/filling of the v_k (long range piece of coulomb potential). Maybe S(k) is not computed correctly, but I doubt this.

rcclay commented 4 years ago

I think I found the bug. In these large cells, there's a misidentification k-vectors with k-shells. To save time, the ewald summation is done over k-shells instead of looping over the individual K-points. So the long-range piece in Fourier space v_lr(k) is stored in a kshell-based array corresponding to v_lr(|k|) (If you go digging in the code, this is Fk_symm[]).

What I'm doing is taking the list of k-vectors given in KList.kpts_cart, and I'm comparing the stored value of v_lr(k) (this is Fk[] in the code, which in turn is set from Fk_symm[] since hey, all the Fk values in the same shell should be identical) to the one I would compute directly using v_lr(k) given by Ewald for that k-vector. Here's what I get.

i kx ky kz \ k\ ^2 Fk Fk_ref dFk
0 0 -0.149436217 0 0.022331183 0.010616893 0.010616893 0
1 0 0.149436217 0 0.022331183 0.010616893 0.010616893 0
2 0 0 -0.166245923 0.027637707 0.008465365 0.008465365 0
3 0 0 0.166245923 0.027637707 0.008465365 0.008465365 0
4 -0.199194722 0 0 0.039678537 0.005721618 0.005721618 0
5 0.199194722 0 0 0.039678537 0.005721618 0.005721618 0
6 0 -0.149436217 -0.166245923 0.04996889 0.004427945 0.004427945 0
7 0 -0.149436217 0.166245923 0.04996889 0.004427945 0.004427945 0
8 0 0.149436217 -0.166245923 0.04996889 0.004427945 0.004427945 0
9 0 0.149436217 0.166245923 0.04996889 0.004427945 0.004427945 0
10 -0.199194722 -0.149436217 0 0.06200972 0.003462334 0.003462334 0
11 -0.199194722 0.149436217 0 0.06200972 0.003462334 0.003462334 0
12 0.199194722 -0.149436217 0 0.06200972 0.003462334 0.003462334 0
13 0.199194722 0.149436217 0 0.06200972 0.003462334 0.003462334 0
14 -0.199194722 0 -0.166245923 0.067316244 0.003147367 0.003147367 0
15 -0.199194722 0 0.166245923 0.067316244 0.003147367 0.003147367 0
16 0.199194722 0 -0.166245923 0.067316244 0.003147367 0.003147367 0
17 0.199194722 0 0.166245923 0.067316244 0.003147367 0.003147367 0
18 -0.199194722 -0.149436217 -0.166245923 0.089647427 0.002235032 0.002235032 0
19 -0.199194722 -0.149436217 0.166245923 0.089647427 0.002235032 0.002235032 0
20 -0.199194722 0.149436217 -0.166245923 0.089647427 0.002235032 0.002235032 0
21 -0.199194722 0.149436217 0.166245923 0.089647427 0.002235032 0.002235032 0
22 0 -0.298872433 0 0.089324731 0.002235032 0.002244916 -9.88E-06
23 0 0.298872433 0 0.089324731 0.002235032 0.002244916 -9.88E-06

Here's what the columns are

  1. i : kvector index.
  2. kx from KList.kpts_cart[i]
  3. ky from KList.kpts_cart[i]
  4. kz from KList.kpts_cart[i]
  5. |k|^2 for this kvector.
  6. Fk : v_lr(k) computed from v_lr(|k|) corresponding to the shell.
  7. Fk_ref : v_lr(k) computed directly for this kpoint without any symmetry tricks.
  8. dFk : The difference between the two.

Oh snap. #22 and #23 are noticeably different. I cut the list off instead of showing the results for all 75000 k-vectors, but there are blips in the list every 80-100 kvectors on average with tapering off magnitudes. So let's take a look at those k-vectors around #22 and #23.

ks i kx ky kz \ k\ ^2
0 0 0 -0.149436217 0 0.022331183
0 1 0 0.149436217 0 0.022331183
1 2 0 0 -0.166245923 0.027637707
1 3 0 0 0.166245923 0.027637707
2 4 -0.199194722 0 0 0.039678537
2 5 0.199194722 0 0 0.039678537
3 6 0 -0.149436217 -0.166245923 0.04996889
3 7 0 -0.149436217 0.166245923 0.04996889
3 8 0 0.149436217 -0.166245923 0.04996889
3 9 0 0.149436217 0.166245923 0.04996889
4 10 -0.199194722 -0.149436217 0 0.06200972
4 11 -0.199194722 0.149436217 0 0.06200972
4 12 0.199194722 -0.149436217 0 0.06200972
4 13 0.199194722 0.149436217 0 0.06200972
5 14 -0.199194722 0 -0.166245923 0.067316244
5 15 -0.199194722 0 0.166245923 0.067316244
5 16 0.199194722 0 -0.166245923 0.067316244
5 17 0.199194722 0 0.166245923 0.067316244
6 18 -0.199194722 -0.149436217 -0.166245923 0.089647427
6 19 -0.199194722 -0.149436217 0.166245923 0.089647427
6 20 -0.199194722 0.149436217 -0.166245923 0.089647427
6 21 -0.199194722 0.149436217 0.166245923 0.089647427
6 22 0 -0.298872433 0 0.089324731
6 23 0 0.298872433 0 0.089324731
6 24 0.199194722 -0.149436217 -0.166245923 0.089647427
6 25 0.199194722 -0.149436217 0.166245923 0.089647427
6 26 0.199194722 0.149436217 -0.166245923 0.089647427
6 27 0.199194722 0.149436217 0.166245923 0.089647427
  1. ks : The k-shell index. All kpoints with the same ks have the same v_lr(k) the way the code is currently constructed.
  2. i : kpoint index (KList.kpts_cart[i]).
  3. kx Same as before
  4. ky
  5. kz
  6. |k|^2 Self explanatory.

Looking at i=22 and 23, we see that these k-vectors are misidentified as belonging to kshell #6 despite the fact that their lengths are very slightly off their neighbors.

rcclay commented 4 years ago

Confirmed the bug. The tolerance is effectively set in src/LongRange/KContainer.cpp line #256 so that kvectors differing by less than 0.001 in |k|^2 are treated as being in the same k-shell (Also a comment on line #253 to this effect). Increasing the effective tolerance to 1e-7 by jacking up that integer multiplication gives us the following values for the ion-ion interaction computed by the EwaldHandler3D routine:

E_TOT 1748.92913
E_SR 57.9254711
E_LR 2215.37855
E_BG -524.37489

Now the differences from QE's Ewald:

ΔE_TOT -0.00011279333
ΔE_SR -0.00000904689
ΔE_LR -0.00010375054
ΔE_BG 0.00000001391

Boom.

prckent commented 4 years ago

Now that #2125 is merged, the development version will hard abort if the computed ion-ion energy differs significantly from an independent reference Ewald computation performed to a specific energy tolerance. The graphene z10 and z30 tests currently both correctly fail at this check.

jtkrogel commented 4 years ago

I would feel a lot more comfortable with this if we were casting to something with a larger range than int:

int k_ind = static_cast<int>(ksq_tmp[ik] * 1000);

I don't know what range of values we are guaranteed that k^2 will fall in for all cells, but using an integer type with larger range should allow the safe use of a larger precision multiplier (1000 here) without loss of precision or risk of overflow.

ye-luo commented 4 years ago

Now the differences from QE's Ewald:

ΔE_TOT -0.00011279333 ΔE_SR -0.00000904689 ΔE_LR -0.00010375054 ΔE_BG 0.00000001391 Boom.

Could you provide delta with respect to the reference implementation added by Jaron? I still feel LR is larger than I would expect.

prckent commented 4 years ago

To summarize the situation: a bug was found in the Optimized Breakup implementation and fixed (#2137), but both this implementation and reference Ewald are also found to have slow convergence for anisotropic systems. A reference Ewald check on the ion-ion term has been added to the production code to catch this condition. Tests have also been added that currently fail due to very slow convergence of the underlying algorithm. A better algorithm and implementation is needed to (1) deliver good accuracy and (2) do so fast enough. #2185 is one possibility.

prckent commented 4 years ago

Improvements will be investigated in #2185 . Closing this since bugs have been fixed and safeguards introduced.

jtkrogel commented 3 years ago

Users are still getting bitten by this in practice quite often (perhaps each and every one who tries a quasi-2D system).

I recommend we detect quasi-2D input (easy) and switch the default to lr_handler=ewald and lr_dim_cutoff=25 for that case. Alternately, switch the default to this generally, and let solid state user's specify optimized breakup when they knowingly assert to its safety as a performance optimization.

kryczko commented 2 years ago

Hi guys, I am also hitting this bug for a periodic system of 32 water molecules. Here is my input for a jastrow opt:

<simulation>
   <project id="opt" series="0">
      <application name="qmcpack" role="molecu" class="serial" version="1.0"/>
   </project>
   <qmcsystem>
      <simulationcell>
         <parameter name="lattice" units="bohr">
                 19.09677861        0.00000000        0.00000000
                  0.00000000       19.09677861        0.00000000
                  0.00000000        0.00000000       19.09677861
         </parameter>
         <parameter name="bconds">
            p p p
         </parameter>
         <parameter name="LR_dim_cutoff"       >    35.0               </parameter>
      </simulationcell>
      <particleset name="e" random="yes">
         <group name="u" size="128" mass="1.0">
            <parameter name="charge"              >    -1                    </parameter>
            <parameter name="mass"                >    1.0                   </parameter>
         </group>
         <group name="d" size="128" mass="1.0">
            <parameter name="charge"              >    -1                    </parameter>
            <parameter name="mass"                >    1.0                   </parameter>
         </group>
      </particleset>
      <particleset name="ion0">
         <group name="O" size="32" mass="29164.39286783587">
            <parameter name="charge"              >    6                     </parameter>
            <parameter name="valence"             >    6                     </parameter>
            <parameter name="atomicnumber"        >    8                     </parameter>
            <parameter name="mass"                >    29164.39286783587            </parameter>
            <attrib name="position" datatype="posArray" condition="0">
                    24.26616224       34.23786910      -19.88728885
                    32.99764184        4.89042226        1.78485767
                   -15.27579016        6.65758075       38.82499031
                    22.42481310       11.92337821       -7.60669570
                    10.09818623      -36.83076232       19.09171414
                   -11.23287228       13.50815924       31.73549375
                    10.92768151       24.86633926        5.76944727
                    -9.78708061      -29.22574847      -17.44821933
                    -6.13987138      -55.38050301       14.01883883
                    29.88923132      -31.53574970       35.33598895
                     1.56789254       17.86637793      -32.73459196
                    34.80913331        3.65144222        9.38999246
                    31.77272136       32.05504644        0.27278764
                   -30.27341264      -39.32009856       23.29068561
                    35.83412076       27.63459907       -1.26540786
                   -10.87943680       19.57945246       13.17998940
                   -49.64178269      -16.61639968      -11.19560688
                   -18.60134911       26.58693490       31.83659410
                    -7.11236224       -4.03040790      -14.07090078
                    20.17679489       -3.46141136       -4.44342644
                     4.56091071       11.36220514      -16.19497185
                     1.01714887       38.91116182       -8.61870074
                     6.32924084       24.01974195       15.37836450
                    16.24028749       10.66064431       42.29377160
                    -4.79620051        8.82201637       27.72832949
                    14.76737825       16.42141774       -3.33612251
                    12.32262065       48.53912749       13.07764183
                    32.35702468       53.07692686       10.72873114
                     1.07544125       32.48798270       26.25604386
                    23.46321761       24.58760466       10.48820680
                    -0.36077140        1.66652869       38.94706662
                   -23.26460739       -0.01549386        5.84212613
            </attrib>
         </group>
         <group name="H" size="64" mass="1837.3622193391143">
            <parameter name="charge"              >    1                     </parameter>
            <parameter name="valence"             >    1                     </parameter>
            <parameter name="atomicnumber"        >    1                     </parameter>
            <parameter name="mass"                >    1837.3622193391143            </parameter>
            <attrib name="position" datatype="posArray" condition="0">
                    23.12061026       34.20139738      -21.54098818
                    25.71709396       35.41932587      -20.30737496
                    32.34058406        4.98169603        3.41910039
                    34.23881396        3.52021963        1.85762346
                   -16.89611694        7.01972787       37.90563855
                   -15.44556316        5.09009072       39.54044062
                    21.69575675       12.50539496       -9.29422114
                    23.01100614       10.18061608       -8.16520426
                     9.64824243      -36.16879126       17.52365719
                    11.76927104      -35.90555241       19.85176199
                   -12.91138372       13.13055416       31.33014750
                   -10.51838573       15.20947967       31.45713709
                    11.74992025       26.05252035        6.83177571
                    10.06245150       23.90673633        7.06776471
                   -11.38983293      -28.87482633      -16.90048221
                    -9.53954539      -30.85828288      -16.37814301
                    -5.28280499      -56.90059871       14.29589158
                    -6.34785464      -54.53220495       15.61287951
                    30.43498423      -30.47429053       34.19138183
                    28.79111146      -30.62452376       36.34283503
                     3.13093605       17.57864822      -33.46648289
                     1.80647236       18.35957755      -30.90609295
                    33.81438147        3.51511737       10.72884453
                    36.42957346        2.91758597       10.24448882
                    30.42780327       30.93406090        0.87568775
                    32.79128374       30.78533945       -1.03389184
                   -29.69458953      -38.26714316       21.99376657
                   -28.86462181      -40.40102190       23.51121665
                    36.71076471       27.30559775       -2.81622106
                    34.26640396       26.73301073       -1.26900023
                   -11.52912465       20.45741922       14.45016882
                    -9.10093995       20.61124293       13.03026640
                   -49.34339494      -18.22820379      -12.45131100
                   -49.56260317      -17.21553735       -9.34201231
                   -17.19834084       26.24035913       31.22659050
                   -18.92504030       28.31660123       31.41764182
                    -7.14036799       -3.82826389      -15.73108188
                    -5.62450527       -2.61343455      -13.47412527
                    21.17457029       -4.96871361       -5.39550826
                    18.32229325       -3.88765798       -4.78556136
                     4.79809024       12.59595064      -17.41898526
                     4.16523985       10.11729136      -17.19852981
                     2.05683461       40.37683341       -8.69740783
                     0.68774693       38.51526420       -6.82102317
                     5.61021894       24.83648159       16.74378612
                     8.04554680       24.11120470       14.94801717
                    16.46450349        9.58389726       40.56712883
                    15.21996765       11.84690099       41.50821245
                    -4.28433040        9.40845508       26.07368529
                    -3.68044951        7.39886362       27.87667299
                    13.85105004       15.06287472       -2.04540177
                    14.57808438       15.47487281       -5.04968838
                    10.70881343       49.55504426       12.80342367
                    12.59744352       47.50941572       11.51291080
                    32.33170235       51.27639580       11.02301820
                    32.17825658       53.42766003        8.99734516
                     0.52424026       31.64270820       24.77015220
                     0.82794193       34.25468766       25.37769915
                    24.21797422       23.99064017        8.99006972
                    24.04997757       24.10648038       11.99720981
                    -0.60767545        0.82025264       37.32228009
                     0.85953248        0.69664754       40.03347017
                   -22.87267819        0.93691110        7.30758985
                   -21.48731996        0.09036670        5.18794074
            </attrib>
         </group>
      </particleset>
      <wavefunction name="psi0" target="e">
         <sposet_builder type="bspline" href="../scf/pwscf_output/pwscf.pwscf.h5" tilematrix="1 0 0 0 1 0 0 0 1" twistnum="0" source="ion0" version="0.10" meshfactor="1.2" precision="float" truncate="no">
            <sposet type="bspline" name="spo_ud" size="128" spindataset="0"/>
         </sposet_builder>
         <determinantset>
            <slaterdeterminant>
               <determinant id="updet" group="u" sposet="spo_ud" size="128"/>
               <determinant id="downdet" group="d" sposet="spo_ud" size="128"/>
            </slaterdeterminant>
         </determinantset>
         <jastrow type="One-Body" name="J1" function="bspline" source="ion0" print="yes">
            <correlation elementType="O" size="20" rcut="9.548389305" cusp="0.0">
               <coefficients id="eO" type="Array">                  
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
               </coefficients>
            </correlation>
            <correlation elementType="H" size="20" rcut="9.548389305" cusp="0.0">
               <coefficients id="eH" type="Array">                  
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
               </coefficients>
            </correlation>
         </jastrow>
         <jastrow type="Two-Body" name="J2" function="bspline" print="yes">
            <correlation speciesA="u" speciesB="u" size="20" rcut="9.548389305">
               <coefficients id="uu" type="Array">                  
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
               </coefficients>
            </correlation>
            <correlation speciesA="u" speciesB="d" size="20" rcut="9.548389305">
               <coefficients id="ud" type="Array">                  
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
               </coefficients>
            </correlation>
         </jastrow>
      </wavefunction>
      <hamiltonian name="h0" type="generic" target="e">
         <pairpot type="coulomb" name="ElecElec" source="e" target="e"/>
         <pairpot type="coulomb" name="IonIon" source="ion0" target="ion0"/>
         <pairpot type="pseudo" name="PseudoPot" source="ion0" wavefunction="psi0" format="xml">
            <pseudo elementType="O" href="O.BFD.xml"/>
            <pseudo elementType="H" href="H.BFD.xml"/>
         </pairpot>
         <pairpot type="MPC" name="MPC" source="e" target="e" ecut="60.0" physical="no"/>
         <estimator name="KEcorr" type="chiesa" source="e" psi="psi0"/>
      </hamiltonian>
   </qmcsystem>
   <loop max="6">
      <qmc method="linear" move="pbyp" checkpoint="-1">
         <cost name="energy"              >    0.0                </cost>
         <cost name="unreweightedvariance">    1.0                </cost>
         <cost name="reweightedvariance"  >    0.0                </cost>
         <parameter name="warmupSteps"         >    300                </parameter>
         <parameter name="blocks"              >    100                </parameter>
         <parameter name="steps"               >    1                  </parameter>
         <parameter name="subSteps"            >    10                 </parameter>
         <parameter name="timestep"            >    0.3                </parameter>
         <parameter name="useDrift"            >    no                 </parameter>
         <parameter name="samples"             >    51200              </parameter>
         <parameter name="MinMethod"           >    quartic            </parameter>
         <parameter name="minwalkers"          >    0.3                </parameter>
         <parameter name="nonlocalpp"          >    yes                </parameter>
         <parameter name="use_nonlocalpp_deriv">    yes                </parameter>
         <parameter name="useBuffer"           >    yes                </parameter>
         <parameter name="alloweddifference"   >    0.0001             </parameter>
         <parameter name="exp0"                >    -6                 </parameter>
         <parameter name="bigchange"           >    10.0               </parameter>
         <parameter name="stepsize"            >    0.15               </parameter>
         <parameter name="nstabilizers"        >    1                  </parameter>
      </qmc>
   </loop>
</simulation>

and the error is:

ERROR in ion-ion Ewald energy exceeds 0.0003 Ha/atom tolerance.

  Reference ion-ion energy: -220.2033854686
  QMCPACK   ion-ion energy: -210.25210784674
            ion-ion diff  : 9.9512776218642
            diff/atom     : 0.10365914189442
            tolerance     : 0.0003

Please try increasing the LR_dim_cutoff parameter in the <simulationcell/>
input.  Alternatively, the tolerance can be increased by setting the
LR_tol parameter in <simulationcell/> to a value greater than 0.0003. 
If you increase the tolerance, please perform careful checks of energy
differences to ensure this error is controlled for your application.

Fatal Error. Aborting at Unhandled Exception

Please @jtkrogel let me know what I can do to correct this!!

jtkrogel commented 2 years ago

Interesting. This is the first time the issue has been seen in a dense system in PBC's.

You should be able to fix this by switching to:

<simulationcell>
  ...
  <parameter name="LR_handler">  ewald  </parameter>
  <parameter name="LR_dim_cutoff">  35  </parameter>
</simulationcell>

This will use an alternative method to do the Ewald sum (safe but more expensive). You can access this via Nexus by setting lr_handler='ewald', in generate_qmcpack.

jtkrogel commented 2 years ago

@kryczko see above.

kryczko commented 2 years ago

right on! I'll give that a go and post back. Thanks so much @jtkrogel !!

jtkrogel commented 2 years ago

I'm reopening this issue. To other developers: this provides yet another datapoint supporting a switch of the default algorithm to the direct Ewald implementation. In the event of sub-optimal performance, the focus should be on speeding up this implementation.

Separately I will note that no effort was made to improve the robustness of the Natoli-Ceperley implementation. It could be as simple as enforcing a tolerance on the convergence of the SVD singlular values used to arrive at a compact representation of the real space basis.

ye-luo commented 2 years ago

opt_breakup and eward gave close number. EwardRef was broken this time.

kryczko commented 2 years ago

I tried running with the parameters given by @jtkrogel above, but still see same error. Please let me know if there is something else I am missing.

ye-luo commented 2 years ago

@kryczko #3763 will resolve your issue.

kryczko commented 2 years ago

gotcha -- just have to wrap the coordinates.