QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
http://www.qmcpack.org
Other
292 stars 135 forks source link

Two-body Jastrow initialization not robust to input change #282

Open Paul-St-Young opened 7 years ago

Paul-St-Young commented 7 years ago

Currently, the TwoBodyJastrowOrbital class is initialized when addFunc is first called. During initialization, the correlation functions between all pairs of species ('uu','ud','du','dd',etc.) are set to the same function, namely that defined in the first <correlation> node. This initialization procedure assumes a very specific structure of the input file and can be easily broken without warning.

  1. If one defines the 'ud' Jastrow before the 'uu' Jastrow, then the 'dd' Jastrow will be linked to the 'ud' Jastrow instead of the 'uu' Jastrow.
  2. If one defines the 'ud' Jastrow without the 'uu' Jastrow, then the 'uu' and 'dd' Jastrows will be linked to 'ud' without specification.

I swapped the order 'uu' and 'ud' Jastrows in the system test 'short-diamondC_1x1x1_pp-vmc_sdj-1-16'. Variance increased by 16%, presumably because the 'ud' Jastrow has the wrong cusp for 'uu'.

'uu' before 'ud' qmca -q ev --sac qmc_short.s000.scalar.dat LocalEnergy Variance ratio qmc_short series 0 -10.491524 +/- 0.000489 1.1 0.386688 +/- 0.003372 1.0 0.0369

'ud' before 'uu' qmca -q ev --sac qmc_short.s000.scalar.dat LocalEnergy Variance ratio qmc_short series 0 -10.486313 +/- 0.000501 1.0 0.448020 +/- 0.003762 1.0 0.0427

Paul-St-Young commented 7 years ago

I think the best solution is to:

  1. Initialize all Jastrow functions to the constant function J(r) = 0.0 in the constructor of TwoBodyJastrowOrbital
  2. Explicitly link 'dd' to 'uu' in the builder BsplineJastrowBuilder
jtkrogel commented 7 years ago

Right now the input reader simply treats the first correlation as 'uu' no matter what, and the second, if present, is treated as 'ud'.

Also at issue here is how strongly default assumptions about symmetry constraints are treated. Right now uu==dd and ud==du are rigidly enforced (you can't even make a dd term). We should probably make this more explicit and give the user more control.

I propose we make a "symmetry" attribute to <jastrow type="Two-Body", with default value symmetry="uu=dd,ud=du,uu=ud,uu=du". If symmetry=="none", then only terms provided are considered (e.g. if "ud" is provided, only "ud" will be present with uu/dd/du set to zero). With the default value the symmetries are also loosened given additional user input: if "dd" is explicitly given, the suggested uu=dd symmetry is not enforced.

The symmetries only need be applied after all correlation terms have been provided (i.e. addFunc is not the right place to enforce the symmetries). It would really be pretty simple, similar to what Paul is proposing above:

  1. Initialize all correlation functions (U) to zero (constructor is probably fine).
  2. Read in the new "symmetry" attribute during "put"
  3. Change "addFunc" to only add the single correlation function it is currently processing (if uu it goes in slot 0, if dd it goes in slot 3, etc {uu=0,ud=1,du=2,dd=3}).
  4. Apply optional spin symmetries (possibly at the end of "put").
prckent commented 7 years ago

Right now the input reader simply treats the first correlation as 'uu' no matter what, and the second, if present, is treated as 'ud'.

We need to process this correctly and introduce a backwards incompatible change. i.e. ud is correctly handled if specified first.

I like Jaron's symmetry suggestion. i.e. symmetrized by default, but if symmetry = no is specified you can do what you like and the specified terms are correctly handled. Unspecified terms = 0.

ye-luo commented 7 years ago

The current behaviour must change. But I prefer a simpler solution and not changing input files for backward compatibility. 1) The first symmetry pair in the input overwrites all the uninitialized pairs. 2) For any given pair in the input, overwrite its last value and force ud=du 3) After taking all the input pairs, scan all the pairs. The only case having uninitialized pairs is having only ud/du in the input. Then overwrite all the pairs with ud/du term and print a remark. If no pairs are initialized, issue an error that no correlation at all is given.

@jtkrogel I think this will solve most of the issue. It has no sensitivity to the ordering and allows to distinguish uu and dd.

Fixing the wrong symmetrization is important but providing no symmetry may need to touch a lot of codes. Providing "Unspecified terms=0" can be non-trivial.

Paul-St-Young commented 7 years ago

@ye-luo I think the 'overwriting' behavior can be dangerous. For example, when a third species of quantum particle is present, say a positron or a proton ('p'). Then the 'uu' term will overwrite all pairs 'uu','dd','pp','ud','up','dp', etc.

ye-luo commented 7 years ago

@Paul-St-Young Is p included in the same particle set as u and d?

Paul-St-Young commented 7 years ago

Yes. In the quantum/target particle set.

prckent commented 7 years ago

The behavior should be concisely included in the output

ye-luo commented 7 years ago

@Paul-St-Young When you add 'p', will you on purpose ignore correlations for certain pairs? In general, we need some overwriting schemes to keep backward compatibility and we should clearly mention it in the manual and output. If you need more than that, currently it is on your own responsibility. I'm not saying your case should be ignored but 1) what is the default with 3 species is not clear 2) you can run what you need by specifying all the pairs. If things are clear, we can deal with the 3 species case.

jtkrogel commented 7 years ago

We can make things totally backwards compatible, while including the right behavior for 'p' if we optionally enforce u=d in general.

This is what we are doing now (backwards compatible), and since p!=u and p!=d it will not adversely affect the case when p is added (though it would enforce pu=pd if pu and pd are not provided separately, which I think is the desired behavior anyway).

The correct behavior does not involve overwriting, just filling in missing terms (by copy) following the u=d rule after all user-provided terms have been set.

ye-luo commented 7 years ago

So if we need to take care of all possible cases, addFunc should strictly do nothing but adding the give term and the builder needs to be smart. J2 only sees the particle group id instead of 'p/d/u' which only the builder can see.

Paul-St-Young commented 7 years ago

@ye-luo Yes I can run what I need by specifying all pairs. However, I currently have to hack the code to link the 'up' and 'dp' Jastrows, so that the optimizer does not do extra work. I like @jtkrogel 's suggestion of enforcing u=d in general, as well as @ye-luo 's suggestion to move the checking responsibility from addFunc to the builder.

ye-luo commented 7 years ago

@Paul-St-Young You probably don't need to hack the code. Just put the same name coefficients id="upd" for 'up' and 'dp' and the optimizer with treat their contribution to the same set of coefficients.

Paul-St-Young commented 7 years ago

@ye-luo That is a really useful feature I did not know about ... Is it in the manual somewhere?

ye-luo commented 7 years ago

@Paul-St-Young not in the manual. I was not fully sure it works as intended. You might see that one set of coefficients are zero in the opt.xml. Apart from that I didn't see strange energy values when I tried it in the past.

prckent commented 7 years ago

If we want this as a real feature it should be in the manual

prckent commented 7 years ago

Is this being actively worked on or should I move it to v3.3?

ye-luo commented 7 years ago

Fully protect p/u/d can be quite complicated. But just for u/d, I can patch the AoS code to align with SoA and GPU codes which are correct.

prckent commented 7 years ago

Please do so. AoS, SoA, and GPU codes should always be aligned.

ye-luo commented 6 years ago

@prckent The bug part of this ticket has been resolved. The need for better support u/d/p cases will be dealt after MS3.2.

prckent commented 6 years ago

Thanks all. Moving to MS3.3 although we can discuss the prioritization.

prckent commented 6 years ago

Seemingly this is not a priority, so removing from 3.3 milestone

ye-luo commented 3 years ago

1247 is not much different from this issue. I'm looking for a sane way to input unique pair/triplet functions and then each correlation pair/triplet pick up its function by names. No implicit symmetry handling is used to ensure non-ambiguity. The reason to separate unique pair/triplet functions from correlation is to make it sane for the WF optimization need.

Ye style 1.

<jastrow type="Two-Body" name="J2" function="bspline" print="yes">
  <functors>
    <functor name="uu"size="8" rcut="3">
      <coefficients type="Array">
        0.2969700506 0.2407083361 0.1830009217 0.1355563432 0.09514725057 0.06112611044 
        0.03450633959 0.01532305926
      </coefficients>
    <functor>
    <functor name="ud"size="8" rcut="3">
      <coefficients type="Array">
        0.4704510359 0.3531940571 0.2579420961 0.1820538307 0.1211946391 0.07487030513 
        0.04028963542 0.01742251234
      </coefficients>
    <functor>
  </functors>
  <correlation speciesA="u" speciesB="u" functor="uu"/>
  <correlation speciesA="u" speciesB="d" functor="ud"/>
  <correlation speciesA="d" speciesB="d" functor="uu"/>
</jastrow>

This input scheme will be scalable to 3 species (u, d, p) and also J1 and J3.

jtkrogel commented 3 years ago

How about we keep the input the same: Jaron style 1

<jastrow type="Two-Body" name="J2" function="bspline" print="yes">
  <correlation speciesA="u" speciesB="u" size="8" rcut="3">
      <coefficients type="Array">
        0.2969700506 0.2407083361 0.1830009217 0.1355563432 0.09514725057 0.06112611044 
        0.03450633959 0.01532305926
      </coefficients>
  </correlation>
  <correlation speciesA="u" speciesB="d" size="8" rcut="3">
      <coefficients type="Array">
        0.4704510359 0.3531940571 0.2579420961 0.1820538307 0.1211946391 0.07487030513 
        0.04028963542 0.01742251234
      </coefficients>
  </correlation>
</jastrow>

And then do something simple to extend it: Jaron style 2

<jastrow type="Two-Body" name="J2" function="bspline" print="yes">
  <correlation speciesA="u" speciesB="u" size="8" rcut="3">
      <coefficients type="Array">
        0.2969700506 0.2407083361 0.1830009217 0.1355563432 0.09514725057 0.06112611044 
        0.03450633959 0.01532305926
      </coefficients>
  </correlation>
  <correlation speciesA="u" speciesB="d" size="8" rcut="3">
      <coefficients type="Array">
        0.4704510359 0.3531940571 0.2579420961 0.1820538307 0.1211946391 0.07487030513 
        0.04028963542 0.01742251234
      </coefficients>
  </correlation>
  <correlation speciesA="d" speciesB="d" same_as="uu"/>
</jastrow>
ye-luo commented 3 years ago

@jtkrogel in your example, the ownership of functor uu is ambiguous. does it belong to uu or dd pair? Prefer it not belonging to any correlation. Your input example seems like having backward compatibility but it relies on implicit rules and same_as does overwriting. Still a lot of hassles. So I'd like to leave 0 ambiguity. If a correlation is not given in the input, it means no correlation.

jtkrogel commented 3 years ago

The input does not need to determine the ownership of instantiated objects, QMCPACK does. I would really rather not "fix" a coding problem by making the input more complex.

I don't see how the above is implicit or ambiguous. The user request is explicit and clear. QMCPACK can do what it wants (hopefully something sensible in terms of internal data structures) to handle ownership, etc.

It wouldn't be hard to design a "functors" data structure that builds/owns the correlation functors in this case, if that is what is desired.

ye-luo commented 3 years ago

@jtkrogel if you remove <correlation speciesA="d" speciesB="d" same_as="uu"/> (Jaron style), the legacy code links dd to uu. This is a dangerous implicit rule to me. That is what I mean ambiguous. For me, I should have to right to input no correlation on dd. If we drop the implicit rule, it is anyway a breaking change. No different than writing a more organized input style. The logic is very similar to taking SPOSet input out of deterimants.

We also have the difficulty to make the implicit rule to scale to 3 spin species (u,d,p).

Probably there are two conceptional topics.

  1. dropping the implicit rule.
  2. how to organize the input.
jtkrogel commented 3 years ago
  1. Remain backwards compatible (this issue isn't worth the breaking change).
  2. The "implicit rule" (i.e. the default of dd==uu) applies only if just uu and ud are provided.
  3. If you really want no correlation on dd, supply an empty <correlation speciesA="d" speciesB="d"/> element. This desire is so rare that it really should be explicit.
  4. Make any input innovations needed to support additional cases minimal (what I suggested is a simple way to do this, even for u,d,p, etc).
Paul-St-Young commented 3 years ago

so I actually implemented Jaron style 1 and 2 in my own branch https://github.com/Paul-St-Young/qmcpack/blob/kylin/src/QMCWaveFunctions/Jastrow/BsplineJastrowBuilder.cpp (I used link="uu" instead of same_as="uu")

Backwards compatibility is maintained by hard-coding our current implicit linking when only "u" and "d" groups are detected.

I think the key is to have an assertion at the end making sure that all pairs have been assigned a function.

bool success = J2->checkInitialization();
if (!success) APP_ABORT("failed to initialize two-body Jastrow");

I hope the findFunc and linkFunc routines I added to TwoBodyJastrowOrbital are still useful https://github.com/Paul-St-Young/qmcpack/blob/kylin/src/QMCWaveFunctions/Jastrow/TwoBodyJastrowOrbital.h also checkInitialization()

ye-luo commented 3 years ago
1. Remain backwards compatible (this issue isn't worth the breaking change).

2. The "implicit rule" (i.e. the default of dd==uu) applies only if just uu and ud are provided.

3. If you really want no correlation on dd, supply an empty `<correlation speciesA="d" speciesB="d"/>` element.  This desire is so rare that it really should be explicit.

4. Make any input innovations needed to support additional cases minimal (what I suggested is a simple way to do this, even for u,d,p, etc).
  1. backward compatible can be supported if functor are not found. I'm looking for a sane input solution when "back compatible" mode is off.
  2. hard coding implicit rules is bad. It has bitten us so many times. I don't like hard-coding uu or dd names. names should be chosen freely. Current implicit rule is completely wrong in u,d,p case + the reported issue here. It is not worth fixing the current broken implicit rule.

On the style side in the u,d,p case Ye style 1

<jastrow type="Two-Body" name="J2" function="bspline" print="yes">

    <functor name="ee" size="3" rcut="3">
      <coefficients type="Array">
        0.2969700506 0.2407083361 0.1830009217
      </coefficients>
    </functor>
    <functor name="ud" size="3" rcut="3">
      <coefficients type="Array">
        0.4704510359 0.3531940571 0.2579420961
      </coefficients>
    </functor>
    <functor name="ep" size="3" rcut="3">
      <coefficients type="Array">
        0.1820538307 0.1211946391 0.07487030513 
      </coefficients>
    </functor>
    <functor name="pp" size="3" rcut="3">
      <coefficients type="Array">
        0.04028963542 0.01742251234 0.07487030513 
      </coefficients>
    </functor>

  <correlation speciesA="u" speciesB="u" functor="ee"/>
  <correlation speciesA="u" speciesB="d" functor="ud"/>
  <correlation speciesA="d" speciesB="d" functor="ee"/>
  <correlation speciesA="p" speciesB="p" functor="pp"/>
  <correlation speciesA="d" speciesB="p" functor="ep"/>
  <correlation speciesA="u" speciesB="p" functor="ep"/>
</jastrow>

Jaron style 2

<jastrow type="Two-Body" name="J2" function="bspline" print="yes">
    <correlation speciesA="u" speciesB="u" size="3" rcut="3">
      <coefficients type="Array">
        0.2969700506 0.2407083361 0.1830009217
      </coefficients>
    </correlation>
    <correlation speciesA="u" speciesB="d" size="3" rcut="3">
      <coefficients type="Array">
        0.4704510359 0.3531940571 0.2579420961
      </coefficients>
    </correlation>
    <correlation speciesA="u" speciesB="p" size="3" rcut="3">
      <coefficients type="Array">
        0.1820538307 0.1211946391 0.07487030513 
      </coefficients>
    </correlation>
    <correlation speciesA="p" speciesB="p" size="3" rcut="3">
      <coefficients type="Array">
        0.04028963542 0.01742251234 0.07487030513 
      </coefficients>
    </correlation>
  <correlation speciesA="d" speciesB="d" same_as="uu"/>
  <correlation speciesA="d" speciesB="p" same_as="up"/>
</jastrow>

style 2 also carries existing xml problem, one XML leaf can be interpreted as both a functor and a correlation.

ye-luo commented 1 year ago

Another style with "link" entry.

<jastrow type="Two-Body" name="J2" function="bspline" print="yes">
    <correlation speciesA="u" speciesB="u" size="3" rcut="3">
      <coefficients type="Array">
        0.2969700506 0.2407083361 0.1830009217
      </coefficients>
      <link speciesA="d" speciesB="d"/>
    </correlation>
    <correlation speciesA="u" speciesB="d" size="3" rcut="3">
      <coefficients type="Array">
        0.4704510359 0.3531940571 0.2579420961
      </coefficients>
    </correlation>
    <correlation speciesA="u" speciesB="p" size="3" rcut="3">
      <coefficients type="Array">
        0.1820538307 0.1211946391 0.07487030513 
      </coefficients>
      <link speciesA="d" speciesB="p"/>
    </correlation>
    <correlation speciesA="p" speciesB="p" size="3" rcut="3">
      <coefficients type="Array">
        0.04028963542 0.01742251234 0.07487030513 
      </coefficients>
    </correlation>
</jastrow>