QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
http://www.qmcpack.org
Other
297 stars 139 forks source link

Orbital rotation issues #3983

Open markdewing opened 2 years ago

markdewing commented 2 years ago

A collection of issues and narrative saga related to orbital rotation.

Related issues

See #3564 on getting the variational parameter indices correct (still not fixed). #2023 is vague (no reproducer). #637 relates to using a spline basis.

Trying to get orbital rotation correct has sent me down a long yak shaving chain. Trying to fix one issue leads to other issues.

I tried to start submitting fixes in #3977, but it's not clear that code is even correct, and there are no good tests to validate it.

Testing

The existing tests (in https://github.com/QMCPACK/qmcpack/tree/develop/tests/molecules/H2_ae and https://github.com/QMCPACK/qmcpack/tree/develop/tests/molecules/H4_ae ) target excited states and require BUILD_LMYENGINE_INTERFACE to be enabled. They also perform complete optimization.

Tests are needed that simplify it further. The first step is getting parameter derivatives correct. For an extremely simple test case. My favorite simple case is an He atom with STO orbitals and a Pade Jastrow. The wavefunction excerpt:

   <jastrow name="Jee" type="Two-Body" function="pade">
      <correlation speciesA="u" speciesB="d">
        <var id="jud_b" name="B">0.1</var>
      </correlation>
    </jastrow>

    <sposet_collection type="MolecularOrbital">
      <!-- Use a single Slater Type Orbital (STO) for the basis. Cusp condition is correct. -->
      <basisset keyword="STO" transform="no">
        <atomicBasisSet type="STO" elementType="He">
          <basisGroup rid="R0" n="1" l="0" m="0" type="Slater">
             <radfunc exponent="2.0"/>
          </basisGroup>
          <basisGroup rid="R1" n="2" l="0" m="0" type="Slater">
             <radfunc exponent="1.0"/>
          </basisGroup>
        </atomicBasisSet>
      </basisset>
      <sposet basisset="LCAOBSet" name="spo-up" size="2" optimize="yes">
          <coefficient id="updetC" type="Array" size="2">
            1.0 0.0
            0.0 1.0
          </coefficient>
      </sposet>
    </sposet_collection>
    <determinantset type="MO" key="STO" transform="no" source="ion0">
      <slaterdeterminant>
        <determinant id="spo-up" spin="1" size="2"/>
        <determinant id="spo-up" spin="-1" size="2"/>
      </slaterdeterminant>
    </determinantset>
  </wavefunction>

The ground state is the target here, the rotation coefficients should be the same for both up and down. One way to ensure this is to reference the same determinant for both up and down spins (<determinant id="spo-up" ...). When computing the parameter derivative using the gradient test, the values wind up off by 50% because the values from each determinant need to be accumulated rather than overwritten.

The relevant code is here, and the = needs to be replaced with +=: https://github.com/QMCPACK/qmcpack/blob/57677dfe37f911dbf10b1d9cee9fbff177202215/src/QMCWaveFunctions/RotatedSPOs.cpp#L312-L313 There are other routines for evaluating derivatives (multi-determinants, etc). i don't know if a similar change needs to be made to those?

Computing the parameter derivative gives

  jud_b                                    =           -0.5777 +/-           0.0039 
  spo-up_orb_rot_0000_0001 =            0.7972 +/-           0.0082 

(Energy is -2.848(2) Those derivatives seem large. I would expect the optimal rotation angle to be small, which would lead to a small gradient at zero. (This is a hand-wavy argument. The energy landscape could be a steep narrow valley, giving a large gradient even close to the optimal value.)

A Julia code that computes the derivatives gives a B derivative of -0.057 and theta derivative of 0.30. (Energy = -2.8768(3))

The energy from QMCPCK is -2.848(2) and from the Julia code is -2.8768(3). The difference of 0.03, which indicates a bug or the wavefunction between the two codes is not the same.

Computing the parameter gradient with just the Pade jastrow gives

  jud_b                 =           -0.1146 +/-           0.0029

This gradient is much smaller than the when orbital rotation parameters are enabled, indicating a bug (All the gradient test results compare the numerical and analytic results are fine, indicating the consistency between parameters is fine) The Julia code gives -0.06(4), which is about a factor of two off of the QMCPACK result (though the error bars are very large)

So far the QMCPACK runs have been done with 1 thread.

Constraints

Another way to keep the rotations the same between up and down is to use constraints. This is implemented in the code but not well supported. I don't think it will be supported going forward, but I'm including here for completeness. Using an input with separate spo-up and spo-down sposts, add the following to the qmc method section

      <set type="equal">
        spo-up_orb_rot_0000_0001, spo-down_orb_rot_0000_0001
      </set>

The code issue comes in from OptVariables and OptVariablesForPsi in QMCCostFunctionBase. Normally, they are the same size, but with constraints they are different sizes. NumOptimizables is the size of OptVariables. In several places, the size of OptVariablesForPsi should be used instead (in the sizes of rDsaved, etc in QMCCostFunction::checkConfigurations, and other places that use rDsaved and similar variables)

Cusp correction

Might need to use psi.rows() or psi.size() instead of BasisSetSize in soaCuspCorrection::evaluate_vgl and evaluateV (and change addV and evaluateV to pass the additional size information). I think these changes are due to orbital rotation.

ye-luo commented 2 years ago

Thanks @markdewing for the investigation. I think when the feature was added, there was an issue that RHF is not handled properly, namely the partial derivatives are likely not aggregated correctly. Only UHF was tested. So if you use two sets of sposet in the collection, do the derivatives come out right?

markdewing commented 2 years ago

With separate sposets, the gradient test results are

Param_Name               Value Numeric  Analytic  Percent
jud_b                      0.1  -0.5877  -0.5877  7.82e-07
spo-up_orb_rot_0000_0001   0     0.4132   0.4132  7.085e-07
spo-down_orb_rot_0000_0001 0     0.3987   0.3987  1.294e-06

The single SPO result for the rotation parameter derivative (0.8) looks close to the sum of the separate derivatives.

The jastrow parameter derivative looks suspicious because it's quite a bit different (-0.59) from the value if there is no rotation (-0.11). There will be some parameter cross-correlation, but this seems a bit too much.

markdewing commented 2 years ago

Once #4131 goes in, I have a plan to keep the parameters as a total rotation from the initial coefficient matrix. The current rotated MO coefficients could be recomputed at anytime from the initial coefficients and the current rotation parameters. This removes a bunch of state-related issues. A call to resetParameters would do the following:

  1. Compute the delta between the new parameters and the existing parameters
  2. Compute the rotation matrices for the existing (old) parameters and the delta parameters
  3. Multiply the rotation matrices
  4. Apply the new rotation matrix to the MO coefficients
  5. Extract updated parameters from the new rotation matrix and store them as the current set of parameters.

One issue is that the rotations being optimized don't use the entire space of rotations available. That is, only occupied -> unoccupied rotations are considered. However, when multiplying these rotations, they start to fill in other parts of the rotation matrix (occupied->occupied and unoccupied->unoccupied). Extracting just the subset of entries from the antisymmetric matrix will leave the total rotation incomplete and incorrect. Internally the code can be modified to store the full set of rotations entries as the existing parameters and new parameters. And the optimizer code can continue to compute derivatives for and updates to the subset.

This leads to the issue of how to store the rotations from an optimizer run for later use. Options:

  1. Store the coefficient matrix
    • This seems undesirable because any compact representation of how the origin of the optimized coefficient matrix is lost.
    • It could get unwieldy with splines.
  2. Save a history of rotations
    • Only involves the affected rotations, but requires an arbitrary number of these sets of rotation parameters to get stored. The length depends on the number of optimizer steps.
  3. Save the extended set to reconstruct the rotation matrix
    • Storing the parameters is no longer a simple 1:1 mapping from the list of optimizable parameters to disk. Some parameter sets will need to store extra data to completely specify the state of the wavefunction. (Option 2 has this issue as well). (One advantage of option 1 is that is does not have to deal with this issue)
ye-luo commented 2 years ago

Option 2 I don't see a need of keeping the history. Option 3 is the only one workable.