Open markdewing opened 2 years ago
Thanks @markdewing for the investigation. I think when the feature was added, there was an issue that RHF is not handled properly, namely the partial derivatives are likely not aggregated correctly. Only UHF was tested. So if you use two sets of sposet in the collection, do the derivatives come out right?
With separate sposets, the gradient test results are
Param_Name Value Numeric Analytic Percent
jud_b 0.1 -0.5877 -0.5877 7.82e-07
spo-up_orb_rot_0000_0001 0 0.4132 0.4132 7.085e-07
spo-down_orb_rot_0000_0001 0 0.3987 0.3987 1.294e-06
The single SPO result for the rotation parameter derivative (0.8) looks close to the sum of the separate derivatives.
The jastrow parameter derivative looks suspicious because it's quite a bit different (-0.59) from the value if there is no rotation (-0.11). There will be some parameter cross-correlation, but this seems a bit too much.
Once #4131 goes in, I have a plan to keep the parameters as a total rotation from the initial coefficient matrix. The current rotated MO coefficients could be recomputed at anytime from the initial coefficients and the current rotation parameters. This removes a bunch of state-related issues. A call to resetParameters would do the following:
One issue is that the rotations being optimized don't use the entire space of rotations available. That is, only occupied -> unoccupied rotations are considered. However, when multiplying these rotations, they start to fill in other parts of the rotation matrix (occupied->occupied and unoccupied->unoccupied). Extracting just the subset of entries from the antisymmetric matrix will leave the total rotation incomplete and incorrect. Internally the code can be modified to store the full set of rotations entries as the existing parameters and new parameters. And the optimizer code can continue to compute derivatives for and updates to the subset.
This leads to the issue of how to store the rotations from an optimizer run for later use. Options:
Option 2 I don't see a need of keeping the history. Option 3 is the only one workable.
A collection of issues and narrative saga related to orbital rotation.
Related issues
See #3564 on getting the variational parameter indices correct (still not fixed). #2023 is vague (no reproducer). #637 relates to using a spline basis.
Trying to get orbital rotation correct has sent me down a long yak shaving chain. Trying to fix one issue leads to other issues.
I tried to start submitting fixes in #3977, but it's not clear that code is even correct, and there are no good tests to validate it.
Testing
The existing tests (in https://github.com/QMCPACK/qmcpack/tree/develop/tests/molecules/H2_ae and https://github.com/QMCPACK/qmcpack/tree/develop/tests/molecules/H4_ae ) target excited states and require BUILD_LMYENGINE_INTERFACE to be enabled. They also perform complete optimization.
Tests are needed that simplify it further. The first step is getting parameter derivatives correct. For an extremely simple test case. My favorite simple case is an He atom with STO orbitals and a Pade Jastrow. The wavefunction excerpt:
The ground state is the target here, the rotation coefficients should be the same for both up and down. One way to ensure this is to reference the same determinant for both up and down spins (
<determinant id="spo-up" ...
). When computing the parameter derivative using the gradient test, the values wind up off by 50% because the values from each determinant need to be accumulated rather than overwritten.The relevant code is here, and the
=
needs to be replaced with+=
: https://github.com/QMCPACK/qmcpack/blob/57677dfe37f911dbf10b1d9cee9fbff177202215/src/QMCWaveFunctions/RotatedSPOs.cpp#L312-L313 There are other routines for evaluating derivatives (multi-determinants, etc). i don't know if a similar change needs to be made to those?Computing the parameter derivative gives
(Energy is -2.848(2) Those derivatives seem large. I would expect the optimal rotation angle to be small, which would lead to a small gradient at zero. (This is a hand-wavy argument. The energy landscape could be a steep narrow valley, giving a large gradient even close to the optimal value.)
A Julia code that computes the derivatives gives a B derivative of -0.057 and theta derivative of 0.30. (Energy = -2.8768(3))
The energy from QMCPCK is -2.848(2) and from the Julia code is -2.8768(3). The difference of 0.03, which indicates a bug or the wavefunction between the two codes is not the same.
Computing the parameter gradient with just the Pade jastrow gives
This gradient is much smaller than the when orbital rotation parameters are enabled, indicating a bug (All the gradient test results compare the numerical and analytic results are fine, indicating the consistency between parameters is fine) The Julia code gives -0.06(4), which is about a factor of two off of the QMCPACK result (though the error bars are very large)
So far the QMCPACK runs have been done with 1 thread.
Constraints
Another way to keep the rotations the same between up and down is to use constraints. This is implemented in the code but not well supported. I don't think it will be supported going forward, but I'm including here for completeness. Using an input with separate spo-up and spo-down sposts, add the following to the qmc method section
The code issue comes in from OptVariables and OptVariablesForPsi in QMCCostFunctionBase. Normally, they are the same size, but with constraints they are different sizes. NumOptimizables is the size of OptVariables. In several places, the size of OptVariablesForPsi should be used instead (in the sizes of rDsaved, etc in QMCCostFunction::checkConfigurations, and other places that use rDsaved and similar variables)
Cusp correction
Might need to use psi.rows() or psi.size() instead of BasisSetSize in soaCuspCorrection::evaluate_vgl and evaluateV (and change addV and evaluateV to pass the additional size information). I think these changes are due to orbital rotation.