BoothGroup / Vayesta

A Python package for wave function-based quantum embedding
Apache License 2.0

Global WF DM1 using BTensor library #146

Open maxnus opened 11 months ago

maxnus commented 11 months ago

This is a draft PR to add my BTensor library as a dependency of Vayesta, in order to simplify certain functions, such as the global WF -> DM routines (I developed the library for this purpose).

A short introduction to the library can be found here and a more practical example here.

Improved Contractions

The $T_2 \times L_2$ contractions currently look like this:

t2_before

and simplify to (ignore my slightly confused PyCharm syntax highlighting):

t2_after_1 with t2_after_2 and some additional setup such as: t2_after_3 to initialize the theta and l2 tensor objects.

Overlap matrices between cluster bases are added automatically by the library. It also supports contraction over an intermediately constructed and pruned "intersection" basis, via the intersect_tol argument to einsum. This avoids having the svd_tol logic in the 1-DM routine. Note, however, that with a finite svd_tol the results will differ slightly from the original implementation, for some fairly technical reasons (but as far as I can tell they are not worse when compared to the non-approximated DM, i.e. both are valid approximations).
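To make the role of the automatically inserted overlap matrices concrete, here is a minimal NumPy sketch (not the btensor or Vayesta API; all names and dimensions are invented for illustration) of a cross-cluster $T_2 \times L_2$-style contraction, where the amplitudes of cluster x and cluster y live in different occupied/virtual bases and must be bridged by overlap matrices:

```python
import numpy as np

# Illustrative dimensions and random data (NOT the Vayesta/BTensor API).
nao, nocc_x, nvir_x, nocc_y, nvir_y = 10, 3, 4, 3, 4
rng = np.random.default_rng(0)
s_ao = np.eye(nao)  # AO overlap; orthonormal AOs assumed for simplicity

# Cluster orbital coefficients (AO x cluster-orbital)
c_occ_x = rng.standard_normal((nao, nocc_x))
c_vir_x = rng.standard_normal((nao, nvir_x))
c_occ_y = rng.standard_normal((nao, nocc_y))
c_vir_y = rng.standard_normal((nao, nvir_y))

# Cross-cluster overlap matrices -- this is what btensor inserts automatically:
s_occ = c_occ_x.T @ s_ao @ c_occ_y
s_vir = c_vir_x.T @ s_ao @ c_vir_y

# T2-like amplitudes in cluster x, L2-like amplitudes in cluster y
theta = rng.standard_normal((nocc_x, nocc_x, nvir_x, nvir_x))
l2 = rng.standard_normal((nocc_y, nocc_y, nvir_y, nvir_y))

# Occupied-occupied DM contribution: contract over k, a, b, bridging the
# two cluster bases with the overlap matrices.
doo = np.einsum('ikab,kK,aA,bB,jKAB->ij', theta, s_occ, s_vir, s_vir, l2)
```

Pruning the intersection basis (the intersect_tol idea) would correspond to replacing s_occ and s_vir by low-rank factors obtained from their SVDs before contracting.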

Dealing with Symmetry

Contributions from symmetry-derived fragments fx are currently included like this:

symmetry_before

This is difficult to understand, but means the following:

Loop over all fragments fx2 derived from fx (via the loop_symmetry_children generator), take the tuple of arrays (fx.cluster.c_occ, fx.cluster.c_vir, doox, dvvx), and apply the respective symmetry operations along axes [0, 0, 1, 1]. The results (named cx2_occ, etc.) are then transformed accordingly and added to the final DM. For nested symmetries, say translations + rotations, the generator automatically stores the intermediately transformed arrays (for example, an array that has been translated, but not yet rotated), to avoid performing the same operations more than once.
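The caching behaviour of the nested generator can be sketched as follows (a hypothetical, simplified stand-in for loop_symmetry_children, not the actual Vayesta implementation): each level of the recursion applies its operation exactly once and passes the transformed arrays down, so e.g. a translation is not recomputed for every rotation.

```python
import numpy as np

def loop_symmetry_children(arrays, axes, op_levels):
    """Yield `arrays` transformed by every combination of nested symmetry ops.

    arrays    : tuple of ndarrays to transform
    axes      : axis of each array on which the operations act
    op_levels : list of lists of operation matrices, e.g. [translations, rotations]
    """
    if not op_levels:
        yield arrays
        return
    ops, rest = op_levels[0], op_levels[1:]
    for op in ops:
        # Transform once at this level; the result is reused for all
        # combinations of operations at the deeper levels.
        transformed = tuple(
            np.moveaxis(np.tensordot(op, a, axes=(1, ax)), 0, ax)
            for a, ax in zip(arrays, axes)
        )
        yield from loop_symmetry_children(transformed, axes, rest)

# Tiny demo: two level-0 ops x one level-1 op -> 2 yielded tuples
a = np.arange(12.0).reshape(3, 4)
ident = np.eye(3)
results = list(loop_symmetry_children((a,), (0,), [[ident, ident], [ident]]))
```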

In the btensor version, we deal with the symmetry a bit differently:

symmetry_after

Here, we first transform the desired quantity into the AO basis and then loop over a set of symmetry-transformed ao_sym bases, using the replace_basis method to replace the basis of the doox3 and dvvx3 tensors in place (an active transformation). We can then add the result to doo and dvv (which are tensors in the occupied/virtual MO basis), since the library performs the required back-transformation automatically.
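For a single symmetry operation, the three steps (MO -> AO, active transformation in the AO basis, back-transformation to the MO basis) can be written out explicitly. The following is a minimal NumPy sketch assuming an orthonormal AO basis (S = 1) and a permutation as the symmetry operation; all names are illustrative, and in btensor the back-transformation in step 3 happens implicitly on addition:

```python
import numpy as np

rng = np.random.default_rng(1)
nao, nocc = 8, 3
# Orthonormal occupied MO coefficients
c_occ = np.linalg.qr(rng.standard_normal((nao, nao)))[0][:, :nocc]
# Symmetric fragment DM contribution in the occupied MO basis
doox = rng.standard_normal((nocc, nocc))
doox = 0.5 * (doox + doox.T)

# 1) MO -> AO: the contribution now "lives" in the AO basis
d_ao = c_occ @ doox @ c_occ.T

# 2) Active transformation: apply a symmetry operation U on the AOs
#    (here a pure permutation, as for translations)
perm = rng.permutation(nao)
u = np.eye(nao)[perm]
d_ao_sym = u @ d_ao @ u.T

# 3) Back-transform into the occupied MO basis and accumulate into doo
doo = c_occ.T @ d_ao_sym @ c_occ
```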

I think this way of dealing with symmetry is definitely easier to understand, but I'm not yet fully sure about the performance implications. At the moment, ao and ao_sym are related via full $N_\text{AO}^2$ matrices (except for translations, which are pure permutations), so the transformations will scale as $N^3$. In principle, all these matrices should be sparse and block diagonal, since only the rotations among the angular-momentum orbitals around their own center lead to contributions that cannot be described by a permutation; so it should be possible to handle the transformation more efficiently for large matrices.
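The block-diagonal structure could be exploited along these lines (a sketch with invented shell sizes, not an implementation proposal): since a rotation only mixes AOs within an angular-momentum shell on the same center, the full $N_\text{AO} \times N_\text{AO}$ matrix decomposes into small per-shell blocks that can be stored sparsely.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(2)
shell_dims = [1, 3, 3, 5, 1, 3]  # illustrative s, p, p, d, s, p shells
# One small orthogonal rotation block per shell
blocks = [np.linalg.qr(rng.standard_normal((d, d)))[0] for d in shell_dims]
r_sparse = sp.block_diag(blocks, format='csr')

nao = sum(shell_dims)
d_ao = rng.standard_normal((nao, nao))

# R D R^T with sparse block-diagonal R: each sparse matmul costs
# O(sum_i d_i^2 * N) instead of O(N^3) for a dense rotation matrix.
tmp = r_sparse @ d_ao            # R D
d_rot = (r_sparse @ tmp.T).T     # (R (R D)^T)^T = R D R^T
```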

For completeness, the ao_sym generator looks like this:

symmetry_loop

Performance and Testing

So far, performance looks good, especially without the SVD approximation. This is for 4 $\times$ $\text{H}_2\text{O}$ in cc-pVQZ, without svd_tol (time in seconds):

without_svd and with svd_tol = 1e-3:

with_svd

The script for this can be found here.

However, I think we need to do more performance testing before we commit to this way of handling basis transformations: for large systems, with and without the SVD tolerance, with and without symmetry, and with MPI, paying attention to both runtime and memory. We should also confirm in more detail that the new SVD approximation is no worse than the current one when compared to the non-approximated output. Finally, we may have to reassess the regime in which the SVD makes sense, given that non-SVD is faster for the example here (we still have the option of skipping cluster pairs based on their overlap).
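The overlap-based cluster-pair screening mentioned above could look roughly like this (a hypothetical helper, not an existing Vayesta function; the choice of the spectral norm as the criterion is an assumption for illustration): a pair (x, y) is skipped entirely when the largest singular value of its occupied-occupied cross overlap falls below a threshold.

```python
import numpy as np

def significant_pair(c_occ_x, c_occ_y, s_ao, tol=1e-6):
    """Hypothetical screening: keep the cluster pair only if the largest
    singular value of the cross-cluster occupied overlap exceeds `tol`."""
    s_xy = c_occ_x.T @ s_ao @ c_occ_y
    return np.linalg.norm(s_xy, 2) > tol  # ord=2: largest singular value

# Demo with orthonormal AOs: two clusters spanning disjoint AO sets
s_ao = np.eye(6)
c_x = np.eye(6)[:, :2]   # cluster x spans AOs 0-1
c_y = np.eye(6)[:, 2:4]  # cluster y spans AOs 2-3 (no overlap with x)
# significant_pair(c_x, c_y, s_ao) -> False: the pair can be skipped
```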

maxnus commented 11 months ago

The performance with svd_tol > 0 is improved with btensor version 1.1.2:

improved_intersect

(The runs after the first are faster due to caching)