[BUG] different density matrix and Basis by using pscf and dqc

JSchmie commented 3 years ago

I discovered some bug or more or less something I don't understand. So first when I want to just read the Basis from Basis Set Exchange the Parameters are different. I discovered a normalization function in your code, but I don't know what it really does. Bur it changes the input coefficients of the Basis functions. Another Problem I got was by calculation the overlapmatix. In comparison to a similar calculation in pyscf the matrix values are sometime the same. Using another Basis, the overlapmatix differs from each others.

So here is a code to replicate the errors:


from pyscf import gto, scf
import dqc
import numpy as np
import xitorch as xt
from dqc.utils.datastruct import AtomCGTOBasis
import dqc.hamilton.intor as intor
from dqc.api.parser import parse_moldesc

#using pscf to calc:
#   the overlap Matrix
#   as well as print the coefficients of the basis sets

def overlapb_scf(basis):
    """
    just creates the overlap matrix for different input basis
    """
    mol = gto.Mole()
    mol.atom = [['H', [1.0, 0.0, 0.0]],
                ['H', [-1.0, 0.0, 0.0]]]
    mol.spin = 0
    mol.unit = 'Bohr' # in Angstrom
    mol.verbose = 6
    mol.output = 'scf.out'
    mol.symmetry = False
    mol.basis = basis
    mol.build()

    mf = scf.RHF(mol)
    mf.kernel()
    mf.mo_coeff
    dm = mf.make_rdm1()
    overlap = mf.get_ovlp()
    return overlap

########################################################################################################################

#using the dqc module to calc the basis and the overlap matrix

basis_dqc = [dqc.loadbasis("1:3-21G"), dqc.loadbasis("1:3-21G")]
basis2_dqc = [dqc.loadbasis("1:cc-pvdz"), dqc.loadbasis("1:cc-pvdz")]

def overlapb_dqc(basis):
    """
      just creates the overlap matrix for different input basis
    """
    bpacker = xt.Packer(basis)
    bparams = bpacker.get_param_tensor()

    atomstruc = "H 1 0 0; H -1 0 0"
    atomzs, atompos = parse_moldesc(atomstruc)
    atombases = [AtomCGTOBasis(atomz=atomzs[i], bases=basis[i], pos=atompos[i]) for i in range(len(basis))]

    wrap = dqc.hamilton.intor.LibcintWrapper(
            atombases)  # creates an wrapper object to pass informations on lower functions
    return intor.overlap(wrap)

print(f"the pyscf:\n3-21G: \n\t{gto.basis.load('3-21G',symb='H')}\n"
      f"coefficients of cc-pvdz: \n\t{gto.basis.load('cc-pvdz',symb='H')} ")
print(f"\nthe dqc:\n1:3-21G:\n\t {basis_dqc[0]},\n1:cc-pvdz\n\t{basis2_dqc[0]} ")

#checks that the overlap matrix is the same:
print(f"3-21G: {np.all(np.isclose(overlapb_scf('3-21G'),overlapb_dqc(basis_dqc)))}"
      f"\ncc-pvdz: {np.all(np.isclose(overlapb_scf('cc-pvdz'),overlapb_dqc(basis2_dqc)))}")

So the output is:


the pyscf:
3-21G: 
    [[0, [5.447178, 0.156285], [0.824547, 0.904691]], [0, [0.183192, 1.0]]]
coefficients of cc-pvdz: 
    [[0, [13.01, 0.019685], [1.962, 0.137977], [0.4446, 0.478148]], [0, [0.122, 1.0]], [1, [0.727, 1.0]]] 

the dqc:
1:3-21G:
     [CGTOBasis(angmom=0, alphas=tensor([5.4472, 0.8245], dtype=torch.float64), coeffs=tensor([1.4079, 1.9778], dtype=torch.float64), normalized=True), CGTOBasis(angmom=0, alphas=tensor([0.1832], dtype=torch.float64), coeffs=tensor([0.7074], dtype=torch.float64), normalized=True)],
1:cc-pvdz
    [CGTOBasis(angmom=0, alphas=tensor([13.0100,  1.9620,  0.4446,  0.1220], dtype=torch.float64), coeffs=tensor([0.3407, 0.5779, 0.6577, 0.2614], dtype=torch.float64), normalized=True), CGTOBasis(angmom=0, alphas=tensor([0.1220], dtype=torch.float64), coeffs=tensor([0.5215], dtype=torch.float64), normalized=True), CGTOBasis(angmom=1, alphas=tensor([0.7270], dtype=torch.float64), coeffs=tensor([1.9584], dtype=torch.float64), normalized=True)] 
overwrite output file: scf.out
overwrite output file: scf.out
3-21G: True
cc-pvdz: False

So my question is when does the overlap matrix the same in pscf and dqc. As well as when will they differ from each other?

Desktop (please complete the following information):

DQC version: [e.g. 0.1.0]
Python version: [e.g. 3.8]
PyTorch version: [e.g. 1.9]
OS: [Ubuntu 18.04]

mfkasim1 commented 3 years ago

As the issue #1 has been resolved, is this also resolved?

JSchmie commented 3 years ago

As the issue #1 has been resolved, is this also resolved?

Maybe. Is there a way like in issue #1 where I can get these orthogonalize_basis=False by load basis as well? Because in this case, I do not use the MOL Object.

mfkasim1 commented 3 years ago

Ah, sorry. I misunderstood your problem. The normalization in the code is to make it compatible with libcint (which PySCF also uses). The difference you present in the output section is probably due to different convention on which coefficient to display between DQC and PySCF (whether it is normalized coefficients or non-normalized coefficients).

JSchmie commented 3 years ago

Ah, sorry. I misunderstood your problem. The normalization in the code is to make it compatible with libcint (which PySCF also uses). The difference you present in the output section is probably due to different convention on which coefficient to display between DQC and PySCF (whether it is normalized coefficients or non-normalized coefficients).

Ok, but why is the overlap Matrix only for some specific cases (of used basis) equal to the Overlap matrix used by pscf? Shouldn't it be always different or always equal to each other?

JonathanSchmidt1 commented 3 years ago

Hello and thank you for the quick responses. I am working with Jacob here and I would like to add a few details to the problem of differing overlap matrices. If we consider e.g. a single H atom we arrive the same overlap matrix (up to 10-7) for both codes:

 from pyscf import gto, scf
mol = gto.Mole()
mol.atom = [['Ni', [-1, 0.0, 0.0]]]
mol.spin = 0
mol.basis = '3-21G'
pyscf_o = mol.get_ovlp()

import dqc
from dqc.utils.datastruct import AtomCGTOBasis
import dqc.hamilton.intor as intor

basis=[dqc.loadbasis("28:3-21G")]
m = dqc.Mol("Ni 1 0 0", basis=basis)
atombases = [
        AtomCGTOBasis(atomz=m.atomzs[i], bases=basis[i], pos=m.atompos[i]) \
        for i in range(len(basis))
    ]
wrap = dqc.hamilton.intor.LibcintWrapper(atombases)
dqc_o = intor.overlap(wrap)

pyscf_o-dqc_o.numpy()

You will get a similar result when using He. However, as soon as we use a basis with ang_moms>0 we arrive at different results e.g. 3-21G for Li or Ni, or H with cc-pvdz. Li:

from pyscf import gto, scf
mol = gto.Mole()
mol.atom = [['Li', [-1, 0.0, 0.0]]]
mol.spin = 3
mol.basis = '3-21G'
pyscf_o = mol.get_ovlp()

import dqc
from dqc.utils.datastruct import AtomCGTOBasis
import dqc.hamilton.intor as intor

basis=[dqc.loadbasis("3:3-21G")]
m = dqc.Mol("Li -1 0 0", basis=basis)
atombases = [
        AtomCGTOBasis(atomz=m.atomzs[i], bases=basis[i], pos=m.atompos[i]) \
        for i in range(len(basis))
    ]
wrap = dqc.hamilton.intor.LibcintWrapper(atombases)
dqc_o = intor.overlap(wrap)

pyscf_o-dqc_o.numpy()

H cc-pvdz:

from pyscf import gto, scf
mol = gto.Mole()
mol.atom = [['H', [-1, 0.0, 0.0]]]
mol.spin = 1
mol.basis = 'cc-pvdz'
pyscf_o = mol.get_ovlp()

import dqc
from dqc.utils.datastruct import AtomCGTOBasis
import dqc.hamilton.intor as intor

basis=[dqc.loadbasis("1:cc-pvdz")]
m = dqc.Mol("H -1 0 0", basis=basis)
atombases = [
        AtomCGTOBasis(atomz=m.atomzs[i], bases=basis[i], pos=m.atompos[i]) \
        for i in range(len(basis))
    ]
wrap = dqc.hamilton.intor.LibcintWrapper(atombases)
dqc_o = intor.overlap(wrap)

pyscf_o-dqc_o.numpy()

So maybe the difference is in the conventions for higher angular momentum terms in the two codes. cheers and thank you for your help.

mfkasim1 commented 3 years ago

Ok, but why is the overlap Matrix only for some specific cases (of used basis) equal to the Overlap matrix used by pscf? Shouldn't it be always different or always equal to each other?

It's expected to have the same (permutation-equivariant) to PySCF. The difference is just what displayed in the printed coefficients (after normalization or before normalization).

However, as soon as we use a basis with ang_moms>0 we arrive at different results e.g. 3-21G for Li or Ni, or H with cc-pvdz. Li

As far as I'm aware of, the difference is just in the order of rows/cols of the matrix because the basis order in DQC for higher angmom is different from PySCF.

JonathanSchmidt1 commented 3 years ago

As far as I'm aware of, the difference is just in the order of rows/cols of the matrix because the basis order in DQC for higher angmom is different from PySCF.

That makes sense, I confirmed this for Li with 3-21G. I did not expect that as I had already checked this before for H with cc-pvzd where this does not seem to be the case. H cc-pvdz gives: pyscf: array([[1. , 0.6849, 0. , 0. , 0. ], [0.6849, 1. , 0. , 0. , 0. ], [0. , 0. , 1. , 0. , 0. ], [0. , 0. , 0. , 1. , 0. ], [0. , 0. , 0. , 0. , 1. ]]) dqc array([[1. , 0.9037, 0. , 0. , 0. ], [0.9037, 1. , 0. , 0. , 0. ], [0. , 0. , 1. , 0. , 0. ], [0. , 0. , 0. , 1. , 0. ], [0. , 0. , 0. , 0. , 1. ]])

I also checked that using cc-pvdz there is also a difference for Li that goes beyond that order of the basis functions.

mfkasim1 commented 3 years ago

It might be because the basis cc-pvdz from pyscf is different from cc-pvdz used in dqc.

PySCF (https://github.com/pyscf/pyscf/blob/3e8ba375fd4229f000289589fd02193633380853/pyscf/gto/basis/cc-pvdz.dat#L17-L25):

#BASIS SET: (4s,1p) -> [2s,1p]
H    S
     13.0100000              0.0196850        
      1.9620000              0.1379770        
      0.4446000              0.4781480        
H    S
      0.1220000              1.0000000        
H    P
      0.7270000              1.0000000

while in DQC, it downloads from basis set exchange (https://www.basissetexchange.org/basis/cc-pvdz/format/nwchem/?version=1&elements=1):

#BASIS SET: (4s,1p) -> [2s,1p]
H    S
      1.301000E+01           1.968500E-02           0.000000E+00
      1.962000E+00           1.379770E-01           0.000000E+00
      4.446000E-01           4.781480E-01           0.000000E+00
      1.220000E-01           5.012400E-01           1.000000E+00
H    P
      7.270000E-01           1.0000000
END

The first s basis in DQC includes the last row with weights 0.50124 while in PySCF, it doesn't include the last row. If you manually change the downloaded basis in DQC to fit the PySCF format, I suspect it will give the same results.

JonathanSchmidt1 commented 3 years ago

Thank you very much, I overlooked the difference in the basis sets. Than everything should be clear now.

JSchmie commented 3 years ago

It might be because the basis cc-pvdz from pyscf is different from cc-pvdz used in dqc.

PySCF (https://github.com/pyscf/pyscf/blob/3e8ba375fd4229f000289589fd02193633380853/pyscf/gto/basis/cc-pvdz.dat#L17-L25):
#BASIS SET: (4s,1p) -> [2s,1p]
H    S
     13.0100000              0.0196850        
      1.9620000              0.1379770        
      0.4446000              0.4781480        
H    S
      0.1220000              1.0000000        
H    P
      0.7270000              1.0000000
while in DQC, it downloads from basis set exchange (https://www.basissetexchange.org/basis/cc-pvdz/format/nwchem/?version=1&elements=1):
#BASIS SET: (4s,1p) -> [2s,1p]
H    S
      1.301000E+01           1.968500E-02           0.000000E+00
      1.962000E+00           1.379770E-01           0.000000E+00
      4.446000E-01           4.781480E-01           0.000000E+00
      1.220000E-01           5.012400E-01           1.000000E+00
H    P
      7.270000E-01           1.0000000
END
The first s basis in DQC includes the last row with weights 0.50124 while in PySCF, it doesn't include the last row. If you manually change the downloaded basis in DQC to fit the PySCF format, I suspect it will give the same results.

After manually changing the basis, it works like intended. Thank you very much.

diffqc / dqc

[BUG] different density matrix and Basis by using pscf and dqc #2