bytedance / DeepSolid

A library combining solid quantum Monte Carlo and neural networks.
Apache License 2.0

DeepSolid code optimization failure for ground state energy calculation of 222 bcc Hydrogen #5

Open DanChai22 opened 1 month ago

DanChai22 commented 1 month ago

Description: I am experiencing issues with the DeepSolid code while calculating the ground-state energy of a 2x2x2 supercell of bcc hydrogen. Despite trying multiple hyperparameters, such as the learning rate, MCMC steps, and batch size, the optimization consistently fails: the energy explodes when it reaches around -0.85 Hartree.

Configuration Files:

Environment:

Usage:

deepsolid --config=PATH/TO/DeepSolid/config/bcc_cell.py:H,2,2,2,1.31,0,ccpvdz

Observed Behavior: I tried multiple hyperparameters, including changing the learning rate, MCMC steps, and batch size, but the optimization still fails as described: the energy explodes when it reaches approximately -0.85 Hartree, as shown in the graph below.

[Screenshot: energy optimization curve showing the explosion]

The benchmark for this structure is about -0.4878 Hartree per atom, i.e. -0.9756 Hartree per primitive cell, as shown in the benchmark graph below.

[Screenshot: benchmark energies for bcc hydrogen]
GiantElephant123 commented 1 month ago

Hi, sorry for the late response.

I think this situation is related to the metallic character of bcc H.

Metals are the most challenging systems for high-accuracy methods because they require much larger simulation sizes to reach the thermodynamic limit (TDL). With this in mind, we are still modifying DeepSolid to study metals.

As for the specific problem, I can guess at some possible reasons:

  1. There may be some singular parameters in the neural network for bcc H. If you plot the electron density of bcc H, I think it will be quite homogeneous, which means the wavefunction is somehow "trivial", and this may lead to divergent gradients. As a quick fix, you can try float64 via --config.use_x64.

  2. As you may know, DeepSolid requires HF or DFT to supply the occupied k-points. If the supplied k-points are wrong, it may cause problems. Actually, I don't know whether 2x2x2 is large enough for HF to get reasonable results for a metal. You can check whether the HF calculation converged to a stable solution via the KUHF stability() method; see also this example: https://github.com/pyscf/pyscf/blob/master/examples/scf/17-stability.py
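To make point 1 concrete, here is a minimal pure-Python sketch (not DeepSolid code) of how much precision is lost when an energy-scale quantity is stored in float32, which is what --config.use_x64 avoids:

```python
import struct

def to_f32(x):
    # round a Python float (float64) to the nearest IEEE-754 float32
    return struct.unpack('f', struct.pack('f', x))[0]

e = -0.85  # energy scale where the explosion occurs, in Hartree
err = abs(to_f32(e) - e)
print(err)  # ~2.4e-8: only about 7 significant digits survive in float32
print(to_f32(0.5) == 0.5)  # exactly representable values are unchanged
```

Gradient differences much smaller than ~1e-7 of the stored scale are invisible in float32, which can destabilize an optimizer that relies on them.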

Overall, I think much more effort is needed to study metals, and we are still trying.

AllanChain commented 1 month ago

Just to add a little. Regarding the "homogeneous density", it's worthwhile to check pmove to see whether it is consistently larger than 0.5. By default, pmove is kept around 0.5 by automatically adjusting the MCMC move width. But if the system is "homogeneous", the MCMC move width will grow larger and larger while trying to keep pmove around 0.5, and this may lead to instabilities.
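The feedback loop described above can be sketched in a few lines of Python (a hypothetical adaptation rule for illustration, not the actual DeepSolid implementation):

```python
def adapt_width(width, pmove, target=0.525):
    # hypothetical rule: widen moves when acceptance is above target,
    # shrink them when it is below, steering pmove toward ~0.5
    return width * (1.1 if pmove > target else 0.9)

# in a "homogeneous" system the acceptance barely depends on the width,
# so pmove stays high and the width grows without bound
width = 0.1
for _ in range(50):
    pmove = 0.9  # stuck high regardless of width (homogeneity assumption)
    width = adapt_width(width, pmove)
print(width)  # ~11.7: the move width has grown by 1.1**50, about 117x
```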

DanChai22 commented 1 month ago

Thanks for the comment. On the first point, I had already set float64 as the default, and I have not computed the electron density. However, I can attach the pmove curve below; it behaves quite strangely, oscillating around 0.5.

[Screenshot: pmove curve oscillating around 0.5]

For the second point, I checked the stability via

hartree_fock = hf.SCF(cell=simulation_cell, twist=jnp.array((0.0, 0.0, 0.0)))
hartree_fock.kmf.stability()

and the output is

KRHF/KRKS wavefunction is stable in the internal stability analysis

GiantElephant123 commented 1 month ago

So I think pmove may be the problem. For a homogeneous system, pmove can be quite large, near 1. However, we constrain pmove to 0.5-0.55; if it falls outside this range, the MCMC step width is tuned accordingly. In this case, I guess the MCMC step width was tuned too large, which causes the problem. You can modify these lines to change the target pmove: https://github.com/bytedance/DeepSolid/blob/master/DeepSolid/process.py#L368

A pmove of 0.9-0.95 may be a more reasonable target for a metal.
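As a sketch of the suggested change, the adjustment at the linked line might be given a wider target band along these lines (illustrative only, not the actual process.py code):

```python
def tune_width(width, pmove, lower=0.9, upper=0.95):
    # metal-friendly acceptance band: widen the move width only when
    # pmove rises above 0.95, shrink it only when pmove drops below 0.9
    if pmove > upper:
        return width * 1.1
    if pmove < lower:
        return width * 0.9
    return width

print(tune_width(1.0, 0.92))  # 1.0: acceptance inside the band, no change
print(tune_width(1.0, 0.98))  # 1.1: widen the moves
print(tune_width(1.0, 0.50))  # 0.9: shrink the moves
```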

DanChai22 commented 1 month ago

Hi,

I've resolved the issue regarding optimization. It turns out that the problem wasn't with the hyperparameters but with the method of constructing the supercell.

Initially, I constructed the supercell by first creating a 1x1x1 original cell (with 2 atoms) and then building the supercell using:

supercell.get_supercell(original_cell, np.diag([2, 2, 2]))

This approach led to an explosion in energy during optimization.

However, I achieved correct optimization by first constructing a 2x2x2 original cell (with 16 atoms) and then building the supercell using:

supercell.get_supercell(original_cell, np.diag([1, 1, 1]))

This resulted in correct energy optimization, dropping below -7.8 Hartree, which matches the energy of the 2x2x2 original cell (equivalently, -0.4875 Hartree per atom).
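For reference, the geometric content of the tiling step is simple; here is a plain-Python sketch (not the actual supercell.get_supercell implementation) of how a 2-atom bcc cell becomes the 16-atom 2x2x2 cell:

```python
# 2-atom bcc cell in fractional coordinates, as in the 1x1x1 cell above
prim_atoms = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]

def tile(atoms, reps=(2, 2, 2)):
    # shift each atom by every integer lattice translation in the tiling,
    # then rescale so coordinates are fractional in the enlarged cell
    out = []
    for i in range(reps[0]):
        for j in range(reps[1]):
            for k in range(reps[2]):
                for (x, y, z) in atoms:
                    out.append(((x + i) / reps[0],
                                (y + j) / reps[1],
                                (z + k) / reps[2]))
    return out

atoms_222 = tile(prim_atoms)
print(len(atoms_222))  # 16, matching the 2x2x2 cell described above
```

Both constructions produce the same 16 atomic positions; as discussed below, what differs is which lattice vectors the wavefunction's symmetry is tied to.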

I am curious why the method of constructing the supercell leads to such different results. Is this due to physical reasons or the algorithm? I noticed in your paper that you chose the periodicity of atoms to match the period of the original cell (primitive cell) rather than the simulation cell (supercell), which confused me for a long time. Could this choice be causing the discrepancy in the results?

Thank you very much for your help and insights!

GiantElephant123 commented 1 month ago

Hi,

Glad to see your problem solved. I am quite surprised by how it was solved; it's quite a subtle problem, and I'll try to explain. I think there is a difference between these two constructions:

  1. build a 1x1x1 bcc primitive cell and then build a 2x2x2 supercell from it;
  2. build a 2x2x2 cell directly and use it as the supercell.

These two constructions lead to different properties of the wavefunction. As you may know, the DeepSolid wavefunction has the following symmetry: $$\psi(r_1+L_p,\ldots,r_N+L_p)=\exp(ik_p\cdot L_p)\,\psi(r_1,\ldots,r_N)$$ where $L_p$ is a lattice vector of the primitive cell, and these vectors differ between constructions 1 and 2.

However, the fundamental Hamiltonian of the system is translationally invariant with respect to the smallest lattice vector (the lattice vector of the 1x1x1 bcc primitive cell in this case).

So translation symmetry breaking may occur in construction 2, since it only enforces the translational symmetry of the 2x2x2 cell.
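The twisted boundary condition above can be sanity-checked numerically in a 1D toy model with a single plane-wave factor (the values of k and L are arbitrary; this illustrates only the boundary condition, not the full many-body wavefunction):

```python
import cmath

k = 0.7   # toy twist momentum
L = 2.0   # toy primitive-cell lattice vector (1D)

def psi(x):
    # single plane wave carrying the Bloch phase
    return cmath.exp(1j * k * x)

x = 0.3
lhs = psi(x + L)
rhs = cmath.exp(1j * k * L) * psi(x)
print(abs(lhs - rhs) < 1e-12)  # True: psi(x + L) = exp(i k L) psi(x)
```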

The same situation also arises when we calculate the hydrogen chain. Something strange happens if we use a cell of 1 atom, and that's why we build a cell of 2 atoms to calculate the hydrogen chain.

However, I don't know why construction 2 gives better results. I'd really appreciate it if someone could offer a better explanation.

GiantElephant123 commented 2 weeks ago

Hi

It seems I've found the reason. The problem is related to the input periodic features. The default feature in DeepSolid is nu (a kind of polynomial), and we have also implemented (sin, cos) features recently. I tested bcc H with the (sin, cos) features, and the energy easily becomes comparable with the SOTA (2x2x2 supercell built by tiling the 1x1x1 cell).

[Screenshot: energy curve with (sin, cos) features]

The underlying reason may be that we need sin and cos together to uniquely describe a point on a circle.
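A quick illustration of that point (plain Python, not the DeepSolid feature code): sin alone maps distinct points in the cell to the same value, while the (sin, cos) pair separates them:

```python
import math

def feat_pair(x, L=1.0):
    # hypothetical periodic feature: (sin, cos) of the scaled coordinate
    t = 2 * math.pi * x / L
    return (math.sin(t), math.cos(t))

x1, x2 = 0.1, 0.4  # distinct points, yet sin(2*pi*0.1) == sin(2*pi*0.4)
s1 = feat_pair(x1)[0]
s2 = feat_pair(x2)[0]
print(abs(s1 - s2) < 1e-12)            # True: sin alone is ambiguous
print(feat_pair(x1) == feat_pair(x2))  # False: cos breaks the tie
```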

You can also test it with a simple flag: --config.network.detnet.distance_type tri