QuantumLab-ZY / HamGNN

An E(3) equivariant Graph Neural Network for predicting electronic Hamiltonian matrix
GNU General Public License v3.0


Introduction to HamGNN

The HamGNN model is an E(3)-equivariant graph neural network designed to train on and predict the tight-binding (TB) Hamiltonians of molecules and solids. HamGNN currently works with common ab initio DFT codes based on numerical atomic orbitals, such as OpenMX, Siesta, and Abacus, and supports the prediction of SU(2)-equivariant Hamiltonians with spin-orbit coupling effects. HamGNN not only achieves a high-fidelity approximation of DFT but also transfers across material structures, making it suitable for high-throughput electronic structure calculations and for accelerating computations on large-scale systems.

Requirements

We recommend using the Python 3.9 interpreter. HamGNN requires the following Python libraries:

Python libraries

The user can create a Python environment for HamGNN using conda env create -f environment.yaml.

Another way to set up the Python environment for HamGNN is to use the HamGNN conda environment I have uploaded to this website. Users can simply extract this conda environment directly into their own conda/envs directory.

OpenMX

HamGNN aims to fit the TB Hamiltonian generated by OpenMX. The user needs to know the basic OpenMX parameters and how to use them properly. OpenMX can be downloaded from this site.

openmx_postprocess

openmx_postprocess is a modified OpenMX package used to compute the overlap matrices and the other Hamiltonian matrices that can be calculated analytically. The data computed by openmx_postprocess are stored in a binary file, overlap.scfout. The installation and usage of openmx_postprocess are essentially the same as those of OpenMX. To install openmx_postprocess, first install the GSL library, then enter the openmx_postprocess directory and modify the following parameters in the makefile:

After modifying the makefile, you can directly execute the make command to generate two executable programs, openmx_postprocess and read_openmx.

read_openmx

read_openmx is a binary executable that can be used to export the matrices from the binary file overlap.scfout to a file called HS.json.
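Once read_openmx has produced HS.json, a quick way to see what it contains is to load it with plain Python. This is only an inspection sketch: the exact key names inside HS.json depend on the read_openmx version, so list them rather than assume a particular schema.

```python
import json
from pathlib import Path

# Inspect the matrices exported by read_openmx into HS.json.
# Key names vary with the read_openmx version, so we just enumerate them.
path = Path("HS.json")
if path.exists():
    data = json.loads(path.read_text())
    for key, value in data.items():
        size = len(value) if isinstance(value, (list, dict)) else 1
        print(f"{key}: {type(value).__name__} (size {size})")
else:
    print("HS.json not found; run read_openmx on overlap.scfout first")
```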

Installation

Run the following command to install HamGNN:

git clone https://github.com/QuantumLab-ZY/HamGNN.git
cd HamGNN
python setup.py install

Please note that if you have previously installed an older version of HamGNN, you need to uninstall it first with pip uninstall HamGNN. After uninstalling, a residual installation package may remain in the site-packages directory under a name like 'HamGNN-x.x.x-py3.9.egg/HamGNN'. In this case, manually delete this directory before installing the new version; otherwise, the new version of HamGNN may call functions from the old one.

Usage

Preparation of Hamiltonian Training Data

First, generate a set of structure files (POSCAR or CIF files) using molecular dynamics or random perturbation. After setting the appropriate path parameters in the poscar2openmx.yaml file, run poscar2openmx --config path/to/the/poscar2openmx.yaml to convert these structures into OpenMX's .dat file format. Run OpenMX to perform static calculations on these structure files and obtain the .scfout binary files, which store the Hamiltonian and overlap-matrix information for each structure; these serve as the target Hamiltonians during training. Next, run openmx_postprocess on each structure to obtain the overlap.scfout file, which contains the Hamiltonian matrix H0 that is independent of the self-consistent charge density. If the dataset will only be used for prediction rather than training (i.e., no target Hamiltonian is needed), it suffices to run openmx_postprocess to obtain the overlap.scfout file. openmx_postprocess is executed in the same way as OpenMX and supports MPI parallelism.
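The random-perturbation step can be done with any structure toolkit; as an illustration, here is a minimal pure-Python sketch that displaces every atom in a POSCAR by a small random amount. It assumes the plain 8-header-line POSCAR layout (no selective dynamics or velocities), and the displacement amplitude is an arbitrary illustrative choice.

```python
import random

def perturb_poscar(text: str, amplitude: float = 0.05, seed: int = 0) -> str:
    """Randomly displace every atomic position in a POSCAR by up to
    `amplitude` (in the coordinate units of the file). Assumes the plain
    8-line POSCAR header without selective dynamics or velocities."""
    rng = random.Random(seed)
    lines = text.splitlines()
    header, coords = lines[:8], lines[8:]
    out = header[:]
    for line in coords:
        parts = line.split()
        if len(parts) < 3:          # skip blank/trailing lines
            out.append(line)
            continue
        xyz = [float(p) + rng.uniform(-amplitude, amplitude) for p in parts[:3]]
        out.append("  ".join(f"{v:.8f}" for v in xyz))
    return "\n".join(out) + "\n"

# Generate, e.g., 10 perturbed copies of a reference structure:
# for i in range(10):
#     open(f"POSCAR_{i:03d}", "w").write(perturb_poscar(open("POSCAR").read(), seed=i))
```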

Graph Data Conversion

After setting the appropriate path information in a graph_data_gen.yaml file, run graph_data_gen --config graph_data_gen.yaml to package the structural information and Hamiltonian data from all .scfout files into a single graph_data.npz file, which serves as the input data for the HamGNN network.
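A quick sanity check after this step is to open the packed file with numpy. This is only an inspection sketch: graph_data.npz stores pickled graph objects, so allow_pickle=True is typically needed, and the entry names are whatever graph_data_gen happened to write.

```python
from pathlib import Path
import numpy as np

# List the entries that graph_data_gen packed into graph_data.npz.
if Path("graph_data.npz").exists():
    with np.load("graph_data.npz", allow_pickle=True) as data:
        for name in data.files:
            entry = data[name]
            print(name, getattr(entry, "shape", "?"), getattr(entry, "dtype", "?"))
else:
    print("graph_data.npz not found; run graph_data_gen first")
```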

HamGNN Network Training and Prediction

Prepare the config.yaml configuration file and set the network parameters, training parameters, and other details in this file. To run HamGNN, simply enter HamGNN --config config.yaml. Running tensorboard --logdir train_dir allows real-time monitoring of the training progress, where train_dir is the folder in which HamGNN saves the training data, corresponding to the train_dir parameter in config.yaml.

To enhance the transferability and prediction accuracy of the network, training is divided into two steps. The first step trains with only the Hamiltonian loss in the loss function until the Hamiltonian training converges or the error reaches around 10^-5 Hartree, at which point training can be stopped. Then the band-energy error is added to the loss function, and the network parameters obtained in the previous step are loaded for further training.

After obtaining the final network parameters, the network can be used for prediction. First, convert the structures to be predicted into the network's input format (graph_data.npz), following the same steps as for preparing the training set. Then, in config.yaml, set checkpoint_path to the path of the network parameter file and set the stage parameter to test. After configuring these parameters, running HamGNN --config config.yaml performs the prediction. Several pre-trained models and the config.yaml files for the test examples are available on Zenodo (https://doi.org/10.5281/zenodo.8147631).
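For prediction, only a few entries of config.yaml change relative to training. A minimal sketch of the relevant fragment, assuming stage and checkpoint_path sit under the 'setup' module (the exact nesting and file path are illustrative and may differ between HamGNN versions):

```yaml
setup:
  stage: test                               # 'fit' during training, 'test' for prediction
  checkpoint_path: ./train_dir/network.ckpt # illustrative path to the trained weights
```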

Details of training for bands (The 2nd training/fine-tuning step)

When the first-step training of the Hamiltonian matrix is complete, initialize the HamGNN network with the trained weights and start training on the energy bands. The parameters related to energy-band training are as follows:

After setting the above parameters, start the training again.

Band Structure Calculation

Set the parameters in band_cal.yaml, mainly the path to the Hamiltonian data, then run band_cal --config band_cal.yaml.
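Because numerical atomic orbitals are non-orthogonal, the band energies at each k-point come from the generalized eigenvalue problem H(k)c = E S(k)c. A minimal numpy sketch of that core step, for a single dense H and S (band_cal itself additionally handles the Fourier transform of the real-space matrices and the k-path):

```python
import numpy as np

def solve_bands(H: np.ndarray, S: np.ndarray) -> np.ndarray:
    """Solve H c = E S c for Hermitian H and positive-definite overlap S
    by Cholesky reduction to a standard Hermitian eigenvalue problem."""
    L = np.linalg.cholesky(S)            # S = L L^H
    Linv = np.linalg.inv(L)
    H_std = Linv @ H @ Linv.conj().T     # standard Hermitian problem
    return np.linalg.eigvalsh(H_std)     # ascending band energies

# Toy 2x2 example with an orthonormal basis (S = I):
H = np.array([[0.0, 0.5], [0.5, 0.0]])
print(solve_bands(H, np.eye(2)))         # -> [-0.5  0.5]
```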

The support for ABACUS software

The utilities supporting the ABACUS software have been uploaded to the utils_abacus directory. Users need to modify the parameters in the scripts within this directory. The code for abacus_postprocess in utils_abacus/abacus_H0_export is derived from the ABACUS-3.5.3 source. This tool is analogous to openmx_postprocess: it exports the Hamiltonian part H0, which is independent of the self-consistent field (SCF) charge density. Compilation of abacus_postprocess is the same as that of the original ABACUS.

The poscar2abacus.py and graph_data_gen_abacus.py scripts are used to generate ABACUS structure files and to package the Hamiltonian matrices into the graph_data.npz file, respectively. Users can explore the usage of these tools independently; later on, I'll briefly introduce the meanings of the parameters within these scripts.

Diagonalizing Hamiltonian matrices for large scale systems

For crystal structures containing thousands of atoms, diagonalizing the Hamiltonian matrix with the serial band_cal script can be quite challenging. To address this, we've introduced a multi-core parallel band_cal_parallel script in the band_cal_parallel directory. Note: in certain MKL environments, band_cal_parallel may trigger a bug that reports the error 'Intel MKL FATAL ERROR: Cannot load symbol MKLMPI_Get_wrappers'. Users can try the solutions provided in Issues #18 and #12 to resolve this issue (thanks to flamingoXu and newplay for their help).

Installation

pip install mpitool-0.0.1-cp39-cp39-manylinux1_x86_64.whl

pip install band_cal_parallel-0.1.12-py3-none-any.whl

Usage

In the Python environment with band_cal_parallel installed, execute the following command with multiple CPUs to compute the band structure: mpirun -np ncpus band_cal_parallel --config band_cal_parallel.yaml

Explanation of the parameters in config.yaml

The input parameters in config.yaml are divided into different modules, which mainly include 'setup', 'dataset_params', 'losses_metrics', 'optim_params' and network-related parameters ('HamGNN_pre' and 'HamGNN_out'). Most of the parameters work well using the default values. The following introduces some commonly used parameters in each module.
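As a rough orientation, the module layout of config.yaml looks like the sketch below. The top-level module names are those listed above; the inline comments describe typical contents rather than exact key names, which vary between HamGNN versions.

```yaml
setup:
  stage: fit              # 'fit' to train, 'test' to predict
  train_dir: ./train_dir  # where training data and checkpoints are saved
  checkpoint_path: null   # path to trained weights when stage is 'test'
dataset_params: {}        # path to graph_data.npz, batch size, train/val split
losses_metrics: {}        # which loss terms (Hamiltonian, band energy) to use
optim_params: {}          # learning rate, scheduler, maximum epochs
HamGNN_pre: {}            # parameters of the representation network
HamGNN_out: {}            # parameters of the output network
```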

Minimum irreps for node and edge features in config.yaml

```python
from e3nn import o3

# Orbital basis 'sssppd': three s (l=0), two p (l=1), and two d (l=2) orbitals.
row = col = o3.Irreps("1x0e+1x0e+1x0e+1x1o+1x1o+1x2e+1x2e")

# Each Hamiltonian block between orbitals li and lj decomposes into the
# irreps L = |li - lj| ... li + lj with parity (-1)^(li + lj).
ham_irreps = o3.Irreps()
for _, li in row:
    for _, lj in col:
        for L in range(abs(li.l - lj.l), li.l + lj.l + 1):
            ham_irreps += o3.Irrep(L, (-1) ** (li.l + lj.l))

print(ham_irreps.sort()[0].simplify())
# Output: 17x0e+20x1o+8x1e+8x2o+20x2e+8x3o+4x3e+4x4e
```

References

The papers related to HamGNN:

[1] Transferable equivariant graph neural networks for the Hamiltonians of molecules and solids

[2] Universal Machine Learning Kohn-Sham Hamiltonian for Materials

[3] Accelerating the electronic-structure calculation of magnetic systems by equivariant neural networks

[4] Topological interfacial states in ferroelectric domain walls of two-dimensional bismuth

[5] Transferable Machine Learning Approach for Predicting Electronic Structures of Charged Defects

Code contributors:

Project leaders: