QuantumLab-ZY / HamGNN

An E(3) equivariant Graph Neural Network for predicting electronic Hamiltonian matrix
GNU General Public License v3.0

Some problems with the 2nd training process #9

Open newplay opened 9 months ago

newplay commented 9 months ago

Dear Yang Zhong,

I have tried the new database, and it seems that the new dataset can improve the mean absolute error (MAE) more effectively. Therefore, I believe the problem may arise from the distance between atoms (in bilayer 2D materials, I suspect it's due to the constraints of van der Waals forces).

Another issue pertains to the readme.md file. You mentioned that we should perform the second training for the band energy to ensure a smaller error:

Details of training for bands (The 2nd training step): When the training of the Hamiltonian matrix is completed in the first step, it is necessary to use the trained network weights to initialize the HamGNN network and start training for the energy bands. The parameters related to energy band training are as follows:

The checkpoint_path parameter should be set to the path of the weight file obtained after training on the Hamiltonian matrix in the first step. Set load_from_checkpoint to True.

  • The learning rate (lr) should not be too large; it is recommended to use 0.0001.
  • In losses_metrics and metrics, remove the commented section for band_energy.
  • Set calculate_band_energy to True and specify the parameters num_k, band_num, and k_path.

I have a question: What is the permissible value for the loss in the second training?

Additionally, could I request more details about the twist_bilayer_MoS2 demo? I believe there may be more overlap between this demo and my project. In my project, I am currently aiming to train the twist_bilayer_$WSe_2$ first. I have used 500 data points for the training process, and I have the graph_data.npz file of my training database below.

My data is generated using the following procedure:

  1. Build a $3 \times 3$ supercell from the monolayer unit cell.
  2. Divide the supercell into a $10 \times 10$ grid along the a-axis and b-axis.
  3. Create the 2nd layer (i.e., build the bilayer structure).
  4. Move the 2nd layer step by step across the grid defined in step 2.
  5. Add a small perturbation to each atom after step 4.
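
The procedure above can be sketched roughly as follows. This is only a minimal illustration in plain Python, not the script that was actually used: the helper names `sliding_offsets` and `perturb`, the lattice vectors, the atom positions, and the perturbation amplitude are all hypothetical placeholders.

```python
import random

def sliding_offsets(a_vec, b_vec, n_grid=10):
    """Return in-plane shift vectors on an n_grid x n_grid mesh of the cell."""
    offsets = []
    for i in range(n_grid):
        for j in range(n_grid):
            fa, fb = i / n_grid, j / n_grid  # fractional shift along a and b
            offsets.append((fa * a_vec[0] + fb * b_vec[0],
                            fa * a_vec[1] + fb * b_vec[1]))
    return offsets

def perturb(positions, amplitude, rng):
    """Add a uniform random displacement in [-amplitude, amplitude] to every coordinate."""
    return [tuple(x + rng.uniform(-amplitude, amplitude) for x in pos)
            for pos in positions]

# Hypothetical in-plane supercell vectors (angstrom); replace with real values.
a_vec = (9.9, 0.0)
b_vec = (-4.95, 8.574)
offsets = sliding_offsets(a_vec, b_vec)      # step 2: 100 grid shifts

rng = random.Random(0)                       # fixed seed for reproducibility
layer2 = [(0.0, 0.0, 6.5), (1.65, 0.95, 6.5)]  # placeholder atoms of the 2nd layer
dx, dy = offsets[13]                         # step 4: pick one grid shift
shifted = [(x + dx, y + dy, z) for x, y, z in layer2]
perturbed = perturb(shifted, amplitude=0.05, rng=rng)  # step 5
```

Looping over all entries of `offsets` instead of a single index would give one structure per grid point, before any perturbed copies are added.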

Best regards, TzuChing

QuantumLab-ZY commented 9 months ago

Dear TzuChing, I usually assess the quality of training results by examining the parity plots of Hamiltonian and band_energy on tensorboard. Typically, the model is trained successfully when the loss value of Hamiltonian is below 0.0001. To view the training results, you can use the command 'tensorboard --logdir train_dir', where train_dir is the path of the training results.

Here is an example config.yaml for the second training:

  batch_size: 1
  split_file: null
  test_ratio: 0.1
  train_ratio: 0.8
  val_ratio: 0.1
  graph_data_path: /home5/zjlin/ML_work/HamGNN/Bilayer_TMD/work_dir/dataset/graph/graph_data_0.npz # Directory where graph_data.npz is located

  - loss_weight: 1.0
    metric: mae
    prediction: hamiltonian
    target: hamiltonian
  - loss_weight: 0.01
    metric: mae
    prediction: band_energy
    target: band_energy

  - metric: mae
    prediction: hamiltonian
    target: hamiltonian
  - metric: mae
    prediction: band_energy
    target: band_energy

...default config...

# Generally, the optim_params module only needs to set the initial learning rate (lr)
  lr: 0.0001
  lr_decay: 0.5
  lr_patience: 5
  gradient_clip_val: 0.0
  max_epochs: 3000
  min_epochs: 100
  stop_patience: 30

  output_module: HamGNN_out
    ham_only: true # true: Only the Hamiltonian H is computed; 'false': Fit both H and S
    ham_type: openmx # openmx: fit openmx Hamiltonian; abacus: fit abacus Hamiltonian
    nao_max: 26 # The maximum number of atomic orbitals in the data set, which can be 14, 19 or 26
    add_H0: true # Generally true, the complete Hamiltonian is predicted as the sum of H_scf plus H_nonscf (H0)
    symmetrize: true # if set to true, the Hermitian symmetry constraint is imposed on the Hamiltonian
    calculate_band_energy: True # Whether to calculate the energy bands to train the model
    num_k: 5 # When calculating the energy bands, the number of K points to use
    band_num_control: 5 # `dict`: controls how many orbitals are considered for each atom in energy bands; `int`: [vbm-num, vbm+num]; `null`: all bands
    k_path: null # `auto`: Automatically determine the k-point path; `null`: random k-point path; `list`: list of k-point paths provided by the user
    soc_switch: false # if true, fit the SOC Hamiltonian
    nonlinearity_type: norm # norm or gate

  progress_bar_refresh_rat: 1
  train_dir: /home5/zjlin/ML_work/HamGNN/Bilayer_TMD/work_dir/train_model/Bilayer_TMD  #The folder for saving training information and prediction results. This directory can be read by tensorboard to monitor the training process.

...default config...

  GNN_Net: HamGNN_pre
  accelerator: null
  ignore_warnings: true
  checkpoint_path: /home5/zjlin/ML_work/HamGNN/Bilayer_TMD/work_dir/train_model/Bilayer_TMD/network_weights_bilayer_TMD.ckpt # Path to the model weights file
  load_from_checkpoint: True
  resume: false
  num_gpus: null # null: use cpu; [i]: use the ith GPU device
  precision: 32
  property: hamiltonian
  stage: fit # fit: training; test: inference
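
As a rough illustration of how the two loss_weight entries in this config might combine, the sketch below assumes the total loss is a weighted sum of the per-target MAEs. That is an assumption about the training loop, not a statement of HamGNN's internals; `mae` and `total_loss` are hypothetical helper names and the numbers are toy values.

```python
def mae(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def total_loss(terms):
    """Weighted sum of MAE terms; each term is (weight, pred, target)."""
    return sum(w * mae(p, t) for w, p, t in terms)

# Toy numbers only, not real training output.
ham_pred, ham_true = [0.10, -0.20, 0.30], [0.11, -0.19, 0.30]
band_pred, band_true = [-1.0, 0.5], [-1.2, 0.6]

loss = total_loss([
    (1.0, ham_pred, ham_true),     # hamiltonian term, loss_weight 1.0
    (0.01, band_pred, band_true),  # band_energy term, loss_weight 0.01
])
```

Under this assumption, the small band_energy weight keeps the Hamiltonian term dominant while the band error still contributes to the gradient.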

Best wishes, Yang Zhong

newplay commented 8 months ago

Thank you for your response. It helped me gain a better understanding of certain aspects. However, I still have some questions regarding the twist_bilayer_MoS2 demo:

  1. How many structures (data points) were used for training in the bilayer_MoS2 demo?
  2. How did you generate the data for the perturbation structure? Were perturbations applied along the c-axis?

In my database, I gradually reduced the interlayer distance, down to the original interlayer distance minus 0.6 angstroms (equivalent to applying pressure), and I suspect that this alteration may be the reason for the difficulty in achieving convergence during training. However, I am unsure why applying pressure to the structure would cause problems for the GNN. That is why I asked question 2.

Best regards, TzuChing

QuantumLab-ZY commented 8 months ago

Dear TzuChing,

Details of the bilayer_MoS2 demo can be found in my article:

"Before predicting the electronic structure of the Moiré twisted bilayer MoS2 superlattice, we trained a HamGNN model using a dataset consisting of 500 untwisted bilayer MoS2 structures, each containing 54 atoms. Each MoS2 bilayer structure in the dataset has a random interlayer sliding distance of up to 2 angstroms along a random direction. The layer spacing of each MoS2 bilayer structure in the dataset was randomly shifted by a maximum of 0.5 angstroms."

Maybe some of the structures in your training set are not reasonable. I suggest that you conduct some checks on the structures in the training set, such as examining the interatomic distance.
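
One way such a check could look is sketched below: it computes the smallest interatomic distance in a periodic cell so that structures with unphysically close atoms can be flagged. This is an illustrative sketch in plain Python rather than a HamGNN utility; the function name `min_distance`, the example cell, and the 1.5 angstrom threshold are hypothetical.

```python
import itertools
import math

def min_distance(cell, frac_positions):
    """Smallest interatomic distance under periodic boundary conditions.

    cell: three lattice vectors (rows, angstrom); frac_positions: fractional coordinates.
    Only the 27 neighbouring images are searched, which is adequate for
    cells that are not strongly skewed.
    """
    def to_cart(f):
        return [sum(f[k] * cell[k][d] for k in range(3)) for d in range(3)]

    images = list(itertools.product((-1, 0, 1), repeat=3))
    best = math.inf
    n = len(frac_positions)
    for i in range(n):
        for j in range(i + 1, n):
            for img in images:
                diff = [frac_positions[j][k] + img[k] - frac_positions[i][k]
                        for k in range(3)]
                dist = math.sqrt(sum(c * c for c in to_cart(diff)))
                best = min(best, dist)
    return best

# Toy example: two atoms that are close only through the periodic boundary.
cell = [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
frac = [(0.05, 0.0, 0.0), (0.95, 0.0, 0.0)]
d_min = min_distance(cell, frac)
too_close = d_min < 1.5  # hypothetical threshold in angstrom
```

Running this over every structure in the training set and inspecting the ones flagged as too close would be a quick first screen before retraining.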

Best wishes, Yang Zhong

newplay commented 8 months ago

Dear Yang Zhong: Thank you for your reply; it seems I have overlooked too many details, and I'm sorry about that. I will re-read your paper to ensure I haven't missed any nuances regarding the usage methods. Best regards, TzuChing

newplay commented 8 months ago

Dear Yang Zhong,

I noticed that the training data you used for the MoS2 bilayer appears to be a 3x3 supercell, as you mentioned. However, the lattice constant of this supercell is approximately twice as large as that of a normal supercell (along the a and b axes). Could you please provide some insight into the reason behind this?

Thank you. P.S. Your structure, as read from MoS2_bilayer_graph_data.npz: [image] [image]. My WSe2 data: [image] [image]

newplay commented 8 months ago

Sorry, I found that the unit is au (atomic units), not angstrom.
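
For reference, a length in atomic units (bohr) converts to angstrom by multiplying by about 0.529, so the same cell looks roughly 1.89 times larger when read in au, which matches the "approximately twice" impression. A minimal conversion sketch follows; the value of `a_bohr` is a hypothetical example, not a number taken from the dataset.

```python
BOHR_TO_ANGSTROM = 0.529177210903  # Bohr radius in angstrom (CODATA 2018)

def bohr_to_angstrom(value):
    """Convert a length from atomic units (bohr) to angstrom."""
    return value * BOHR_TO_ANGSTROM

a_bohr = 18.0                      # hypothetical lattice constant in au
a_angstrom = bohr_to_angstrom(a_bohr)
ratio = a_bohr / a_angstrom        # how much larger the number is in au
```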