Dear TzuChing, I usually assess the quality of the training results by examining the parity plots of the Hamiltonian and band_energy on tensorboard. Typically, the model is trained successfully when the loss value of the Hamiltonian is below 0.0001. To view the training results, you can run 'tensorboard --logdir train_dir' (where train_dir is the path of the training results).
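If it is more convenient, the logged scalars can also be read programmatically instead of through the tensorboard web interface. The following is only a minimal sketch using tensorboard's EventAccumulator; the scalar tag name is an assumption, so replace it with whatever ea.Tags() actually reports for your run.

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point this at the train_dir of the run to inspect.
ea = EventAccumulator("/path/to/train_dir")
ea.Reload()

print(ea.Tags()["scalars"])  # list every scalar tag that was logged

tag = "losses/hamiltonian"  # hypothetical tag name; pick the Hamiltonian loss tag from the list above
if tag in ea.Tags()["scalars"]:
    events = ea.Scalars(tag)
    print(f"final Hamiltonian loss: {events[-1].value:.2e} (aim: below 0.0001)")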
Here is an example config.yaml for the second training:
dataset_params:
  batch_size: 1
  split_file: null
  test_ratio: 0.1
  train_ratio: 0.8
  val_ratio: 0.1
  graph_data_path: /home5/zjlin/ML_work/HamGNN/Bilayer_TMD/work_dir/dataset/graph/graph_data_0.npz # Directory where graph_data.npz is located

losses_metrics:
  losses:
  - loss_weight: 1.0
    metric: mae
    prediction: hamiltonian
    target: hamiltonian
  - loss_weight: 0.01
    metric: mae
    prediction: band_energy
    target: band_energy
  metrics:
  - metric: mae
    prediction: hamiltonian
    target: hamiltonian
  - metric: mae
    prediction: band_energy
    target: band_energy

...default config...

# Generally, the optim_params module only needs to set the initial learning rate (lr)
optim_params:
  lr: 0.0001
  lr_decay: 0.5
  lr_patience: 5
  gradient_clip_val: 0.0
  max_epochs: 3000
  min_epochs: 100
  stop_patience: 30

output_nets:
  output_module: HamGNN_out
  HamGNN_out:
    ham_only: true # true: only the Hamiltonian H is computed; false: fit both H and S
    ham_type: openmx # openmx: fit the openmx Hamiltonian; abacus: fit the abacus Hamiltonian
    nao_max: 26 # The maximum number of atomic orbitals in the data set, which can be 14, 19 or 26
    add_H0: true # Generally true; the complete Hamiltonian is predicted as the sum of H_scf plus H_nonscf (H0)
    symmetrize: true # If set to true, the Hermitian symmetry constraint is imposed on the Hamiltonian
    calculate_band_energy: true # Whether to calculate the energy bands to train the model
    num_k: 5 # The number of k points to use when calculating the energy bands
    band_num_control: 5 # `dict`: controls how many orbitals are considered for each atom in energy bands; `int`: [vbm-num, vbm+num]; `null`: all bands
    k_path: null # `auto`: automatically determine the k-point path; `null`: random k-point path; `list`: list of k-point paths provided by the user
    soc_switch: false # If true, fit the SOC Hamiltonian
    nonlinearity_type: norm # norm or gate

profiler_params:
  progress_bar_refresh_rat: 1
  train_dir: /home5/zjlin/ML_work/HamGNN/Bilayer_TMD/work_dir/train_model/Bilayer_TMD # The folder for saving training information and prediction results. This directory can be read by tensorboard to monitor the training process.

...default config...

setup:
  GNN_Net: HamGNN_pre
  accelerator: null
  ignore_warnings: true
  checkpoint_path: /home5/zjlin/ML_work/HamGNN/Bilayer_TMD/work_dir/train_model/Bilayer_TMD/network_weights_bilayer_TMD.ckpt # Path to the model weights file
  load_from_checkpoint: True
  resume: false
  num_gpus: null # null: use cpu; [i]: use the i-th GPU device
  precision: 32
  property: hamiltonian
  stage: fit # fit: training; test: inference
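For clarity, the losses block above amounts to a weighted sum of two MAE terms, roughly as in the sketch below. This is only my illustration of how the loss_weight values combine, not HamGNN's internal code.

import torch
import torch.nn.functional as F

def combined_loss(pred: dict, target: dict, w_ham: float = 1.0, w_band: float = 0.01) -> torch.Tensor:
    # MAE on the Hamiltonian matrix elements and on the band energies,
    # weighted as in the losses_metrics block above (1.0 and 0.01)
    loss_ham = F.l1_loss(pred["hamiltonian"], target["hamiltonian"])
    loss_band = F.l1_loss(pred["band_energy"], target["band_energy"])
    return w_ham * loss_ham + w_band * loss_band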
Best wishes, Yang Zhong
Thank you for your response. It helped me gain a better understanding of certain aspects. However, I still have some questions regarding the twist_bilayer_MoS2 demo and the bilayer_MoS2 demo. In my database, I gradually reduced the interlayer distance to the original interlayer distance minus 0.6 angstroms (equivalent to applying pressure), and I suspect that this alteration may be the reason for the difficulty in achieving convergence during training. However, I am unsure why applying pressure to the structure would cause problems for the GNN. That is what I wanted to know in question 2.
Best regards, TzuChing
Dear TzuChing,
Details of the bilayer_MoS2
demo can be found in my article:
"Before predicting the electronic structure of the Moiré twisted bilayer MoS2 superlattice, we trained a HamGNN model using a dataset consisting of 500 untwisted bilayer MoS2 structures, each containing 54 atoms. Each MoS2 bilayer structure in the dataset has a random interlayer sliding distance of up to 2 angstroms along a random direction. The layer spacing of each MoS2 bilayer structure in the dataset was randomly shifted by a maximum of 0.5 angstroms."
Maybe some of the structures in your training set are not physically reasonable. I suggest checking them, for example by examining the interatomic distances.
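A quick way to run such a check, sketched here with ASE (the file pattern and the 1.5 angstrom threshold are placeholders to adapt):

import glob
import numpy as np
from ase.io import read

for fname in sorted(glob.glob("training_structures/*.vasp")):  # hypothetical location of the training structures
    atoms = read(fname)
    d = atoms.get_all_distances(mic=True)  # pairwise distances with the minimum-image convention
    np.fill_diagonal(d, np.inf)            # ignore self-distances
    d_min = d.min()
    if d_min < 1.5:                        # angstrom; tighten or loosen for your system
        print(f"{fname}: shortest interatomic distance {d_min:.2f} A looks unphysical")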
Best wishes, Yang Zhong
Dear Yang Zhong: Thank you for your reply; it seems I have overlooked too many details, and I'm sorry about that. I will re-read your paper to ensure I haven't missed any nuances regarding the usage. Best regards, TzuChing
Dear Yang Zhong,
I noticed that the training data you used for the MoS2 bilayer appears to be a 3x3 supercell, as you mentioned. However, the lattice constant of this supercell is approximately twice as large as that of a normal supercell (a and b axes). Could you please provide some insight into the reason behind this?
Thank you.
P.S. Attached are the structure I read from MoS2_bilayer_graph_data.npz and my WSe2 data.
Sorry, I found that the unit is a.u., not angstrom.
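For anyone comparing the two datasets: lengths stored in atomic units (Bohr) can be converted to angstrom with the standard factor. A minimal sketch follows; the key names inside graph_data.npz are assumptions, so inspect data.files first.

import numpy as np

BOHR_TO_ANG = 0.529177210903  # 1 Bohr in angstrom

data = np.load("MoS2_bilayer_graph_data.npz", allow_pickle=True)
print(data.files)  # see which arrays are actually stored
# If, for example, a lattice is stored in Bohr under the key "cell":
# cell_ang = data["cell"] * BOHR_TO_ANG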
Dear Yang Zhong,
I have tried the new database, and it seems that the new dataset reduces the mean absolute error (MAE) more effectively. Therefore, I believe the problem may arise from the distances between atoms (in bilayer 2D materials, I suspect it is due to the constraints of van der Waals forces).
Another issue pertains to the readme.md file. You mentioned that we should perform the second training for the band energy to ensure a smaller error:
I have a question: What is the permissible value for the loss in the second training?
Additionally, could I request more details about the twist_bilayer_MoS2 demo? I believe there may be more overlap between this demo and my project. In my project, I am currently aiming to train the twist_bilayer_WSe2 model first. I have used 500 data points for the training process, and the graph_data.npz file of my training database is attached below. My data is generated using the following procedure:
Best regards, TzuChing