Xiaoxun-Gong / DeepH-E3

MIT License

Inquiry about the training #6

Closed: keysongkang closed this issue 10 months ago

keysongkang commented 1 year ago

Hello, I'm currently seeking assistance with the training step. I am using 100 silicon structures from ab initio MD (aiMD) as the training data, with the inputs set to their defaults. However, the predictions from the trained model are still quite far from the actual results. If anyone has any suggestions or insights at this stage, I would greatly appreciate your comments. Any help would be sincerely valued.

[image: predicted vs. actual band structure over a wide energy range]

Here are some notes. The train loss and validation loss of the best model are 1.68e-05 and 1.81e-05. At this point, the best model only updates about once every 500 steps.

Xiaoxun-Gong commented 1 year ago

Hi. The train loss and validation loss look quite large. Maybe you should try training for longer and see whether the losses can still decrease. Moreover, it looks like you are plotting the band structure over a very large energy range. I would recommend that you focus on a small energy range (no more than about 10 eV) near the Fermi level.

keysongkang commented 1 year ago

> The train loss and validation loss look quite large. Maybe you should try training for longer and see whether the losses can still decrease.

Thank you for your comment! I was wondering, what would be considered a reasonably small loss? When I monitored training, I noticed that improvement became quite slow after the loss reached 2.0e-05; it took nearly 500 steps to achieve a better loss value. Is this normal or common in similar situations?
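The slow-down described above can be quantified with a simple monitor that counts how many steps pass between improvements of the best validation loss (a generic sketch, not DeepH-E3's own logging; all names here are illustrative):

```python
def track_best(val_losses):
    """For each step, return (best_loss_so_far, steps_since_improvement)."""
    best = float("inf")
    since = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            since = 0  # a new best model was just saved
        else:
            since += 1  # the best model has gone stale for one more step
        history.append((best, since))
    return history

# Example: improvement stalls after the third step
hist = track_best([3e-5, 2.5e-5, 2.0e-5, 2.1e-5, 2.05e-5])
print(hist[-1])  # (2e-05, 2)
```

A rapidly growing "steps since improvement" counter, as reported in the thread, is a common signal that the learning rate should be decayed or training stopped.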

> Moreover, it looks like you are plotting the band structure over a very large energy range. I would recommend that you focus on a small energy range (no more than about 10 eV) near the Fermi level.

Thank you for bringing that up. The reason I plotted over a large energy range is that the bands near the Fermi level are hard to match. Regarding your suggestion, are you recommending the use of sparse_calc.jl? Alternatively, is it possible to train the model specifically for states near the Fermi level? I'd love to hear your thoughts on the best approach.

Xiaoxun-Gong commented 1 year ago

Hi. Usually, an MAE loss of no more than about 5e-6 eV² is sufficient for predicting a satisfactory band structure. In the examples demonstrated in the DeepH-E3 paper, the losses are usually below 2e-6 eV², and training took several thousand epochs. In your case, maybe you should wait longer and see whether the model continues to improve. You might also try modifying the network hyperparameters (e.g., adjusting the learning rate and its decay strategy, or increasing irreps_mid).
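The learning-rate decay strategy mentioned above is typically plateau-based: halve the rate when the validation loss stops improving. A minimal sketch of that logic (illustrative only; DeepH-E3 configures its actual scheduler through its own config file, and the parameter values below are made up):

```python
class PlateauDecay:
    """Halve the learning rate when the validation loss has not
    improved for `patience` consecutive checks."""

    def __init__(self, lr=5e-3, factor=0.5, patience=100, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad = 0  # consecutive checks without improvement

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad = 0
        else:
            self.bad += 1
            if self.bad >= self.patience:
                # decay the rate, but never below the floor
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad = 0
        return self.lr

# Example: two non-improving checks in a row trigger one decay
sched = PlateauDecay(lr=1e-2, patience=2)
for loss in [1.0, 0.9, 0.95, 0.96]:
    lr = sched.step(loss)
print(lr)  # 0.005
```

PyTorch's built-in `torch.optim.lr_scheduler.ReduceLROnPlateau` implements the same idea for real training loops.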

About sparse_calc.jl: it exploits the sparsity of the Hamiltonian and diagonalizes only a few states near the Fermi level. It produces exactly the same results as a full diagonalization.
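The technique behind this (shift-invert diagonalization targeting a chosen energy) can be illustrated in miniature with SciPy's sparse eigensolver. This is a toy sketch, not the actual Julia implementation; the tight-binding Hamiltonian and Fermi energy here are invented for the example:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

# Toy tight-binding chain Hamiltonian (tridiagonal, hopping -1)
N = 20
H = diags([-np.ones(N - 1), np.zeros(N), -np.ones(N - 1)], [-1, 0, 1])

# Full dense diagonalization: all N eigenvalues
full = np.linalg.eigvalsh(H.toarray())

# Shift-invert: only the 4 eigenvalues closest to the target energy E_F = 0
ef = 0.0
near, _ = eigsh(H.tocsc(), k=4, sigma=ef)

# The sparse solver reproduces the corresponding full-diagonalization values
closest_full = np.sort(full[np.argsort(np.abs(full - ef))[:4]])
print(np.allclose(np.sort(near), closest_full))  # True
```

For a large, sparse Hamiltonian this avoids computing the full spectrum while still giving the exact eigenvalues in the window of interest, which is why the band structure near the Fermi level is unchanged.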

DeepH-E3 does not have the functionality to train specifically for states near the Fermi level. One possible option is to use pseudopotentials instead of an all-electron calculation, producing a pseudo-Hamiltonian that describes the low-energy physics near the Fermi level.

keysongkang commented 1 year ago

Thank you for the comments! I've reached a loss of 5.87e-06, and the predictions have improved. The valence bands are predicted well, but the conduction bands still need more training. I'm hopeful that longer training will address this.

[image: updated band structure comparison after further training]