Xiaoxun-Gong / DeepH-E3

MIT License

Looking for help to improve the training process. #8

Open keysongkang opened 1 year ago

keysongkang commented 1 year ago

Hello everyone, I am seeking help to improve my training process. I have been working on silicon as an example, and I want to enhance the prediction accuracy. Here are the details of my training setup.

I used 500 silicon supercells with 64 atoms each, taken from an ab initio MD (aiMD) trajectory at 300 K. Apart from "dtype = double," I used the default settings. After training, the validation error reached 1.12e-06 by epoch 2000. (Another training run reached a validation error of 6.98e-07 but gave even worse results.) I then tested the model on a larger silicon supercell with 512 atoms, taken from an aiMD trajectory at 300 K. The results are as follows:

[image: test results attached in the original issue]

Should I use larger cells to train the model? Should I use a larger number of supercells to train the model? Should I make any changes to the settings in the train.ini file? I am genuinely seeking ways to improve the prediction accuracy of my model. If you have any suggestions, it would be greatly appreciated. Thank you!

The details of the train.ini file:

device = cuda
dtype = double
save_dir = /u/kkang/scratch/3_ALmoMD/a_Si/7_MD_run/temp/300/deephe3_new
additional_folder_name = trained
simplified_output = True
seed = 42
checkpoint_dir = /u/kkang/scratch/3_ALmoMD/a_Si/7_MD_run/temp/300/deephe3_new/2023-09-28_06-20-58_trained/best_model.pkl
use_new_hypp = True

graph_dir =
DFT_data_dir =
processed_data_dir = /u/kkang/scratch/3_ALmoMD/a_Si/7_MD_run/temp/300/deephe3_new/database/proceeded
save_graph_dir = /u/kkang/scratch/3_ALmoMD/a_Si/7_MD_run/temp/300/deephe3_new/database/graph
target_data = hamiltonian
dataset_name = silicon
get_overlap = False

num_epoch = 300000
batch_size = 1
extra_validation = []
extra_val_test_only = True
train_ratio = 0.67
val_ratio = 0.33
test_ratio = 0
train_size = -1
val_size = -1
test_size = -1
min_lr = 3e-5
learning_rate = 0.002
Adam_betas = (0.9, 0.999)

scheduler_type = 1
scheduler_params = (factor=0.5, cooldown=40, patience=120, threshold=0.05, verbose=True)
revert_decay_patience = 20
revert_decay_rate = 0.8
target = hamiltonian
target_blocks_type = all
target_blocks =
selected_element_pairs =
convert_net_out = False
cutoff_radius = 7.2
only_ij = False
spherical_harmonics_lmax = 4
spherical_basis_irreps =
irreps_embed = 64x0e
irreps_mid = 64x0e+32x1o+16x2e+8x3o+8x4e
num_blocks = 3
ignore_parity = False

irreps_embed_node =
irreps_edge_init =
irreps_mid_node =
irreps_post_node =
irreps_out_node =
irreps_mid_edge =
irreps_post_edge =
out_irreps =
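As an aside on the irreps notation used in settings like irreps_mid: each term MxLp means M copies of an irrep of degree L (with parity p), and a degree-L irrep contributes 2L + 1 features. This is a minimal pure-Python sketch of how the total feature dimension falls out of the string (an illustration of the notation only, not DeepH-E3 or e3nn code):

```python
import re

def irreps_dim(irreps: str) -> int:
    """Total feature dimension of an e3nn-style irreps string like '64x0e+32x1o'.

    Each term 'MxLp' contributes M * (2L + 1) features; the parity p
    ('e' or 'o') does not affect the dimension.
    """
    total = 0
    for term in irreps.split("+"):
        mul, l = re.match(r"(\d+)x(\d+)[eo]", term.strip()).groups()
        total += int(mul) * (2 * int(l) + 1)
    return total

print(irreps_dim("64x0e+32x1o+16x2e+8x3o+8x4e"))  # -> 368
```

So the irreps_mid above carries 64 + 96 + 80 + 56 + 72 = 368 hidden features per node.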

Xiaoxun-Gong commented 11 months ago

Hi, sorry for the late reply. I think your input looks fine, except that you are using double instead of float. I would suggest switching back to float, because double precision slows down training considerably without giving a significant improvement in accuracy.
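(The float-vs-double tradeoff is easy to see outside of DeepH-E3: float64 values take twice the memory of float32, and float64 arithmetic is typically much slower, especially on consumer GPUs. A quick NumPy illustration, not DeepH-E3 code:)

```python
import numpy as np

# A float64 array of the same shape occupies exactly 2x the memory
# of its float32 counterpart.
x32 = np.random.rand(1024, 1024).astype(np.float32)
x64 = x32.astype(np.float64)
print(x64.nbytes // x32.nbytes)  # -> 2
```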

Besides that, have you checked whether the model can predict accurate bands on the structures in the training set (i.e., the 64-atom structures)?

  1. If the 64-atom structures are fine but the 512-atom one is not, then I would suggest increasing the supercell size in your training set.
  2. If the 64-atom structures are not good either, then you might consider increasing the neural network size, e.g. increasing irreps_mid to, maybe, 128x0e+128x1o+128x2e+64x3o+64x4e, if memory and training time are still acceptable. If increasing the network size does not work, then you might need to check whether there are any problems in the DFT calculations for your training set and for the 512-atom structure.
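(Expressed as edits to the train.ini posted above, the two suggestions would look roughly like this; a sketch only, with all other settings unchanged:)

```ini
; switch back to single precision for faster training
dtype = float

; only if bands on the 64-atom training structures themselves are poor:
; enlarge the network, memory and training time permitting
irreps_mid = 128x0e+128x1o+128x2e+64x3o+64x4e
```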

Hope this solves your problem!

keysongkang commented 11 months ago

Thank you so much for your suggestions! I will test them!!