deepmodeling / deepks-kit

a package for developing machine learning-based chemically accurate energy and density functional models
GNU Lesser General Public License v3.0

Refactor and new function suggestions. #82

Open ErjieWu opened 5 days ago

ErjieWu commented 5 days ago

The current version of deepks-kit is hard to use and maintain. Here are some problems and suggestions for improvement.

New functions for users

  1. Transition of the final model. When the training process finishes for deepks+abacus, there should be a transition from model.pth to model.pth in the final step, since the final model is the one most frequently used in later work.
  2. Visualization of the SCF process. In deepks-abacus, the vast majority of the time in the whole iteration is spent in the SCF process. However, one can only see the general progress of the iteration by watching the RECORD file, and the tag_0_finished files are only generated in the init step, which makes it quite difficult to check the training process. Accordingly, there should be a convenient way to stop a run and restart it at any point.
  3. MPI/OpenMP parallelization of deepks-kit runs.
  4. Check of data files in .npy format. There should be a test that checks the size of each npy file at the very beginning of a run, which would spare users the tedious work of checking the files themselves.
  5. A function to automatically split a whole dataset into a training set and a test set. Currently, users are required to prepare separate npy files for training and testing, which creates additional work. It would be better to add a function and an input parameter that let users split the dataset the way they prefer directly in deepks-kit.
  6. Update of docs. Both user docs and developer docs should be updated.
  7. Compact input file. The number of input files is too large, and the parameter list is too long. Users may only need to modify a few parts in actual use, so it would be better to trim the reference input file down to the necessary parameters and put the complete parameter list and explanations in the user documentation.
  8. Dependency update. The current deepks-kit does not support the newest versions of ruamel-yaml and numpy.
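
As a rough illustration of items 4 and 5 above, here is a minimal sketch of what an up-front npy check and an index-based train/test split could look like. None of this is existing deepks-kit code: the function names (`read_shapes`, `check_frame_counts`, `split_indices`) and the `test_fraction` and `seed` parameters are hypothetical, and the sketch assumes that the first axis of every array indexes frames.

```python
import random
from pathlib import Path


def read_shapes(data_dir):
    """Map each .npy file name in data_dir to its array shape."""
    import numpy as np  # imported here so the helpers below stay dependency-free

    # mmap_mode="r" reads the header and maps the data instead of loading it
    return {
        path.name: np.load(path, mmap_mode="r").shape
        for path in sorted(Path(data_dir).glob("*.npy"))
    }


def check_frame_counts(shapes):
    """Return the common number of frames, or raise if the files disagree.

    Assumption: every array has at least one axis and its first axis
    indexes frames.
    """
    if not shapes:
        raise ValueError("no .npy files found")
    counts = {name: shape[0] for name, shape in shapes.items()}
    if len(set(counts.values())) != 1:
        raise ValueError(f"inconsistent frame counts: {counts}")
    return next(iter(counts.values()))


def split_indices(n_frames, test_fraction=0.2, seed=0):
    """Shuffle frame indices reproducibly and split them into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(n_frames))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_frames * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])
```

A run could call `check_frame_counts(read_shapes(data_dir))` once at startup and then use the two index lists from `split_indices` to slice every array in the same way, so users would only need to supply one set of npy files plus a split fraction.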

Refactor suggestions

  1. File structure optimization. At present, the outermost structure is relatively clear, but individual files contain too many functions, which makes many files very long and inconvenient to maintain. It is recommended to separate the utils folders and files by functionality. (For example, train.py contains all training-related functions and classes; the class implementations should be split out into separate files such as evaluator.py.)
  2. Independent default value files. Currently, function implementations and default value settings (the variables named in capitals) are written together in different files. It would be better to combine all default values into one file, which would be easier to maintain in the future.
  3. Make the purpose and usage of some functions clearer. Some functions combine several behaviors and select among them only by the type of the input parameters, which makes it difficult to tell which behavior is in use at a given call site. check_share_folder() is one example. Such functions should be rethought and given a more reasonable implementation.
  4. Simplify some functions. Some functions, such as gather_stats_abacus(), are written as long segments of code that repeat similar operations; these should be simplified.
  5. Add the necessary comments and headers.
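
To make item 2 above more concrete, here is a minimal sketch of a single defaults module with a merge helper. It is only an illustration under stated assumptions: the `DEFAULTS` keys and values and the `merge_defaults` name are hypothetical and are not taken from the deepks-kit sources.

```python
from copy import deepcopy

# Hypothetical defaults, gathered in one place for illustration only.
# The real parameter names and values live in the deepks-kit sources.
DEFAULTS = {
    "scf": {"n_worker": 1, "conv_tol": 1e-6},
    "train": {"n_epoch": 1000, "batch_size": 16},
}


def merge_defaults(user_args, defaults=DEFAULTS):
    """Return a copy of defaults overlaid with user_args, recursing into dicts."""
    merged = deepcopy(defaults)
    for key, value in user_args.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested section: keep the defaults the user did not override
            merged[key] = merge_defaults(value, merged[key])
        else:
            merged[key] = deepcopy(value)
    return merged
```

With this layout, a user input such as `{"train": {"n_epoch": 10}}` would only need to list the values that differ from the defaults, which also fits the compact input file idea from the first list.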

Bugs

  1. Support for pyscf. Neither the master branch nor the develop branch supports the newest pyscf. Whether to continue supporting pyscf should be considered.

y1xiaoc commented 5 days ago

Thanks for the suggestions. My time is currently occupied by other projects, so the maintenance of deepks is postponed. Some of the suggestions seem very easy to implement, so I would encourage you to make a PR and contribute!