deepmodeling / deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics
https://docs.deepmodeling.com/projects/deepmd/
GNU Lesser General Public License v3.0

[BUG] Training using init-model gives normal lcurve and bad model #3751

Closed by zjgemi 2 months ago

zjgemi commented 5 months ago

Bug summary

In the DPGen workflow, training in iter-0 looks fine. The model trained in iter-1 (with init-model) has a large RMSE of ~100 meV, while the lcurve shows better accuracy. [Screenshot: 20240506-193625] For the worst system, the RMSE increases by a factor of more than 30 after the iter-1 training. [Screenshot: 20240506-194007] This problem does not appear when finetune is used instead of init-model in iter-1.
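The per-system comparison described above can be sketched in a few lines of Python. This is only an illustration, not part of DeePMD-kit or DPGen: the helper names `rmse` and `flag_regressions`, the system names, and the numbers are all hypothetical, and the factor of 30 is taken from the report.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between two equally long sequences."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("pred and ref must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def flag_regressions(rmse_before, rmse_after, factor=30.0):
    """Return {system: ratio} for systems whose RMSE grew by more than `factor`.

    Both arguments map a system name to its RMSE (same units).
    Systems missing from `rmse_after` or with a non-positive baseline are skipped.
    """
    flagged = {}
    for name, before in rmse_before.items():
        after = rmse_after.get(name)
        if after is None or before <= 0:
            continue
        ratio = after / before
        if ratio > factor:
            flagged[name] = ratio
    return flagged


# Hypothetical per-system RMSE values in eV (iter-0 model vs. iter-1 model).
before = {"sys_a": 0.003, "sys_b": 0.004}
after = {"sys_a": 0.100, "sys_b": 0.005}
flagged = flag_regressions(before, after)
for name, ratio in flagged.items():
    print(f"{name}: RMSE grew by a factor of {ratio:.1f}")
```

In practice the RMSE values would come from testing each model on the same data, so that the comparison is independent of what the lcurve reports during training.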

DeePMD-kit Version

stable-0411

Backend and its version

PyTorch

How did you download the software?

docker

Input Files, Running Commands, Error Log, etc.

iter1input.zip

Steps to Reproduce

bash aefcb166ade9f2faf80a15e8a6f0d0cb70a6d33a.sub

Further Information, Files, and Links

No response

Chengqian-Zhang commented 3 months ago

I followed these steps:

  1. fine-tuning from the multitask pre-trained model
  2. init-model from the fine-tuned model obtained in step 1

I was able to reproduce the bug on the stable_0411 branch, but everything works well on the latest devel branch, so please check whether it is still an issue on the latest devel branch.
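The two-step sequence in this comment could be scripted roughly as follows. This is a sketch under assumptions, not the exact commands used here: the file names (`finetune_input.json`, `pretrained.pt`, `init_input.json`, `model.ckpt.pt`) and the helper `build_train_commands` are hypothetical, and the optional `--model-branch` argument is only needed when a specific branch of a multitask model has to be selected. The script only builds and prints the `dp --pt train` command lines; it does not run them.

```python
import shlex


def build_train_commands(finetune_input, pretrained, init_input, finetuned, branch=None):
    """Build the argv lists for the two training steps (nothing is executed).

    Step 1 fine-tunes from a pre-trained model via --finetune.
    Step 2 continues from the step-1 result via --init-model.
    """
    finetune_cmd = ["dp", "--pt", "train", finetune_input, "--finetune", pretrained]
    if branch is not None:
        # Optional: choose a branch of a multitask pre-trained model.
        finetune_cmd += ["--model-branch", branch]
    init_cmd = ["dp", "--pt", "train", init_input, "--init-model", finetuned]
    return finetune_cmd, init_cmd


# Hypothetical file names; replace with the real inputs and checkpoints.
step1, step2 = build_train_commands(
    "finetune_input.json", "pretrained.pt", "init_input.json", "model.ckpt.pt"
)
print(shlex.join(step1))
print(shlex.join(step2))
```

Running the same two commands on both the stable branch and the devel branch is one way to confirm whether the discrepancy is specific to the older branch.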