Training a model from an already built .pb file

siddarthachar commented 3 years ago

Hello,

I have been trying to fine-tune my dp model with some active learning iterations. I have a .pb file that was generated from a regular deepmd training (without dpgen). Now I want to use the same .pb file and train it further with MD runs in DPGEN for different operating conditions.

I was a bit confused by the documentation. What I understood is that I would have to set "training_iter0_model_path" to the path of my previous training files (the directory that contains the model.ckpt* files) and then set "training_init_model" to true. The first model for iter0 would then be initialized with the weight and bias parameters of the model at "training_iter0_model_path". Is this correct?

Another question: is there a way to bypass the init_data_sys part if I already have an initial model ready? That would save me the time of running DFT-MD simulations.

I hope to hear from you guys soon.

Thanks in advance.

Best, Siddarth

fqgong commented 3 years ago

Hello,

The training init model option is a way to train a model starting from one you have already trained, which can save a lot of training time. The new model inherits the knowledge of the old model and learns from the new data. Specifically, init model uses the parameters of the old model to initialize the current model. So your understanding is pretty close. However, you don't have to set training_iter0_model_path if the iteration number of your training is larger than 0; DP-GEN will automatically use the models from the previous iteration. As for your second question, if you already have four models trained on the same datasets, you can skip the training step of the first iteration by writing a record.dpgen file like

0 0
0 1
0 2

This way, dpgen will start from the exploration step of the first iteration. But you should make sure that the directory structure is correct, something like iter.000000/00.train/graph.00[0-3].pb .
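The restart described above can be sketched in a few lines of Python. This is only an illustration, not part of DP-GEN: the helper name prepare_restart is hypothetical, and it assumes record.dpgen sits in the top-level working directory next to iter.000000. It writes the record only when all four graph files are in place.

```python
from pathlib import Path


def prepare_restart(work_dir, n_models=4):
    """Write record.dpgen marking the iter 0 training steps as done.

    Returns the list of missing graph files; the record is written
    only when that list is empty.
    """
    work_dir = Path(work_dir)
    train_dir = work_dir / "iter.000000" / "00.train"
    # Expected names: graph.000.pb ... graph.003.pb
    expected = [f"graph.{i:03d}.pb" for i in range(n_models)]
    missing = [name for name in expected if not (train_dir / name).is_file()]
    if not missing:
        # Same three lines as in the reply above: "0 0", "0 1", "0 2".
        (work_dir / "record.dpgen").write_text("0 0\n0 1\n0 2\n")
    return missing
```

Calling prepare_restart(".") from the run directory returns an empty list on success, or the names of the graph files that still need to be copied into place.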

siddarthachar commented 3 years ago

Hello,

Thanks a lot for your response. I think I managed to get the dpgen run working with an old model. However, I still have a few other questions:

Let's say that I have a single model built previously that I now want to improve with more PES exploration through active learning. Would making 4 copies of the same model be a good idea? I tried doing this and the deviations are all 0s. And if the deviations are all 0s, will no data frames go into the candidate list for re-labeling?
Let's say that I ask dpgen to run a LAMMPS simulation for 10000 steps of NVT at some temperature for data generation at iter.00x, but I only care about the last 5000 steps of the simulation because that is when the system has equilibrated. Is there a way to specify this in the run param json file while defining the iterations?
Are there other ways to explore the PES with dpgen besides NVT or NPT MD simulations in LAMMPS?
How do we know if the upper and lower limits on the force error, i.e. model_devi_f_trust_lo/hi, are correct? Would it make physical sense to make the bounds tighter?

Thanks a lot for all your help.

Best, Siddarth

fqgong commented 3 years ago

The reason why the deviations are all 0s is that the four models you have are exactly the same. In DP-GEN, the training step uses four different random seeds to generate random numbers, so even if the training datasets are all the same, the potentials are still slightly different from each other. If you want to train models manually, it is better to use different random seeds to generate different potentials from the same datasets.
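If you train the four models by hand, something like the following could generate the four inputs. It is a sketch under assumptions: the helper name make_seeded_inputs is made up, and the key layout (model/descriptor/seed, model/fitting_net/seed, training/seed) is assumed to match your DeePMD-kit input file, so check it against your own input.json.

```python
import copy
import random


def make_seeded_inputs(base_input, n_models=4, rng_seed=None):
    """Return n_models copies of a training input, each with its own seeds.

    Assumes the input has model/descriptor, model/fitting_net and
    training sections that each accept a "seed" key.
    """
    rng = random.Random(rng_seed)
    # sample() draws without replacement, so no two seeds coincide.
    seeds = rng.sample(range(2**31), 3 * n_models)
    inputs = []
    for i in range(n_models):
        cfg = copy.deepcopy(base_input)  # leave the caller's dict untouched
        desc_seed, fit_seed, train_seed = seeds[3 * i:3 * i + 3]
        cfg["model"]["descriptor"]["seed"] = desc_seed
        cfg["model"]["fitting_net"]["seed"] = fit_seed
        cfg["training"]["seed"] = train_seed
        inputs.append(cfg)
    return inputs
```

Each returned dict can then be dumped to its own input.json and trained separately, which should give four potentials that differ slightly and therefore a nonzero model deviation.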
You can do this by setting the key model_devi_skip to 5000. It is described in the DP-GEN manual.
Theoretically speaking, any kind of simulation that LAMMPS is able to do can be run through DP-GEN. All you need is to prepare a LAMMPS input template and add the following keys to your param.json. The template key specifies the path of your input file, and rev_mat specifies the values of the variables defined in the LAMMPS input.

"sys_idx": [0],"traj_freq": 10,"_idx": "00",

"template": { "lmp": "lmp/input.lammps" },

"rev_mat": { "lmp": {"V_NSTEPS": [20000], "V_TEMP": [300], "V_PRES": [1]} }
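To illustrate the idea behind template and rev_mat, here is a rough sketch of how placeholders in a LAMMPS input could be filled in. It is not DP-GEN's own code: expand_template and the three-line template are made up for illustration, and it assumes that one input is produced per combination of the listed values. Plain string replacement is used, so a key that is a prefix of another key would need extra care.

```python
import itertools


def expand_template(template_text, rev_mat):
    """Return one filled-in input per combination of rev_mat values."""
    keys = list(rev_mat)
    results = []
    for values in itertools.product(*(rev_mat[k] for k in keys)):
        text = template_text
        for key, value in zip(keys, values):
            # Replace every occurrence of the placeholder with its value.
            text = text.replace(key, str(value))
        results.append(text)
    return results


# Hypothetical fragment of lmp/input.lammps using the placeholder names above.
template = """variable NSTEPS equal V_NSTEPS
variable TEMP equal V_TEMP
variable PRES equal V_PRES
"""

inputs = expand_template(
    template, {"V_NSTEPS": [20000], "V_TEMP": [300], "V_PRES": [1]}
)
```

With the values from the snippet above this gives a single input; listing two temperatures under V_TEMP would give two.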
Normally, I would set the lower limit to a value close to the training error of the potential, so that data frames already covered by your datasets won't be labeled again, and set the upper limit to 3 to 5 times the lower limit. I think making the bounds tighter is fine. The upper limit is there to rule out highly unreliable or even unphysical structures.
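The way the two limits split a trajectory can be sketched as below. This is a simplified illustration rather than DP-GEN's implementation: classify_frames is a made-up name, the numbers are hypothetical, and skip here counts recorded frames, so check the manual for the exact meaning of model_devi_skip.

```python
def classify_frames(max_devi_f, trust_lo, trust_hi, skip=0):
    """Split frame indices into accurate, candidate and failed lists.

    max_devi_f holds the maximum force deviation of each recorded frame.
    The first `skip` frames are ignored entirely.
    """
    accurate, candidate, failed = [], [], []
    for idx, devi in enumerate(max_devi_f):
        if idx < skip:
            continue  # e.g. frames recorded before equilibration
        if devi < trust_lo:
            accurate.append(idx)   # models agree, nothing to label
        elif devi < trust_hi:
            candidate.append(idx)  # worth sending to first-principles labeling
        else:
            failed.append(idx)     # too unreliable, possibly unphysical
    return accurate, candidate, failed


# Hypothetical limits: hi is 3 times lo, as suggested above.
accurate, candidate, failed = classify_frames(
    [0.01, 0.08, 0.2, 0.06], trust_lo=0.05, trust_hi=0.15
)
```

Note that when all deviations are 0, as with four identical models, every frame lands in the accurate list and nothing becomes a candidate.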

deepmodeling / dpgen

Training a model from an already built .pb file #349