ICAMS / python-ace


Lost atoms when running LAMMPS #40

Closed YuanbinLiu closed 1 year ago

YuanbinLiu commented 1 year ago

Hi authors. I have used pacemaker to train a pace potential for disordered materials. The training and testing accuracy is impressive. But when I actually run LAMMPS with pace, I encounter a problem where atoms are easily lost. Do you have any suggestions on how to solve this issue? For instance, could it be possible to add some sort of regularization to the training process of pace?

YuanbinLiu commented 1 year ago

The temperatures are baffling:

Per MPI rank memory allocation (min/avg/max) = 5.586 | 5.59 | 5.595 Mbytes
Step Temp Press Volume KinEng PotEng TotEng
0 1800 869377.6 103204.61 1163.1086 7417.0035 8580.1121
1 2975.3213 854855.43 103215.78 1922.5677 6632.2598 8554.8275
2 6106.4842 828076.3 103248.95 3945.8358 4567.9114 8513.7473
3 10566.644 798196.06 103303.43 6827.8636 1622.0949 8449.9584
4 15614.337 754512.63 103378.47 10089.539 -1717.9616 8371.5771
5 20515.741 705538.89 103472.91 13256.686 -4999.2138 8257.4725
6 24679.404 656861.91 103585.38 15947.126 -7807.5973 8139.529
7 28033.111 609010.73 103714.41 18114.196 -10137.972 7976.2241
8 30554.359 564492.59 103858.41 19743.355 -11996.171 7747.1843
9 32326.447 522184.68 104015.7 20888.428 -13331.638 7556.7898
10 33471.094 487973.35 104184.52 21628.066 -14247.182 7380.8833
11 33976.611 461220.57 104363.13 21954.716 -14725.713 7229.0036
12 34220.033 441755.22 104549.85 22112.009 -15132.674 6979.3344
13 37875.47 439401.92 104743.05 24474.048 -15383.945 9090.1031
14 48919.388 518845.29 104941.47 31610.313 -15102.828 16507.485
15 48346.918 502931.29 105145.91 31240.399 -15152.229 16088.17
16 47519.682 493868.93 105354.55 30705.863 -15040.688 15665.175
17 46770.243 488719.77 105565.76 30221.597 -15039.91 15181.686
18 46085.505 483373.5 105778.08 29779.138 -15014.446 14764.692
19 45349.457 476529.37 105990.13 29303.525 -14968.712 14334.813
20 44703.967 471571.49 106200.68 28886.428 -15003.592 13882.836
21 44049.033 461386.59 106408.75 28463.228 -15009.74 13453.489
22 43429.863 457626.31 106613.46 28063.138 -15069.319 12993.819
23 42855.542 451517.54 106814.34 27692.028 -15150.772 12541.256
24 42308.141 446803.39 107011.1 27338.313 -15217.134 12121.179
25 41455.552 440710.37 107203.7 26787.395 -15240.057 11547.337
26 40631.927 431569.85 107392.24 26255.192 -15385.324 10869.868
27 39952.16 421839.58 107576.9 25815.946 -15420.572 10395.374
28 39393.536 415096.53 107757.94 25454.979 -15503.746 9951.2331
29 38898.173 419861.94 107935.8 25134.889 -15528.383 9606.5065
30 38392.816 407813.31 108111.25 24808.342 -15524.145 9284.1967
31 37918.567 402349.8 108284.58 24501.896 -15634.202 8867.6947
ERROR: Lost atoms: original 5000 current 4998 (src/thermo.cpp:481)
Last command: run 5000
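A blow-up like the one in this log can also be flagged automatically. The following is a minimal sketch, not part of LAMMPS or pacemaker: the function names `parse_thermo` and `first_runaway_step` and the thresholds `temp_factor` and `energy_tol` are hypothetical choices for illustration. It assumes plain thermo output with `Step`, `Temp`, and `TotEng` columns, and the energy check is only meaningful when the total energy is expected to be roughly conserved.

```python
def parse_thermo(text):
    """Parse whitespace-separated thermo rows that follow a 'Step ...' header."""
    header, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if "Step" in fields:
            # The header may share a line with other text; keep columns from 'Step' on.
            header = fields[fields.index("Step"):]
            continue
        if header and len(fields) == len(header):
            try:
                rows.append(dict(zip(header, map(float, fields))))
            except ValueError:
                pass  # not a numeric row (for example an ERROR line)
    return rows


def first_runaway_step(rows, temp_factor=2.0, energy_tol=0.05):
    """Return the first step where Temp or TotEng departs from the initial row.

    temp_factor and energy_tol are arbitrary illustrative thresholds.
    """
    t0, e0 = rows[0]["Temp"], rows[0]["TotEng"]
    for row in rows[1:]:
        too_hot = row["Temp"] > temp_factor * t0
        drifted = abs(row["TotEng"] - e0) > energy_tol * abs(e0)
        if too_hot or drifted:
            return int(row["Step"])
    return None


# First rows of the log above.
log = """Step Temp Press Volume KinEng PotEng TotEng
0 1800 869377.6 103204.61 1163.1086 7417.0035 8580.1121
1 2975.3213 854855.43 103215.78 1922.5677 6632.2598 8554.8275
2 6106.4842 828076.3 103248.95 3945.8358 4567.9114 8513.7473
3 10566.644 798196.06 103303.43 6827.8636 1622.0949 8449.9584
"""
rows = parse_thermo(log)
print(first_runaway_step(rows))
```

With these thresholds, the temperature check trips within the first few steps, long before atoms are actually lost.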

yury-lysogorskiy commented 1 year ago

You probably have a problem with short-range repulsion (or the lack of it). To check:

1) Compute the minimum interatomic distance in the simulation cell during the LAMMPS run:

compute dist all pair/local dist 
compute  min_dist all reduce  min c_dist
thermo_style    custom step temp pe ke etotal vol press fmax c_min_dist ...

2) Visually inspect the trajectory. You will probably see atoms that are stuck together.
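The same distance check can be run offline on a single dumped snapshot. This is a minimal sketch under stated assumptions: the helper name `min_pair_distance` is hypothetical, the box is assumed orthorhombic and fully periodic, and the O(N^2) loop is only meant for small cells or quick spot checks. The minimum-image wrap is valid for distances shorter than half the box length, which is the regime of interest here.

```python
import itertools
import math


def min_pair_distance(positions, box):
    """Smallest interatomic distance in an orthorhombic periodic box.

    positions: list of (x, y, z) Cartesian coordinates
    box: (Lx, Ly, Lz) edge lengths
    Uses the minimum-image convention; O(N^2), so meant for small checks.
    """
    best = math.inf
    for a, b in itertools.combinations(positions, 2):
        d2 = 0.0
        for xa, xb, length in zip(a, b, box):
            delta = xa - xb
            delta -= length * round(delta / length)  # wrap to the nearest image
            d2 += delta * delta
        best = min(best, math.sqrt(d2))
    return best


# Toy snapshot: the first two atoms are close neighbours across the boundary.
positions = [(0.1, 0.0, 0.0), (9.9, 0.0, 0.0), (5.0, 5.0, 5.0)]
box = (10.0, 10.0, 10.0)
d_min = min_pair_distance(positions, box)
print(f"min distance = {d_min:.3f}")
```

If the value found in a trajectory is well below the shortest distance present in the training set, the potential is being evaluated outside the region it was fitted on.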

There are a few ways to tackle this:

1) (Preferred) Fix it through the data, i.e. add configurations with short interatomic distances to the training set.
2) Add core repulsion and an inner cutoff BEFORE the fit.
3) Add core repulsion and an inner cutoff AFTER the fit.

For the last two options, see: https://pacemaker.readthedocs.io/en/latest/pacemaker/faq/#my_potential_behaves_unphysical_at_short_distances_how_to_fix_it

Namely, for option 3, try:

from pyace import *

bbasisconf = BBasisConfiguration("original_potential.yaml")

for block in bbasisconf.funcspecs_blocks:
    block.r_in = 2.3 # minimal interatomic distance in dataset
    block.delta_in = 0.1
    block.core_rep_parameters=[1e3, 1.0]
    block.rho_cut = block.drho_cut = 5
bbasisconf.save("tuned_potential.yaml")

where block.r_in = 2.3 should be the characteristic inner cutoff for your material, i.e. the distance at which you expect strong repulsion to start.

YuanbinLiu commented 1 year ago

I have tried including dimer structures (min_distance = 1.0 Å) and adding core repulsion and an inner cutoff. The situation has improved somewhat, but the molecular dynamics (MD) run is still extremely unstable. For instance, it cannot hold the target temperature. I have attached the LAMMPS log file and the training files. Here are my core-repulsion settings:

block.r_in = 1.6
block.delta_in = 0.5
block.core_rep_parameters=[1e3, 1.0]
block.rho_cut = 100000
block.drho_cut = 250

106738.log

ace_training.zip

yury-lysogorskiy commented 1 year ago

1) Your LAMMPS log shows a very high initial maximum force (Fmax = 386.36808 eV/Å) and pressure (240 GPa), which looks strange. For any potential, it is better to minimize the system out of such extreme conditions before running MD:

### MINIMIZATION ###
fix box_relax all box/relax aniso 0.0 vmax 0.001

min_style cg
minimize 0 1.0e-3 5000 5000

unfix box_relax

### MD ###
...

If that does not help, go to point 2.

2) Fitting: In your log.txt, the training Energy_low and Force_low values look too high. I see that you tried energy-based weighting but then switched to uniform weighting. Looking at your train_ef_distribution.png, it is clear that the majority of your data is around -4 eV/atom.

3) Data: Adding only dimers will not help with core repulsion in the bulk. You need to add compressed bulk structures. You can generate them yourself by uniformly compressing your cells, or select them from MD using active learning.
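Uniform compression of a cell can be sketched in a few lines. This is an illustration only: the helper names `compress_uniformly` and `volume`, the toy cell, and the scale factors are all assumptions, and the useful compression range depends on the material. The compressed structures still need reference energies and forces from the same method used for the rest of the training set before they can be added to it.

```python
def compress_uniformly(cell, positions, linear_scale):
    """Scale cell vectors and Cartesian positions by the same linear factor.

    linear_scale < 1 compresses; the volume changes by linear_scale ** 3.
    """
    new_cell = [[linear_scale * c for c in vec] for vec in cell]
    new_positions = [[linear_scale * x for x in pos] for pos in positions]
    return new_cell, new_positions


def volume(cell):
    """Cell volume as the absolute scalar triple product a . (b x c)."""
    a, b, c = cell
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    return abs(sum(ai * xi for ai, xi in zip(a, cross)))


# Toy cubic cell with two atoms (hypothetical values).
cell = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
positions = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]

# Illustrative scale factors only; the useful range depends on the material.
compressed = [compress_uniformly(cell, positions, s) for s in (0.95, 0.90, 0.85)]
for new_cell, _ in compressed:
    print(round(volume(new_cell) / volume(cell), 4))
```

With ASE, the same operation on an `Atoms` object is `atoms.set_cell(atoms.get_cell() * s, scale_atoms=True)`, which rescales the atomic positions together with the cell.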

YuanbinLiu commented 1 year ago

Hi Yury. Following your suggestions, the problem has been resolved. Thank you!