Lost atoms when running LAMMPS

YuanbinLiu commented 1 year ago

Hi authors. I have used pacemaker to train a PACE potential for disordered materials. The training and test accuracy is impressive, but when I actually run LAMMPS with the pace pair style, atoms are quickly lost. Do you have any suggestions on how to solve this issue? For instance, would it be possible to add some form of regularization to the pacemaker training?

YuanbinLiu commented 1 year ago

The temperatures are baffling:

Per MPI rank memory allocation (min/avg/max) = 5.586 | 5.59 | 5.595 Mbytes
Step Temp Press Volume KinEng PotEng TotEng
0 1800 869377.6 103204.61 1163.1086 7417.0035 8580.1121
1 2975.3213 854855.43 103215.78 1922.5677 6632.2598 8554.8275
2 6106.4842 828076.3 103248.95 3945.8358 4567.9114 8513.7473
3 10566.644 798196.06 103303.43 6827.8636 1622.0949 8449.9584
4 15614.337 754512.63 103378.47 10089.539 -1717.9616 8371.5771
5 20515.741 705538.89 103472.91 13256.686 -4999.2138 8257.4725
6 24679.404 656861.91 103585.38 15947.126 -7807.5973 8139.529
7 28033.111 609010.73 103714.41 18114.196 -10137.972 7976.2241
8 30554.359 564492.59 103858.41 19743.355 -11996.171 7747.1843
9 32326.447 522184.68 104015.7 20888.428 -13331.638 7556.7898
10 33471.094 487973.35 104184.52 21628.066 -14247.182 7380.8833
11 33976.611 461220.57 104363.13 21954.716 -14725.713 7229.0036
12 34220.033 441755.22 104549.85 22112.009 -15132.674 6979.3344
13 37875.47 439401.92 104743.05 24474.048 -15383.945 9090.1031
14 48919.388 518845.29 104941.47 31610.313 -15102.828 16507.485
15 48346.918 502931.29 105145.91 31240.399 -15152.229 16088.17
16 47519.682 493868.93 105354.55 30705.863 -15040.688 15665.175
17 46770.243 488719.77 105565.76 30221.597 -15039.91 15181.686
18 46085.505 483373.5 105778.08 29779.138 -15014.446 14764.692
19 45349.457 476529.37 105990.13 29303.525 -14968.712 14334.813
20 44703.967 471571.49 106200.68 28886.428 -15003.592 13882.836
21 44049.033 461386.59 106408.75 28463.228 -15009.74 13453.489
22 43429.863 457626.31 106613.46 28063.138 -15069.319 12993.819
23 42855.542 451517.54 106814.34 27692.028 -15150.772 12541.256
24 42308.141 446803.39 107011.1 27338.313 -15217.134 12121.179
25 41455.552 440710.37 107203.7 26787.395 -15240.057 11547.337
26 40631.927 431569.85 107392.24 26255.192 -15385.324 10869.868
27 39952.16 421839.58 107576.9 25815.946 -15420.572 10395.374
28 39393.536 415096.53 107757.94 25454.979 -15503.746 9951.2331
29 38898.173 419861.94 107935.8 25134.889 -15528.383 9606.5065
30 38392.816 407813.31 108111.25 24808.342 -15524.145 9284.1967
31 37918.567 402349.8 108284.58 24501.896 -15634.202 8867.6947
ERROR: Lost atoms: original 5000 current 4998 (src/thermo.cpp:481)
Last command: run 5000
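In the thermo output above, the total energy jumps from about 6979 eV at step 12 to 16507 eV at step 14, shortly before atoms are lost. In a longer log you can find such a jump by scanning consecutive TotEng values. The sketch below does this. Its helper names (parse_thermo, largest_energy_jump) are hypothetical and are not part of LAMMPS or pacemaker. It assumes the column order shown above and metal units, so energies are in eV. TotEng is strictly conserved only under NVE, but even with a thermostat it should not jump by thousands of eV in a single step.

```python
# Hypothetical helper: locate the most abrupt change in total energy in thermo rows.
# Assumed column order (as in the log above): Step Temp Press Volume KinEng PotEng TotEng
COLUMNS = ["Step", "Temp", "Press", "Volume", "KinEng", "PotEng", "TotEng"]

# A few rows copied from the thermo output above
THERMO = """\
11 33976.611 461220.57 104363.13 21954.716 -14725.713 7229.0036
12 34220.033 441755.22 104549.85 22112.009 -15132.674 6979.3344
13 37875.47 439401.92 104743.05 24474.048 -15383.945 9090.1031
14 48919.388 518845.29 104941.47 31610.313 -15102.828 16507.485
15 48346.918 502931.29 105145.91 31240.399 -15152.229 16088.17
"""


def parse_thermo(text):
    """Return one dict per whitespace-separated thermo row."""
    rows = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def largest_energy_jump(rows):
    """Return (step, delta) for the largest |TotEng(step) - TotEng(previous step)|."""
    best_step, best_delta = None, 0.0
    for prev, cur in zip(rows, rows[1:]):
        delta = cur["TotEng"] - prev["TotEng"]
        if abs(delta) > abs(best_delta):
            best_step, best_delta = int(cur["Step"]), delta
    return best_step, best_delta


step, delta = largest_energy_jump(parse_thermo(THERMO))
print(f"largest TotEng jump at step {step}: {delta:+.1f} eV")
```

A jump of this size is consistent with atoms coming much closer together than in any training configuration, where an unconstrained potential can return unphysical energies and forces.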

yury-lysogorskiy commented 1 year ago

You probably have a problem with short-range repulsion (or a lack of it). To confirm this: 1) compute the minimum interatomic distance in the simulation cell during the LAMMPS run:

compute dist all pair/local dist 
compute  min_dist all reduce  min c_dist
thermo_style    custom step temp pe ke etotal vol press fmax c_min_dist ...

2) Visually inspect the trajectory. You will probably see atoms that are stuck together.
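You can also compute the minimum distance from point 1 offline for a saved snapshot, such as a frame from a dump file, using plain Python. The sketch below is a brute-force version. It assumes an orthorhombic periodic box and the minimum-image convention, so it is only valid for distances shorter than half the smallest box length. min_distance_orthorhombic is a hypothetical helper, not a LAMMPS or pacemaker function. Because it checks all O(N²) pairs, it is only practical for small cells.

```python
import itertools
import math


def min_distance_orthorhombic(positions, box):
    """Smallest pair distance (Å) under periodic boundaries, minimum-image convention.

    positions: sequence of (x, y, z) in Å; box: (Lx, Ly, Lz) of an orthorhombic cell.
    Brute force over all pairs; assumes the distances of interest are < min(box) / 2.
    """
    best_sq = math.inf
    for a, b in itertools.combinations(positions, 2):
        dist_sq = 0.0
        for k in range(3):
            d = a[k] - b[k]
            d -= box[k] * round(d / box[k])  # wrap the component into [-L/2, L/2]
            dist_sq += d * d
        best_sq = min(best_sq, dist_sq)
    return math.sqrt(best_sq)


# Usage: the first two atoms are close only across the periodic x boundary
atoms = [(0.5, 5.0, 5.0), (9.7, 5.0, 5.0), (5.0, 5.0, 5.0)]
print(f"{min_distance_orthorhombic(atoms, (10.0, 10.0, 10.0)):.3f}")  # prints 0.800
```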

There are a few ways to tackle this:
1) (preferred) fix it through the data, i.e. add configurations with short interatomic distances to the training set;
2) add core repulsion and an inner cutoff BEFORE fitting;
3) add core repulsion and an inner cutoff AFTER fitting.
For the last two options, see https://pacemaker.readthedocs.io/en/latest/pacemaker/faq/#my_potential_behaves_unphysical_at_short_distances_how_to_fix_it

For option 3, try:

from pyace import *

bbasisconf = BBasisConfiguration("original_potential.yaml")

for block in bbasisconf.funcspecs_blocks:
    block.r_in = 2.3 # minimal interatomic distance in dataset
    block.delta_in = 0.1
    block.core_rep_parameters=[1e3, 1.0]
    block.rho_cut = block.drho_cut = 5
bbasisconf.save("tuned_potential.yaml")

Here, replace the value in block.r_in = 2.3 with the characteristic inner cutoff for your material, i.e. the distance below which you expect strong repulsion to start.
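The sketch below illustrates the idea behind an inner cutoff. It blends a model pair energy with a repulsive core over a window of width delta_in that ends at r_in, using a cosine switch. This is a generic illustration under assumed conventions, not PACE's actual inner-cutoff formula. The window placement, the functional forms, and the way core_rep_parameters, rho_cut and drho_cut enter the real potential are not reproduced here. The toy energies in the usage example are hypothetical.

```python
import math


def cosine_switch(r, r_in, delta_in):
    """0 for r <= r_in - delta_in, 1 for r >= r_in, smooth cosine ramp in between.

    The window convention is an assumption for illustration; PACE may differ.
    """
    r_lo = r_in - delta_in
    if r <= r_lo:
        return 0.0
    if r >= r_in:
        return 1.0
    x = (r - r_lo) / delta_in
    return 0.5 * (1.0 - math.cos(math.pi * x))


def blended_energy(r, e_model, e_core, r_in, delta_in):
    """Model energy outside the inner cutoff, repulsive core energy inside it."""
    s = cosine_switch(r, r_in, delta_in)
    return s * e_model(r) + (1.0 - s) * e_core(r)


def toy_model(r):
    """Hypothetical model energy (eV) that stays attractive even at short range."""
    return -1.0


def toy_core(r):
    """Hypothetical screened repulsion (eV), growing steeply as r -> 0."""
    return 100.0 * math.exp(-2.0 * r) / r


for r in (1.0, 2.0, 2.25, 2.5):
    print(r, round(blended_energy(r, toy_model, toy_core, r_in=2.3, delta_in=0.1), 4))
```

Below the window, only the repulsive term acts, so two atoms pushed together feel a growing energy penalty instead of the unphysical attraction of the toy model.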

YuanbinLiu commented 1 year ago

I have tried including dimer structures (min_distance = 1.0 Å) and adding core repulsion and an inner cutoff. The situation has improved somewhat, but the molecular dynamics (MD) run is still extremely unstable. For instance, it cannot consistently maintain the target temperature. I have attached the LAMMPS log file and the training files. Here are my core-repulsion settings:
block.r_in = 1.6
block.delta_in = 0.5
block.core_rep_parameters = [1e3, 1.0]
block.rho_cut = 100000
block.drho_cut = 250

106738.log

ace_training.zip

yury-lysogorskiy commented 1 year ago

1) In your LAMMPS log, it looks strange that the system starts with a very high maximum force (Fmax = 386.36808 eV/Å) and pressure (240 GPa). Whatever potential you use, it is better to minimize the system out of such extreme conditions before running MD:

### MINIMIZATION ###
fix box_relax all box/relax aniso 0.0 vmax 0.001

min_style cg
minimize 0 1.0e-3 5000 5000

unfix box_relax

### MD ###
...

If that does not help, go to point 2.
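This sketch assumes units metal, which the KinEng values in the first thermo output are consistent with (eV for 5000 atoms at 1800 K). Under that assumption LAMMPS reports press in bar, while the pressures above are quoted in GPa, and 1 GPa = 10^4 bar. bar_to_gpa and gpa_to_bar are illustrative helpers for comparing logs.

```python
BAR_PER_GPA = 1.0e4  # 1 GPa = 10^4 bar


def bar_to_gpa(p_bar):
    """Convert a pressure in bar (LAMMPS metal units) to GPa."""
    return p_bar / BAR_PER_GPA


def gpa_to_bar(p_gpa):
    """Convert a pressure in GPa to bar, e.g. to compare with the press column."""
    return p_gpa * BAR_PER_GPA


# Step-0 pressure from the first thermo output in this thread
print(f"{bar_to_gpa(869377.6):.1f} GPa")  # prints 86.9 GPa
```

So the first run already started at roughly 87 GPa. In both cases, relaxing the cell toward zero pressure (box/relax with a 0.0 target, as in the script above) before MD removes most of that initial stress.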

2) Fitting: in your log.txt, the training Energy_low and Force_low errors look too high. I see that you tried energy-based weighting but then switched to uniform weighting. Your train_ef_distribution.png shows clearly that the majority of your data lies around -4 eV/atom.

It may be worth trying energy-based weighting again, but with a wider energy window, i.e. DElow: 3 and the shift DE: 2.0 (https://pacemaker.readthedocs.io/en/latest/pacemaker/inputfile/#fitting_settings).
You could also use more iterations, e.g. 1500.
You can try a two-stage fit to get good forces and energies together: first fit with more weight on forces (kappa=0.95 in the loss section), then continue fitting the potential (upfit) with kappa=0.1 to 0.2 (https://pacemaker.readthedocs.io/en/latest/pacemaker/faq/#what_is_the_best_value_for_the_relative_force_weight_in_the_loss_function_kappa).
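The role of kappa in the two-stage fit can be seen from a schematic loss of the form L = (1 - kappa) * L_E + kappa * L_F, where L_E and L_F are weighted mean squared energy and force errors. The sketch below is a simplification under that assumption. The real pacemaker loss also contains regularization terms and its own normalization, and schematic_loss is a hypothetical name. The energy_weights argument is where an energy-based weighting scheme would plug in; if it is omitted, the weights are uniform.

```python
def schematic_loss(energy_errors, force_errors, kappa, energy_weights=None):
    """Schematic kappa-weighted loss: (1 - kappa) * L_E + kappa * L_F.

    energy_errors: per-structure energy errors (eV/atom).
    force_errors: per-atom force error vectors (fx, fy, fz) in eV/Å.
    energy_weights: optional per-structure weights; uniform if omitted.
    Returns (total, L_E, L_F). Regularization and normalization are omitted.
    """
    n_e = len(energy_errors)
    if energy_weights is None:
        energy_weights = [1.0 / n_e] * n_e
    loss_e = sum(w * de ** 2 for w, de in zip(energy_weights, energy_errors))
    loss_f = sum(fx ** 2 + fy ** 2 + fz ** 2 for fx, fy, fz in force_errors) / len(force_errors)
    return (1.0 - kappa) * loss_e + kappa * loss_f, loss_e, loss_f


# Hypothetical errors, chosen so that L_E and L_F are equal
e_err = [0.2, -0.1]                           # eV/atom
f_err = [(0.1, 0.0, 0.0), (0.0, 0.2, 0.0)]    # eV/Å
for stage, kappa in (("stage 1", 0.95), ("stage 2 (upfit)", 0.15)):
    total, l_e, l_f = schematic_loss(e_err, f_err, kappa)
    print(f"{stage}: kappa={kappa}, force share of loss = {kappa * l_f / total:.2f}")
```

With equal error magnitudes, the force term's share of the loss simply equals kappa. In practice, energy and force errors differ in size and units, so the effective balance also depends on the data.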

3) Data: adding only dimers will not help with core repulsion in the bulk. You need to add compressed bulk structures. You can generate them yourself by uniformly compressing your cells, or select them from MD using active learning.
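Uniformly compressing a cell means scaling the lattice vectors and the Cartesian positions by the same linear factor s < 1. Fractional coordinates stay fixed, every interatomic distance shrinks by s, and the volume shrinks by s**3. The sketch below uses plain lists. The helper names and the default scale factors are hypothetical. Each generated structure still needs reference (e.g. DFT) energies and forces before it can go into the training set.

```python
def scale_structure(cell, positions, factor):
    """Scale lattice vectors and Cartesian positions by the same linear factor.

    cell: three lattice vectors in Å; positions: Cartesian coordinates in Å.
    Fractional coordinates are unchanged, all distances scale by `factor`,
    and the cell volume scales by factor**3.
    """
    new_cell = [[factor * x for x in vec] for vec in cell]
    new_positions = [[factor * x for x in pos] for pos in positions]
    return new_cell, new_positions


def compressed_series(cell, positions, factors=(0.98, 0.95, 0.92, 0.89, 0.86)):
    """Return (factor, cell, positions) for each (hypothetical) compression factor."""
    return [(f, *scale_structure(cell, positions, f)) for f in factors]


# Usage: a toy cubic cell with one atom at the origin and one at the body centre
cell = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
positions = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
for f, new_cell, new_pos in compressed_series(cell, positions):
    # Body-centre atom: all periodic images are equidistant, so no wrapping is needed
    d = sum(c * c for c in new_pos[1]) ** 0.5
    print(f"s={f:.2f}  a={new_cell[0][0]:.3f} Å  nearest-neighbour d={d:.3f} Å")
```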

YuanbinLiu commented 1 year ago

Hi Yury. Following your suggestions, the problem has been resolved. Thank you!

ICAMS / python-ace #40