FF-optimization - Githubissues

qzhu2017 commented 7 months ago

[x] Universal scaling factor for charge
[x] Output the comparison of each parameter (plot)
[x] Add the dump function 5734308
[x] Add more data
[x] Add silicon example
[x] optimize the speed.
[x] add para for mace
[x] Test the performance of large data (eg. 200+ csp data.)
[x] Add the SA algo for opt.
[x] add ANI calculator to check the force accuracy?
[x] make sure the cell size is big enough for lammps
[x] Fix the bug related to atomic ordering in lammps
[x] two-stage FF parameters, 1 for optimization and 2 for refinement
[x] add a filter function to clean up outlier reference structures.

qzhu2017 commented 7 months ago

@shinnosukehattori

qzhu2017 commented 6 months ago

@shinnosukehattori

I am happy to update that the speed has been greatly improved :)

shinnosukehattori commented 6 months ago

Great! I am currently updating to the latest version and trying it out in my environment as well, and have confirmed that the CPU usage is 4000% and running in parallel in a 40 core environment.

shinnosukehattori commented 6 months ago

Opt_obj 14.27 min ['bond'] 6755.817094030283 365 Opt_obj 12.88 min ['angle'] 6001.812671216791 318 Opt_obj 12.05 min ['bond', 'angle'] 5746.516051875931 322 Opt_obj 8.85 min ['proper', 'vdW', 'charge'] 5723.3229295154115 225 Opt_obj 9.39 min ['bond', 'angle', 'proper', 'vdW', 'charge'] 5708.235178005419 214

Results

qzhu2017 commented 6 months ago

@shinnosukehattori Now the results are much more reasonable

shinnosukehattori commented 6 months ago

Great! what is the most important modification to improve forces?

qzhu2017 commented 6 months ago

@shinnosukehattori It turns out this previously observed discrepancy was mainly due to the incorrect atomic ordering when we read the atomic forces from the lammps's output.

shinnosukehattori commented 6 months ago

Thanks, I also try to optimize Si-coumpounds with latest version, and just change optimize->optimize_global, then forces are different situation. Would it be possible to update your current script for optimization? Results

qzhu2017 commented 6 months ago

@shinnosukehattori Try the updated script.

qzhu2017 commented 6 months ago

@shinnosukehattori , the optimized ff can provide better lattice parameters too.

(ost) qzhu@cms-2:~/GitHub/OST/examples/Si-compounds$ python 0_test_interface.py 

------Crystal from Seed------
Dimension: 3
Composition: [C[Si]1(O)O[Si](C)(O)O[Si](C)(O)O[Si](C)(O)O1]4
Group: P 1 21/n 1 (14)
 12.0180,   8.4900,  14.3420,  90.0000,  68.7170,  90.0000, monoclinic
Wyckoff sites:
    Si4H16C4O8   @ [ 0.2743  0.5548  0.4539]  WP [4e] Site [1] Euler [   0.0    0.0    0.0]

From parameters_init.xml
Lattice before relaxation  12.0180,   8.4900,  14.3420,  90.0000,  68.7170,  90.0000, monoclinic
Lattice after  relaxation  12.5739,   8.0951,  13.2683,  90.0000,  65.8808,  90.0000, monoclinic

Energy -1152.01636

 From parameters_opt.xml

Lattice before relaxation  12.0180,   8.4900,  14.3420,  90.0000,  68.7170,  90.0000, monoclinic
Lattice after  relaxation  13.1236,   8.4040,  13.1529,  90.0000,  69.0542,  90.0000, monoclinic

Energy -1092.24095

qzhu2017 commented 6 months ago

Using aspirin as an example, we can see that the initial FF may have better performance on the low energy regions, although it has the higher error values.

shinnosukehattori commented 6 months ago

it must be a transferability problem on FF. As it is based on the GAFF, which is a fixed charge model, it may be difficult to prepare a general force field that includes cases where the charge state changes depending on the crystal structure.

At the moment, no good strategy has emerged, but the most straightforward idea is to group several and prepare several FFs.

For example, could we try to prepare two different forcefields, one optimised for the whole group and one optimised for the top group? If the accuracy of the force field optimised only for the top group can be maintained or increased, then that should be used as the CSP. On the other hand, it would be better to use the force field optimised for the whole group for the first candidate selection.

qzhu2017 commented 6 months ago

@shinnosukehattori That's a good idea. Let me give a try.

qzhu2017 commented 6 months ago

@shinnosukehattori Here are the results for Silicon. The optimization gives much better results than bad initial guess.

qzhu2017 commented 6 months ago

If we only consider the low energy samples from CSP, there is quite a big room to improve the accuracy.

shinnosukehattori commented 6 months ago

Thanks for the SiO2 update. I would like to use the parameters here to predict the crystal structure. The improved accuracy on low energy samples is also good information. I feel that the reproducibility of the cell shape has improved, as the stresses have improved considerably. It would also be useful to consider an extended GAFF with improved transferability in organic crystals by making small updates to the model itself, based on the difference between the low energy optimised and the overall combined force field parameters.

qzhu2017 commented 6 months ago

Sometimes, it is possible that a very unreasonable structure was found, and this structure makes the ff optimization be out of control. Need to check if the geometry satisfy the minimum requirement before adding this structure.

shinnosukehattori commented 6 months ago

Thank you. Could you also upload the bad structure, for example the forces over +- 50 ? I will consider to detour this type error by ff equation change. More simply, could there be a patchy solution, for example, ignoring large forces at a threshold?

qzhu2017 commented 6 months ago

I think most of times it is simply due to the formation of unfavorable new bonding (e.g., O-O bond). We can just ignore it.

shinnosukehattori commented 6 months ago

Thank you for your prompt response; O-O bonds are easy to understand because they can be easily eliminated.

shinnosukehattori commented 6 months ago

Currently, I am not able to complete the ffopt function due to the following error. If you know of a solution, please let me know.

File "/groups/3/gcd50793/shattori/app/HT-OCSP/examples/WAHJEW/1-on-the-fly-csp.py", line 52, in match = ga.run(pmg0) File "/groups/3/gcd50793/shattori/app/HT-OCSP/htocsp/GA.py", line 371, in run N_added = self.ff_optimization(xtals, N_added) File "/groups/3/gcd50793/shattori/app/HT-OCSP/htocsp/GA.py", line 567, in ff_optimization errs = self.parameters.plot_ff_results(performance_fig, File "/groups/3/gcd50793/shattori/app/OST/ost/parameters.py", line 1770, in plot_ffresults , err_dic = self._plot_ff_results(axes, params[0], ref_dics, labels, max_E, max_dE) File "/groups/3/gcd50793/shattori/app/OST/ost/parameters.py", line 1800, in _plot_ff_results results = self.evaluate_multi_references(ref_dics, parameters, max_E, max_dE) File "/groups/3/gcd50793/shattori/app/OST/ost/parameters.py", line 1535, in evaluate_multi_references res = result.result() File "/home/acd13730wr/micromamba/envs/ost/lib/python3.9/concurrent/futures/_base.py", line 446, in result return self.get_result() File "/home/acd13730wr/micromamba/envs/ost/lib/python3.9/concurrent/futures/_base.py", line 391, in get_result raise self._exception concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

=============== Errors of csp structure in gen 2 1.5280 0.3067 0.0132 -47.7174 -188.4570 False True False Errors of csp structure in gen 2 1.1603 0.3778 0.0107 -47.2798 -188.3871 False True True Errors of csp structure in gen 2 1.6640 0.3437 0.0127 -47.6808 -188.2844 False True False Errors of csp structure in gen 2 1.5937 0.3765 0.0119 -47.6574 -188.3312 False True True Errors of csp structure in gen 2 1.6094 0.3544 0.0114 -47.4630 -188.1213 False True True Errors of csp structure in gen 2 1.7836 0.3491 0.0126 -47.6336 -188.1176 False True False Errors of csp structure in gen 2 1.5155 0.3835 0.0115 -47.5084 -188.2605 False True True FF performances R2 -45.4168 -2.9195 -92.5023 RMSE 1.6101 0.3650 0.0121 Add 48 references in 3.58 min Do not update ff, the current number of configurations is 48

shinnosukehattori commented 5 months ago

python 1-on-the-fly-csp.py -g 5 -p 80 -n 8 --ffopt > log1-ff1-opt

MaterSim / pyocse

FF-optimization #8