ACEsuit / mace

MACE - Fast and accurate machine learning interatomic potentials with higher order equivariant message passing.

stress errors reported during fitting and at end should be consistent #442

Open bernstei opened 4 weeks ago

bernstei commented 4 weeks ago

The error report printed during fitting gives RMSE_stress_per_atom, and its value clearly differs from the RMSE Stress (Virials) / meV / A (A^3) column of the final report table (see below). I think we should stop reporting single-atom stress contributions altogether (by default, at least), because they seem to be normalized very weirdly: they are divided by the total volume, so they're sub-intensive, and only the sum over all atoms is intensive. Also, calling them "per atom" is confusing to me, as I would assume that means "divided by n_atoms" rather than "the contribution from each atom".

2024-05-31 17:58:38.255 INFO: head: pbe_mp, Epoch 475: loss=0.0024, RMSE_E_per_atom=53.2 meV, RMSE_F=230.2 meV / A, RMSE_stress_per_atom=2.8 meV / A^3
2024-05-31 17:58:39.544 INFO: head: Default, Epoch 475: loss=0.0027, RMSE_E_per_atom=2.3 meV, RMSE_F=99.3 meV / A, RMSE_stress_per_atom=0.5 meV / A^3

vs.

2024-05-31 19:06:24.127 INFO: Loaded model from epoch 475
2024-05-31 19:06:24.127 INFO: Evaluating train ...
2024-05-31 19:06:34.452 INFO: Evaluating Default ...
2024-05-31 19:06:35.808 INFO: Evaluating pbe_mp ...
2024-05-31 19:06:36.454 INFO: 
+-------------+---------------------+------------------+-------------------+---------------------------------------+
| config_type | RMSE E / meV / atom | RMSE F / meV / A | relative F RMSE % | RMSE Stress (Virials) / meV / A (A^3) |
+-------------+---------------------+------------------+-------------------+---------------------------------------+
|    train    |         4.8         |       51.0       |        4.63       |                  6.3                  |
|   Default   |         2.3         |       99.3       |        9.66       |                  5.1                  |
|    pbe_mp   |         53.2        |      230.2       |       16.09       |                  14.5                 |
+-------------+---------------------+------------------+-------------------+---------------------------------------+
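
To make the normalization point concrete, here is a minimal sketch, not MACE code: it assumes the total stress is the sum of per-atom (site) virials divided by the cell volume, ignores sign conventions, and uses made-up numbers. The names `site_virials`, `total_stress` and `site_stress_contributions` are hypothetical. Doubling the cell leaves the total stress unchanged, while each site contribution is halved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical site virials for a 4-atom cell: one 3x3 tensor per atom, in meV.
site_virials = rng.normal(size=(4, 3, 3))
volume = 50.0  # made-up cell volume in A^3


def total_stress(site_virials, volume):
    """Sum of site virials over the total volume (sign convention ignored)."""
    return site_virials.sum(axis=0) / volume


def site_stress_contributions(site_virials, volume):
    """Each atom's contribution to the stress, divided by the TOTAL volume."""
    return site_virials / volume


# Build a 2x supercell: the same atoms repeated twice, with twice the volume.
super_virials = np.tile(site_virials, (2, 1, 1))
super_volume = 2.0 * volume

small = total_stress(site_virials, volume)
large = total_stress(super_virials, super_volume)

# Ratio of each site's contribution in the supercell to the original cell.
ratio = (
    site_stress_contributions(super_virials, super_volume)[:4]
    / site_stress_contributions(site_virials, volume)
)

print("total stress unchanged:", np.allclose(small, large))
print("site contribution ratio:", ratio.mean())
```

The total is intensive, but the individual site contributions shrink as the system grows, so an error metric built on them depends on system size.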

gabor1 commented 4 weeks ago

Is it just the name? That this is really virial/atom?

bernstei commented 4 weeks ago

> Is it just the name? That this is really virial/atom?

If so, then the printed units are wrong, and that is what needs to be fixed instead (using consistent units).

bernstei commented 4 weeks ago

Pretty sure it's computed at https://github.com/ACEsuit/mace/blob/dee204f1f9d587f28fd792fdad1f45039ef71e94/mace/tools/train.py#L60 and it appears that it really is per atom, i.e. divided by n_atoms (based on what update() in the same file does at https://github.com/ACEsuit/mace/blob/dee204f1f9d587f28fd792fdad1f45039ef71e94/mace/tools/train.py#L423-L425 ), not just the contribution from each atom (i.e. the site stress contribution).

But I think we'll need @ilyes319 to confirm. And I'd strongly urge never to store an actual total stress / n_atoms. That's just not meaningful.
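
A rough sketch of why the two numbers cannot agree if the per-atom reading above is right. This is not the code in train.py; it assumes the metric during fitting is the RMSE of (stress error / n_atoms) and the final table is the RMSE of the stress error itself. All values are made up, and `rmse`, `stress_err` and `n_atoms` are hypothetical names.

```python
import numpy as np


def rmse(x):
    """Root mean square over all entries of x."""
    return float(np.sqrt(np.mean(np.square(x))))


rng = np.random.default_rng(1)

# Three hypothetical configurations with different sizes.
n_atoms = np.array([2, 8, 32])

# Made-up stress errors (predicted minus reference), one 3x3 tensor per
# configuration, in meV / A^3.
stress_err = rng.normal(scale=5.0, size=(3, 3, 3))

# RMSE of the stress error itself (assumed to match the final table).
rmse_stress = rmse(stress_err)

# RMSE after dividing each configuration's error by its atom count
# (assumed to match what is printed during fitting).
rmse_stress_over_n = rmse(stress_err / n_atoms[:, None, None])

print(f"RMSE stress:           {rmse_stress:.3f} meV / A^3")
print(f"RMSE stress / n_atoms: {rmse_stress_over_n:.3f}")
```

With every configuration holding at least two atoms, the second number is at most half the first, and the gap widens as the configurations get larger, so it tells you more about the size distribution of the dataset than about the stress error.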

bernstei commented 4 weeks ago

See #443