ACEsuit / mace

MACE - Fast and accurate machine learning interatomic potentials with higher order equivariant message passing.
Other
558 stars 207 forks source link

Suspicious energies during mace md #368

Closed 1234zou closed 8 months ago

1234zou commented 8 months ago

Dear MACE developers and users, I've trained a rough model for liquid water (using periodic box). If this model is applied to evaluate the energies of geometries in the training set, the obtained energies are good as expected (which is around -133048 eV). But when I use this model to run a short md trajectory, the output is strange:

Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
/public/home/jxzou/software/anaconda3/envs/mace-openmm/lib/python3.10/site-packages/torch/jit/_check.py:172: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in `__init__`. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in `torch.jit.Attribute`.
  warnings.warn("The TorchScript type system doesn't support "
Running MACEForce on device:  cpu  with dtype:  torch.float64 and neigbbour list:  torch
#"Progress (%)","Step","Time (ps)","Potential Energy (kJ/mole)","Total Energy (kJ/mole)","Temperature (K)","Box Volume (nm^3)","Density (g/mL)","Speed (ns/day)","Time Remaining"
5.0%,25,0.012500000000000008,-65172.872473893054,-65023.48632016675,62.712127467495606,1.915864488,0.9993241003310723,0,--
10.0%,50,0.02500000000000002,-65261.13773803623,-65031.31871053035,96.47775103581478,1.915864488,0.9993241003310723,0.145,2:14
15.0%,75,0.037500000000000026,-65273.278554504475,-65026.40457957448,103.63739742963143,1.9242895925475174,0.9949487661533322,0.143,2:08
20.0%,100,0.05000000000000004,-65343.2507969423,-65022.86766450165,134.49645324468443,1.9242895925475174,0.9949487661533322,0.144,2:00
25.0%,125,0.06250000000000004,-65402.69120891198,-65025.42918192384,158.3741446912931,1.9242895925475174,0.9949487661533322,0.144,1:52
30.0%,150,0.07500000000000005,-65418.792778944735,-65021.289068720995,166.87157894181823,1.9249452806830287,0.9946098598436542,0.145,1:44
35.0%,175,0.08750000000000006,-65429.324490552586,-65021.570219670575,171.17475196408034,1.9249452806830287,0.9946098598436542,0.145,1:36
40.0%,200,0.10000000000000007,-65470.62027844583,-65026.77493840418,186.32573934222958,1.9190578106264542,0.9976612195970593,0.145,1:29
45.0%,225,0.11250000000000009,-65629.83486344227,-65040.044527493774,247.59327290040682,1.9008672694268178,1.0072084393373582,0.145,1:21
50.0%,250,0.12500000000000008,-65752.75495279465,-65031.58910195846,302.74455586897113,1.9008672694268178,1.0072084393373582,0.142,1:15
55.0%,275,0.1375000000000001,-65805.86152165181,-65029.318300411986,325.9919092878816,1.908419663523503,1.0032225052072632,0.142,1:08
60.0%,300,0.1500000000000001,-66103.80504595942,-65040.80227750373,446.247282300537,1.8997066302035446,1.0078238004684512,0.142,1:00
65.0%,325,0.16250000000000012,-66355.91898255645,-65033.359654917906,555.2088133291892,1.9052506127252242,1.0048911901874669,0.143,0:52
70.0%,350,0.17500000000000013,-66383.34899326786,-65037.35860102288,565.0451460390439,1.9052506127252242,1.0048911901874669,0.143,0:45
75.0%,375,0.18750000000000014,-66566.05258264432,-65056.64104229468,633.6491472478866,1.8907080198025286,1.012620423552662,0.144,0:37
80.0%,400,0.20000000000000015,-66840.74839190248,-65025.141352128696,762.189649234713,1.8874248962169082,1.0143818488695102,0.144,0:30
85.0%,425,0.21250000000000016,-66905.47954660666,-65007.51244401967,796.7643044389393,1.8713953482814112,1.0230705968063287,0.144,0:22
90.0%,450,0.22500000000000017,-67154.36682329794,-65073.03979146202,873.738581965072,1.8713953482814112,1.0230705968063287,0.144,0:15
95.0%,475,0.23750000000000018,-66919.25191185695,-65072.40516659744,775.3040399837838,1.8713953482814112,1.0230705968063287,0.144,0:07
100.0%,500,0.25000000000000017,-67492.94925864285,-65091.65705567727,1008.0595755546456,1.8584611525353136,1.0301907861861919,0.144,0:00

The energies shown in md are around -65023 kJ/mol, i.e. -673.9 eV, which is far from the energy of a water box (around -133048 eV). I do not know where is wrong during the md simulation. The python script used to run md is

from openmmtools.openmm_torch.hybrid_md import PureSystem
import mdtraj as md
from os import remove, rename
from shutil import rmtree
import torch
torch.set_default_dtype(torch.float64)
xyzname='h2o_md.xyz'

system=PureSystem(file=xyzname,ml_mol=xyzname,
  model_path='h2o_md_1.model',
  output_dir='./',
  potential='mace',
  temperature=273.15,
  pressure=1.0,
  nl='torch',
  max_n_pairs=-1,
  timestep=0.5,
  minimiser='openmm',
  remove_cmm=True, unwrap=True)

system.run_mixed_md(steps=500,interval=25,  integrator_name='nose-hoover',
  output_file='h2o_md_1.pdb',
  restart=False)

Please see the attached files h2o_md.xyz and h2o_md_1.model h2o_md.zip

By the way, all the computations are performed on CPU since we have very limited GPU resources. Thanks to anyone for any kind help!

ilyes319 commented 8 months ago

Is the simulation stable? It is possible that the logger is only printing the interaction energy and not the total energy. OpenMM is bio-md code, so it should be working with interaction energies. @jharrymoore Do you have an idea?

jharrymoore commented 8 months ago

Hi, thanks for the issue - @ilyes319 is correct - openMM expects just the interaction energies, so the difference you see is exactly the sum of E0s for the system. Also from a quick look at the temperature it appears something has exploded - could you check whether your minimised structure is visually sensible? I suspect it is, in which case you will need to look at how the potential was fitted. Could you share the log file from the training run?

gabor1 commented 8 months ago

Make sure that when you do the training, the energies are in eV and distances in Angstrom, and forces are in eV/Angstrom. MACE training only works in those units. Your output shows energies in kJ/mol....

1234zou commented 8 months ago

Thank you very much! @ilyes319 and @jharrymoore . The md simulation is not stable (I mean that the temperature is increasing from 62K~1000K during simulation, and there are some unreasonable geometries which have O-O bonds at the end of the simulation). The reason is that the model is very rough currently. I want to utilize the model to run short MD, discard unreasonable structures, perform DFT calculations upon chosen structures (based on model deviation), and finally improve the model. The current explosion is as expected.

Now the problem about energies is solved. The OpenMM prints interaction energies.

1234zou commented 8 months ago

I'm sorry that I also have a very related problem: there are mirror molecules in the trajectory. After the md simulation is done, I use the following Python script to convert the .dcd and .pdb files together into a .xyz file:

import mdtraj as md

t = md.load('output.dcd', top='minimised_system.pdb')
t.unitcell_vectors *= 10.0 # nm to A
t.xyz *= 10.0 # nm to A
f = open('output.xyz', 'w+')

for i in range(t.n_frames):
  f.write(str(t.n_atoms))
  f.write('\nLattice=\"')
  f.write(' '.join(map(str,t.unitcell_vectors[i,:,:].flatten())))
  f.write('\" Properties=species:S:1:pos:R:3 pbc="T T T"\n')
  for j in range(t.n_atoms):
    f.write(t.topology.atom(j).element.symbol+'  ')
    f.write('  '.join(format(x,'.8f') for x in t.xyz[i,j,:]))
    f.write('\n')

f.close()

I find that for some snapshots, some water molecules run out of the box (which may be normal), but this water molecule is exactly at the position of a water molecule inside of the box (they just differ by one lacttice vector length). If I use this snapshot to perform PBC-DFT calculation, it will complain about there are two atoms whose distance is zero. Is there something about wrap/unwrap missing in my python script?

Thanks again for your kind help.

1234zou commented 8 months ago

@gabor1 , thanks for your kind remind. I do use the energy (in eV) and forces (in eV/A) during training. Here are the first few lines of my training set

1
Lattice="50.0 0.0 0.0 0.0 50.0 0.0 0.0 0.0 50.0" Properties=species:S:1:pos:R:3:forces:R:3 config_type=IsolatedAtom energy=-13.595241664 pbc="F F F"
H  0.0 0.0 0.0  0.0 0.0 0.0
1
Lattice="50.0 0.0 0.0 0.0 50.0 0.0 0.0 0.0 50.0" Properties=species:S:1:pos:R:3:forces:R:3 config_type=IsolatedAtom energy=-2041.115929518 pbc="F F F"
O  0.0 0.0 0.0  0.0 0.0 0.0
192
Lattice="12.420 .000 .000 .000 12.420 .000 .000 .000 12.420" Properties=species:S:1:pos:R:3:forces:R:3 energy=-133048.85539372 stress="-.0026950 .0026167 .0018380 .0026167 -.0082802 .0030255 .0018380 .0030255 -.0071386" pbc="T T T"
O         0.58046800         1.63077000         4.91045700         1.77785023         1.87039867         1.79609273
...

and here are the final few lines of training output

2024-04-05 00:24:42.487 INFO: Evaluating train ...
2024-04-05 00:24:53.382 INFO: Evaluating valid ...
2024-04-05 00:24:53.884 INFO:
+-------------+---------------------+------------------+-------------------+---------------------------------------+
| config_type | RMSE E / meV / atom | RMSE F / meV / A | relative F RMSE % | RMSE Stress (Virials) / meV / A (A^3) |
+-------------+---------------------+------------------+-------------------+---------------------------------------+
|    train    |         1.0         |       69.3       |        5.45       |                  10.7                 |
|    valid    |         1.2         |       63.0       |        5.28       |                  9.0                  |
+-------------+---------------------+------------------+-------------------+---------------------------------------+
2024-04-05 00:24:53.884 INFO: Saving model to checkpoints/h2o_1_run-1000_swa.model
2024-04-05 00:24:53.959 INFO: Done

The output which includes Total Energy (kJ/mole) is printed by OpenMM and they are interaction energies, not printed by MACE. So I guess it is O.K. Anyway, I want to thank you for your kind remind since I did make this mistake last week, because I usually deal with energies in Hartree.

jharrymoore commented 8 months ago

Make sure that when you do the training, the energies are in eV and distances in Angstrom, and forces are in eV/Angstrom. MACE training only works in those units. Your output shows energies in kJ/mol....

Just to clarify this - openMM uses kJ/mol and nm as internal units, so the writeout here is expected. MACE must be fitted as Gabor outlines above and the conversion is handled internally in the openmm interface

jharrymoore commented 8 months ago

I'm sorry that I also have a very related problem: there are mirror molecules in the trajectory. After the md simulation is done, I use the following Python script to convert the .dcd and .pdb files together into a .xyz file:


import mdtraj as md

t = md.load('output.dcd', top='minimised_system.pdb')

t.unitcell_vectors *= 10.0 # nm to A

t.xyz *= 10.0 # nm to A

f = open('output.xyz', 'w+')

for i in range(t.n_frames):

  f.write(str(t.n_atoms))

  f.write('\nLattice=\"')

  f.write(' '.join(map(str,t.unitcell_vectors[i,:,:].flatten())))

  f.write('\" Properties=species:S:1:pos:R:3 pbc="T T T"\n')

  for j in range(t.n_atoms):

    f.write(t.topology.atom(j).element.symbol+'  ')

    f.write('  '.join(format(x,'.8f') for x in t.xyz[i,j,:]))

    f.write('\n')

f.close()

I find that for some snapshots, some water molecules run out of the box (which may be normal), but this water molecule is exactly at the position of a water molecule inside of the box (they just differ by one lacttice vector length). If I use this snapshot to perform PBC-DFT calculation, it will complain about there are two atoms whose distance is zero. Is there something about wrap/unwrap missing in my python script?

Thanks again for your kind help.

I see you set unwrap=True in your initial script- you need to wrap molecules back into the periodic box before you try to compute anything else. I recommend using the following as a starting point https://docs.mdanalysis.org/1.0.0/documentation_pages/transformations/wrap.html

1234zou commented 8 months ago

could you check whether your minimised structure is visually sensible? I suspect it is, in which case you will need to look at how the potential was fitted. Could you share the log file from the training run?

Yes, files are attached minimised_system_and_log.zip

I see you set unwrap=True in your initial script- you need to wrap molecules back into the periodic box before you try to compute anything else. I recommend using the following as a starting point https://docs.mdanalysis.org/1.0.0/documentation_pages/transformations/wrap.html

Just to make it clear: it is O.K. for me to set unwrap=True, but I need to wrap molecules back into the periodic box, right?

What makes me confused is that even if I set unwrap=False, there are still mirror water molecules in the converted .xyz file.

jharrymoore commented 8 months ago

Yes, either setting is fine, but will dictate how the trajectory is processed after the fact - if you use unwrap=True, you will need to wrap molecules back into the box. I am not sure exactly how you are getting two copies of a molecule in your trajectory. I would strongly recommend using MDAnalysis or similar to make sure your trajectory is imaged properly before you proceed.