ACEsuit / mace

MACE - Fast and accurate machine learning interatomic potentials with higher order equivariant message passing.

training with foundational_model #412

Closed mofeilu closed 1 month ago

mofeilu commented 1 month ago

I am trying to fine-tune a MACE foundation model on my dataset, using the following script, modified from the README (https://github.com/ACEsuit/mace?tab=readme-ov-file#pretrained-foundation-models).

I am training on forces only:

python run_train.py \
  --name="MACE_test" \
  --foundation_model="2023-12-10-mace-128-L0_energy_epoch-249.model" \
  --train_file="train_sige.xyz" \
  --valid_fraction=0.05 \
  --test_file="test_sige.xyz" \
  --loss="forces_only" \
  --energy_weight=1.0 \
  --forces_weight=1.0 \
  --E0s="average" \
  --lr=0.01 \
  --scaling="rms_forces_scaling" \
  --batch_size=2 \
  --max_num_epochs=10 \
  --ema \
  --ema_decay=0.99 \
  --amsgrad \
  --default_dtype="float64" \
  --device=cuda \
  --seed=3

I got the following output log:

2024-05-08 11:47:39.880 INFO: Using gradient clipping with tolerance=10.000
2024-05-08 11:47:39.880 INFO: Started training
2024-05-08 11:47:40.728 INFO: Epoch None: loss=0.0000, RMSE_E_per_atom=1192.3 meV, RMSE_F=784.6 meV / A
2024-05-08 11:48:08.007 INFO: Epoch 0: loss=0.0000, RMSE_E_per_atom=4652.4 meV, RMSE_F=31.6 meV / A
2024-05-08 11:48:24.208 INFO: Epoch 2: loss=0.0000, RMSE_E_per_atom=4794.9 meV, RMSE_F=0.0 meV / A
2024-05-08 11:48:40.444 INFO: Epoch 4: loss=0.0000, RMSE_E_per_atom=4795.0 meV, RMSE_F=0.0 meV / A
2024-05-08 11:48:56.325 INFO: Epoch 6: loss=0.0000, RMSE_E_per_atom=4795.0 meV, RMSE_F=0.0 meV / A
2024-05-08 11:49:12.283 INFO: Epoch 8: loss=0.0000, RMSE_E_per_atom=4795.0 meV, RMSE_F=0.0 meV / A
2024-05-08 11:49:20.177 INFO: Training complete
2024-05-08 11:49:20.177 INFO: Computing metrics for training, validation, and test sets
2024-05-08 11:49:20.225 INFO: Loading checkpoint: checkpoints/MACE_test_run-3_epoch-0.pt
2024-05-08 11:49:20.274 INFO: Loaded model from epoch 0
2024-05-08 11:49:20.274 INFO: Evaluating train ...
2024-05-08 11:49:24.572 INFO: Evaluating valid ...
2024-05-08 11:49:24.653 INFO: Evaluating Default ...
2024-05-08 11:49:24.890 INFO:
+-------------+---------------------+------------------+-------------------+
| config_type | RMSE E / meV / atom | RMSE F / meV / A | relative F RMSE % |
+-------------+---------------------+------------------+-------------------+
|    train    |        4803.0       |       32.3       |   3229894489.99   |
|    valid    |        4652.4       |       31.6       |   3157367728.77   |
|   Default   |        4832.3       |       30.5       |   3048672532.24   |
+-------------+---------------------+------------------+-------------------+
2024-05-08 11:49:24.890 INFO: Saving model to checkpoints/MACE_test_run-3.model
2024-05-08 11:49:25.056 INFO: Compiling model, saving metadata to MACE_test_compiled.model
2024-05-08 11:49:26.039 INFO: Done

So why is the loss always zero, and why does RMSE_F drop to 0 after Epoch 2?

The MACE version is the latest, 0.3.5, and my OS is Red Hat 7.

ilyes319 commented 1 month ago

You must have the wrong keys for the forces in your atoms.arrays. You should rename the key to something other than "forces".

gabor1 commented 1 month ago

@ilyes319 we should mandate the options that specify keys!!

tkreiman commented 1 month ago

I think that when the key is named forces by default, the forces get sent to atoms.info in this line but this line expects them to be in atoms.

bernstei commented 1 month ago

forces are supposed to be stored in atoms.arrays, not info, and there's a recent PR that fixes this, but maybe it hasn't been merged into every branch.

[added] #409, but that's already on top of another one that uses different names for the keys that are less likely to have been chosen by the user for the real data
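
For anyone reading the log above: the enormous "relative F RMSE %" values are what you would expect if the reference forces the trainer sees are effectively all zero. Here is a minimal sketch of that arithmetic. The formula and the `eps` value are assumptions chosen for illustration, not taken from the MACE source, and the force values are made up.

```python
import math


def rms(values):
    """Root mean square of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def relative_rmse_percent(pred, ref, eps=1e-9):
    """Hypothetical relative RMSE: 100 * RMS(pred - ref) / (RMS(ref) + eps).

    Both the form of the expression and eps=1e-9 are assumptions.
    """
    errors = [p - r for p, r in zip(pred, ref)]
    return 100.0 * rms(errors) / (rms(ref) + eps)


# Illustrative force components in eV/A (not real data).
real_ref = [-0.376, -0.110, -0.247, 0.094]
zero_ref = [0.0, 0.0, 0.0, 0.0]  # what the trainer sees if the labels are lost

# A model that is off by about 32 meV/A on every component.
offset = 0.0323
pred_vs_real = [r + offset for r in real_ref]
pred_vs_zero = [z + offset for z in zero_ref]

rel_real = relative_rmse_percent(pred_vs_real, real_ref)
rel_zero = relative_rmse_percent(pred_vs_zero, zero_ref)

print(f"relative RMSE with real labels: {rel_real:.1f} %")
print(f"relative RMSE with zero labels: {rel_zero:.3e} %")
```

With real labels the same absolute error gives a relative error of a few tens of percent at most. With zero labels the denominator collapses to `eps` and the percentage lands in the billions, which is the order of magnitude shown in the table.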

ilyes319 commented 1 month ago

That's fixed now in the main branch. There is also a big warning asking you to change the key.

mofeilu commented 1 month ago

I got a warning message like this:

2024-05-13 10:51:35.710 INFO: Using energy_key 'energy' is unsafe, consider using a different key, rewriting energies to 'REF_energy'
2024-05-13 10:51:35.720 INFO: Using forces_key 'forces' is unsafe, consider using a different key, rewriting forces to 'REF_forces'

What does "consider using a different key" mean?
Do I need to change the key in the training xyz file?

Part of my xyz file looks like this:

48
Lattice="7.840399989796 0.0 0.0 0.0 11.088 0.0 0.0 0.0 27.441399964288" Properties=species:S:1:pos:R:3:forces:R:3 energy=-6014.779103447812 pbc="T T T"
Ge 0.06298572 1.25570833 9.43870012 -0.37590346 -0.10965079 -0.24666951
Ge 7.85319797 1.44334739 13.72582935 0.09424463 -0.26651217 -0.23405920
Ge 0.07512341 1.49726992 17.60315102 -0.65119998 -0.13948811 0.77522162
Ge -0.07906621 6.81537583 9.42841749 0.75863454 -0.43477883 0.07800763

The thing is, this xyz file was converted automatically from ASE atoms. ASE atoms use the "energy" and "forces" keys, so they were kept in the converted xyz file.

I also checked the xyz file suggested by the MACE tutorial for the MD22 test, md22_double-walled_nanotube.zip. It also uses the keys "energy" and "forces", so these keys seem to be common in xyz files. Why does MACE keep asking for different keys?

ilyes319 commented 1 month ago

There is a recent change to ASE that breaks a lot of compatibility with existing datasets. ASE no longer recommends using "forces" and "energy" to store results in a dataset; older datasets do so because this used to be fine. We encourage renaming the keys from "forces" to "REF_forces" and from "energy" to "REF_energy" in any dataset that you are using.
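
If it helps, the renaming can be done directly on the text of the xyz file. Below is a minimal sketch using only the Python standard library. The function name `rename_extxyz_keys` is hypothetical, and it assumes a plain extended XYZ layout like the excerpt posted in this thread (atom count, one comment line, then the atom lines, with an unquoted `Properties=` value).

```python
import re


def rename_extxyz_keys(text, energy_key="REF_energy", forces_key="REF_forces"):
    """Rename 'energy' and 'forces' in every comment line of an extended XYZ string.

    Hypothetical helper: assumes each frame is an atom count, one comment
    line, then exactly that many atom lines.
    """
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # tolerate blank lines between frames
            out.append(lines[i])
            i += 1
            continue
        n_atoms = int(lines[i].split()[0])
        header = lines[i + 1]
        # Rename the per-atom column inside Properties=...:forces:R:3
        header = re.sub(
            r"(Properties=\S*?:)forces(:R:3)",
            rf"\g<1>{forces_key}\g<2>",
            header,
        )
        # Rename the per-frame scalar; the lookbehind skips e.g. free_energy=
        header = re.sub(r"(?<!\w)energy=", f"{energy_key}=", header)
        out.append(lines[i])
        out.append(header)
        out.extend(lines[i + 2 : i + 2 + n_atoms])  # atom lines are untouched
        i += 2 + n_atoms
    return "\n".join(out) + "\n"


sample = (
    "2\n"
    'Lattice="7.84 0.0 0.0 0.0 11.088 0.0 0.0 0.0 27.44" '
    "Properties=species:S:1:pos:R:3:forces:R:3 "
    'energy=-6014.779103447812 pbc="T T T"\n'
    "Ge 0.06298572 1.25570833 9.43870012 -0.37590346 -0.10965079 -0.24666951\n"
    "Ge 7.85319797 1.44334739 13.72582935 0.09424463 -0.26651217 -0.23405920\n"
)
renamed = rename_extxyz_keys(sample)
print(renamed)
```

Running it twice leaves the file unchanged, because neither pattern matches the already renamed keys. The key names given to the training script then have to match the new names in the file.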

mofeilu commented 1 month ago

Thanks for the clarification! So for now, do both keys, "forces" and "REF_forces", work in the latest main branch version?

gabor1 commented 1 month ago

@ilyes319 let's make the warning message more verbose, something like what you wrote above: "Since ASE version XXX, using "energy" and "forces" is no longer safe when communicating between ASE and MACE. We recommend you use different keys in your XYZ files."

gabor1 commented 1 month ago

And also add that "you need to use the options --xxxx and --yyyy to tell MACE what key names you have chosen"

mofeilu commented 1 month ago

Yes, I saw that if I change the keys in the .xyz file, I also need to change the option --forces_key in the run script. This can be confusing to users.
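
One way to catch a mismatch before training is a small pre-flight check that the key names passed to the script actually appear in the xyz comment line. This is only a sketch: `missing_keys` is a hypothetical helper, not part of MACE, and it assumes the extended XYZ comment-line layout shown earlier in this thread.

```python
import re


def missing_keys(header, energy_key, forces_key):
    """Return the key names that an extended XYZ comment line does not define.

    Hypothetical helper. Properties is read as name:type:ncols triplets,
    so the column names sit at every third field.
    """
    missing = []
    props = re.search(r"Properties=(\S+)", header)
    column_names = props.group(1).split(":")[0::3] if props else []
    if forces_key not in column_names:
        missing.append(forces_key)
    # The lookbehind keeps 'energy' from matching inside e.g. 'REF_energy='
    if not re.search(rf"(?<!\w){re.escape(energy_key)}=", header):
        missing.append(energy_key)
    return missing


header = (
    'Lattice="7.84 0.0 0.0 0.0 11.088 0.0 0.0 0.0 27.44" '
    "Properties=species:S:1:pos:R:3:forces:R:3 "
    'energy=-6014.779103447812 pbc="T T T"'
)

# Keys as they are in the file: nothing is missing.
print(missing_keys(header, "energy", "forces"))
# Keys the script was told to use, but the file was never renamed.
print(missing_keys(header, "REF_energy", "REF_forces"))
```

The second call reports both names as missing, which is the situation where the file and the options disagree. The option `--forces_key` is the one mentioned above; a matching `--energy_key` is assumed here by analogy with the `energy_key` name in the warning log.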

ilyes319 commented 1 month ago

> @ilyes319 let's make the warning message more verbose. something like what you wrote above. "Since ASE version XXX, using "energy" and "forces" is no longer safe when communicating between ASE and MACE. We recommend you use different keys in your XYZ files."

@gabor1 Ok I have done that, merging to main.