Poor energy & force metrics on paper's datasets (carbon nanotube, buckyball catcher)

ale99WGiais commented 5 months ago

Describe the bug

We tried to fit MACE potentials to some of the datasets used in the reference paper "Evaluation of the MACE Force Field Architecture: from Medicinal Chemistry to Materials Science".

In particular, we fitted MACE to the "Double-walled nanotube" and "Buckyball catcher" datasets.

The MAEs we obtained are very different from those reported in the paper, so we are wondering what we could be doing wrong :(

To Reproduce

MACE was installed using the following commands:

git clone https://github.com/ACEsuit/mace.git

conda create -n mace python=3.10 -y
conda activate mace
conda install micromamba -c conda-forge -c anaconda
micromamba install pytorch==2.0 torchvision torchaudio pytorch-cuda -c pytorch -c nvidia -c conda-forge -c anaconda
micromamba install numpy scipy matplotlib ase opt_einsum prettytable pandas e3nn scikit-learn=1.3.2 -c conda-forge -c anaconda
pip install mace/

To fit MACE to the nanotube dataset, we used the following scripts:

python ~/mace/mace/cli/run_train.py \
    --name="tube-256-0-r6-int1" \
    --train_file="../md22_double-walled_nanotube.xyz" \
    --valid_fraction=0.05 \
    --E0s="average" \
    --model="MACE" \
    --num_interactions=1 \
    --num_channels=256 \
    --max_L=0 \
    --correlation=3 \
    --r_max=6.0 \
    --forces_weight=1000 \
    --energy_weight=10 \
    --batch_size=2 \
    --valid_batch_size=2 \
    --max_num_epochs=650 \
    --start_swa=450 \
    --scheduler_patience=5 \
    --patience=15 \
    --eval_interval=3 \
    --ema \
    --swa \
    --swa_forces_weight=10 \
    --error_table='PerAtomMAE' \
    --default_dtype="float64" \
    --device=cuda \
    --seed=123 \
    --restart_latest \
    --save_cpu

python ~/mace/mace/cli/run_train.py \
    --name="tube-256-2-r5-int2" \
    --train_file="../md22_double-walled_nanotube.xyz" \
    --valid_fraction=0.05 \
    --E0s="average" \
    --model="MACE" \
    --num_interactions=2 \
    --num_channels=256 \
    --max_L=2 \
    --correlation=3 \
    --r_max=5.0 \
    --forces_weight=1000 \
    --energy_weight=10 \
    --batch_size=1 \
    --valid_batch_size=2 \
    --max_num_epochs=650 \
    --start_swa=450 \
    --scheduler_patience=5 \
    --patience=15 \
    --eval_interval=3 \
    --ema \
    --swa \
    --swa_forces_weight=10 \
    --error_table='PerAtomMAE' \
    --default_dtype="float64" \
    --device=cuda \
    --seed=123 \
    --restart_latest \
    --save_cpu

python ~/mace/mace/cli/run_train.py \
    --name="tube-256-2-r3-int2" \
    --train_file="../md22_double-walled_nanotube.xyz" \
    --valid_fraction=0.05 \
    --E0s="average" \
    --model="MACE" \
    --num_interactions=2 \
    --num_channels=256 \
    --max_L=2 \
    --correlation=3 \
    --r_max=3.0 \
    --forces_weight=1000 \
    --energy_weight=10 \
    --batch_size=2 \
    --valid_batch_size=2 \
    --max_num_epochs=650 \
    --start_swa=450 \
    --scheduler_patience=5 \
    --patience=15 \
    --eval_interval=3 \
    --ema \
    --swa \
    --swa_forces_weight=10 \
    --error_table='PerAtomMAE' \
    --default_dtype="float64" \
    --device=cuda \
    --seed=123 \
    --restart_latest \
    --save_cpu

These commands follow the training examples in https://mace-docs.readthedocs.io/en/latest/examples/training_examples.html

Similar scripts were used for the buckyball catcher.
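The three nanotube commands differ only in the run name, `--num_interactions`, `--max_L`, `--r_max` and `--batch_size`. As a sketch, they can be generated from one shared config, which makes those differences explicit. The helper names below (`build_command`, `run_name`, `SHARED_ARGS`, `RUNS`) are hypothetical and not part of MACE, and the naming convention `<prefix>-<channels>-<max_L>-r<r_max>-int<n>` is only inferred from the run names above.

```python
# Hypothetical helper (not part of MACE): rebuilds the three nanotube
# run_train.py command lines from one shared config plus per-run overrides.
import os

SHARED_ARGS = {
    "train_file": "../md22_double-walled_nanotube.xyz",
    "valid_fraction": 0.05,
    "E0s": "average",
    "model": "MACE",
    "num_channels": 256,
    "correlation": 3,
    "forces_weight": 1000,
    "energy_weight": 10,
    "valid_batch_size": 2,
    "max_num_epochs": 650,
    "start_swa": 450,
    "scheduler_patience": 5,
    "patience": 15,
    "eval_interval": 3,
    "swa_forces_weight": 10,
    "error_table": "PerAtomMAE",
    "default_dtype": "float64",
    "device": "cuda",
    "seed": 123,
}
SWITCHES = ["ema", "swa", "restart_latest", "save_cpu"]  # flags that take no value

# (max_L, r_max, num_interactions, batch_size) for the three nanotube runs
RUNS = [(0, 6.0, 1, 2), (2, 5.0, 2, 1), (2, 3.0, 2, 2)]


def run_name(prefix, channels, max_l, r_max, num_interactions):
    # Convention inferred from the run names: <prefix>-<channels>-<max_L>-r<r_max>-int<n>
    return f"{prefix}-{channels}-{max_l}-r{r_max:g}-int{num_interactions}"


def build_command(max_l, r_max, num_interactions, batch_size,
                  prefix="tube", script="~/mace/mace/cli/run_train.py"):
    args = dict(SHARED_ARGS)
    args.update(
        name=run_name(prefix, args["num_channels"], max_l, r_max, num_interactions),
        max_L=max_l,
        r_max=r_max,
        num_interactions=num_interactions,
        batch_size=batch_size,
    )
    # "~" is expanded by a shell, not by subprocess, so expand it here
    cmd = ["python", os.path.expanduser(script)]
    cmd += [f"--{key}={value}" for key, value in args.items()]
    cmd += [f"--{switch}" for switch in SWITCHES]
    return cmd
```

Passing `build_command(*RUNS[0])` to `subprocess.run(..., check=True)` would launch the first run; for the buckyball catcher, `train_file` and `prefix` would need to change.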

Each job ran on a single NVIDIA Tesla A100 GPU with a time limit of about 3 days.

Data for both the nanotube and the buckyball catcher was downloaded from http://www.sgdml.org/

Expected behavior

We expected energy and force MAEs as low as those reported in the paper.

Instead, we got errors that are orders of magnitude higher (training logs attached):

buckyball mace-256-0-r6-int1 stdout.txt
buckyball mace-256-2-r3-int2 stdout.txt
buckyball mace-256-2-r5-int2 stdout.txt
nanotube mace-256-0-r6-int1 stdout.txt
nanotube mace-256-2-r3-int2 stdout.txt
nanotube mace-256-2-r5-int2 stdout.txt

Everything is uploaded here: https://uniudamce-my.sharepoint.com/:f:/g/personal/142135_spes_uniud_it/EvqCwMiR9PNMkZqb8L5iQTMBnHnpEm0-CQVCOsEskxbdaA?e=dhfnXJ

Thanks very much for the support,
Alessio

ilyes319 commented 5 months ago

The MACE numbers are in eV and eV/Å, but the original dataset is in kcal/mol. Did you convert the units? For numerical precision, it is better to use eV and eV/Å in the MACE code.
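A minimal sketch of the conversion, assuming energies are stored in kcal/mol, forces in kcal/mol/Å, and positions already in Å. The factor follows from the exact SI definitions (1 kcal = 4184 J, Avogadro constant, elementary charge); ASE exposes the same factor as `ase.units.kcal / ase.units.mol`. The function name `convert_frame` is hypothetical, and it takes plain numbers because how the labels are laid out in the downloaded xyz files should be checked before reading and rewriting them.

```python
# Unit conversion for labels given in kcal/mol (energies) and
# kcal/mol/Å (forces) to eV and eV/Å. Positions are left untouched.

AVOGADRO = 6.02214076e23              # 1/mol (exact, SI 2019)
ELEMENTARY_CHARGE = 1.602176634e-19   # C, i.e. J per eV (exact, SI 2019)

# 1 kcal/mol = 4184 J / N_A per particle, expressed in eV: ≈ 0.0433641 eV
KCAL_MOL_TO_EV = 4184.0 / (AVOGADRO * ELEMENTARY_CHARGE)


def convert_frame(energy_kcal_mol, forces_kcal_mol_ang):
    """Return (energy in eV, forces in eV/Å) for one configuration.

    Hypothetical helper: forces_kcal_mol_ang is a list of [fx, fy, fz]
    per atom. Both quantities scale by the same factor because the
    length unit (Å) does not change.
    """
    energy_ev = energy_kcal_mol * KCAL_MOL_TO_EV
    forces_ev_ang = [[f * KCAL_MOL_TO_EV for f in atom] for atom in forces_kcal_mol_ang]
    return energy_ev, forces_ev_ang
```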

ale99WGiais commented 5 months ago

Hi Ilyes, thanks for your very quick response!

No, sorry, we didn't notice that the original dataset was in kcal/mol.

We'll try converting the dataset to eV and refit the potentials asap :)
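Until the refit finishes, the existing logs can be put on the paper's scale. A sketch, assuming the `PerAtomMAE` table reports 1000 times the error in the label units under a meV heading, so a run trained on kcal/mol labels effectively logs thousandths of a kcal/mol. Under that assumption, multiplying a logged value by the kcal/mol to eV factor gives the error in true meV (or meV/Å for forces). This only rescales the reported numbers; it does not undo any effect the unit mismatch had on training itself. The function name `logged_to_true_mev` is hypothetical.

```python
# Rescale errors logged as "meV" by runs whose labels were really in kcal/mol.
KCAL_MOL_TO_EV = 4184.0 / (6.02214076e23 * 1.602176634e-19)  # ≈ 0.0433641


def logged_to_true_mev(logged_value):
    """Convert a logged 'meV' (or 'meV/Å') error to true meV (or meV/Å).

    Assumes logged_value = 1000 * (error in label units), so
    logged_value / 1000 is the error in kcal/mol (or kcal/mol/Å).
    """
    error_kcal_mol = logged_value / 1000.0
    return error_kcal_mol * KCAL_MOL_TO_EV * 1000.0
```

Because energies and forces share the same factor, the rescaling is the same for both columns of the table: a logged value of 1000 corresponds to about 43.4 meV.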

ACEsuit/mace issue #303