Hello, multihead finetuning is not yet supported for models other than the MP pretrained models. Hopefully I can fix that soon. For now, please use normal finetuning.
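For reference, a minimal sketch of what that fallback could look like on the command line (the flag names are the ones used later in this thread; the path is a placeholder, and treating --multiheads_finetuning=False as "normal finetuning" is my assumption):

--foundation_model='../ptbp_model.model'
--multiheads_finetuning=False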
Ok, thanks for your reply, looking forward to updates.
@MengnanCui Can you test again with the latest main? I should have fixed that.
Great, thank you! I will try it!
Hi @ilyes319, thanks so much for your efforts.
(1) I tried the latest main branch with the same settings as above; it still produces this error during finetuning:
2024-10-03 08:32:13.385 INFO: ===========VERIFYING SETTINGS===========
2024-10-03 08:32:13.386 INFO: MACE version: 0.3.7
2024-10-03 08:32:13.453 INFO: CUDA version: 11.8, CUDA device: 0
2024-10-03 08:32:14.229 INFO: Using foundation model ../ptbp_model.model as initial checkpoint.
2024-10-03 08:32:14.230 INFO: ===========LOADING INPUT DATA===========
2024-10-03 08:32:14.230 INFO: Using heads: ['default']
2024-10-03 08:32:14.231 INFO: ============= Processing head default ===========
2024-10-03 08:32:14.300 INFO: Training set [100 configs, 100 energy, 4761 forces] loaded from 'training.xyz'
2024-10-03 08:32:14.628 INFO: Validation set [1000 configs, 1000 energy, 46593 forces] loaded from '../../fixed_validation.xyz'
2024-10-03 08:32:14.946 INFO: Test set (1000 configs) loaded from '../../fixed_test.xyz':
2024-10-03 08:32:14.947 INFO: Default_Default: 1000 configs, 1000 energy, 46560 forces
2024-10-03 08:32:14.947 INFO: Total number of configurations: train=100, valid=1000, tests=[Default_Default: 1000],
2024-10-03 08:32:14.948 INFO: ==================Using multiheads finetuning mode==================
2024-10-03 08:32:14.948 INFO: Using foundation model for multiheads finetuning with ../../../transferability7k/training.xyz
2024-10-03 08:32:17.246 INFO: Training set [7642 configs, 7642 energy, 380589 forces] loaded from '../../../transferability7k/training.xyz'
2024-10-03 08:32:17.776 INFO: Validation set [1000 configs, 1000 energy, 46593 forces] loaded from '../../../transferability7k/validation.xyz'
2024-10-03 08:32:17.776 INFO: Total number of configurations: train=7642, valid=1000
2024-10-03 08:32:17.817 INFO: Atomic Numbers used: [74]
2024-10-03 08:32:17.817 INFO: Isolated Atomic Energies (E0s) not in training file, using command line argument
2024-10-03 08:32:17.823 INFO: Atomic Energies used (z: eV) for head default: {74: -11.022250868182281}
2024-10-03 08:32:17.823 INFO: Atomic Energies used (z: eV) for head pt_head: {74: -29.330717613489064}
2024-10-03 08:32:26.050 INFO: Average number of neighbors: 57.10205972318726
2024-10-03 08:32:26.051 INFO: During training the following quantities will be reported: energy, forces, virials, stress
2024-10-03 08:32:26.051 INFO: ===========MODEL DETAILS===========
Traceback (most recent call last):
File "/home/mncui/software/miniconda3/envs/mace_foundation/bin/mace_run_train", line 8, in <module>
sys.exit(main())
File "/work/home/mncui/software/mace_main10_2024/mace/cli/run_train.py", line 63, in main
run(args)
File "/work/home/mncui/software/mace_main10_2024/mace/cli/run_train.py", line 505, in run
model, output_args = configure_model(args, train_loader, atomic_energies, model_foundation, heads, z_table)
File "/work/home/mncui/software/mace_main10_2024/mace/tools/model_script_utils.py", line 37, in configure_model
args.mean, args.std = modules.scaling_classes[args.scaling](
File "/work/home/mncui/software/mace_main10_2024/mace/modules/utils.py", line 312, in compute_mean_rms_energy_forces
node_e0 = atomic_energies_fn(batch.node_attrs)
File "/home/mncui/software/miniconda3/envs/mace_foundation/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/work/home/mncui/software/mace_main10_2024/mace/modules/blocks.py", line 160, in forward
return torch.matmul(x, torch.atleast_2d(self.atomic_energies).T)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (421x1 and 2x1)
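For what it's worth, the mismatch can be reproduced in isolation. A minimal sketch, assuming (from the traceback alone, not from the MACE source) that the per-head atomic energies ended up as a flat 1-D tensor of length 2 while the one-hot node attributes cover a single element:

import torch

num_nodes, num_elements, num_heads = 421, 1, 2
node_attrs = torch.ones(num_nodes, num_elements)    # one-hot over one element: (421, 1)

# Expected layout: one row of E0s per head, so the matmul in blocks.py works.
per_head = torch.randn(num_heads, num_elements)     # (2, 1)
ok = torch.matmul(node_attrs, torch.atleast_2d(per_head).T)  # (421, 1) @ (1, 2) -> (421, 2)

# If the energies instead arrive as a flat 1-D tensor, atleast_2d(...).T
# yields (2, 1) and the matmul fails exactly as in the log above:
flat = torch.randn(num_heads)                       # shape (2,)
torch.matmul(node_attrs, torch.atleast_2d(flat).T)  # RuntimeError: ... (421x1 and 2x1)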
(2) On the other hand, and for your information: to exclude effects of the MACE version (the ../ptbp_model.model I used above was trained with code from at least 3-4 months ago), I ran a new training with the latest main branch, obtained ./train_main/MACE_model.model, and then did multihead finetuning based on it. This gives a different error message, as follows:
2024-10-03 10:05:17.508 INFO: ===========VERIFYING SETTINGS===========
2024-10-03 10:05:17.508 INFO: MACE version: 0.3.7
2024-10-03 10:05:17.570 INFO: CUDA version: 11.8, CUDA device: 0
2024-10-03 10:05:18.256 INFO: Using foundation model ./train_main/MACE_model.model as initial checkpoint.
2024-10-03 10:05:18.257 INFO: ===========LOADING INPUT DATA===========
2024-10-03 10:05:18.257 INFO: Using heads: ['default']
2024-10-03 10:05:18.257 INFO: ============= Processing head default ===========
2024-10-03 10:05:18.320 INFO: Training set [100 configs, 100 energy, 4761 forces] loaded from 'training.xyz'
2024-10-03 10:05:18.624 INFO: Validation set [1000 configs, 1000 energy, 46593 forces] loaded from '../../fixed_validation.xyz'
2024-10-03 10:05:18.925 INFO: Test set (1000 configs) loaded from '../../fixed_test.xyz':
2024-10-03 10:05:18.926 INFO: Default_Default: 1000 configs, 1000 energy, 46560 forces
2024-10-03 10:05:18.927 INFO: Total number of configurations: train=100, valid=1000, tests=[Default_Default: 1000],
2024-10-03 10:05:18.927 INFO: ==================Using multiheads finetuning mode==================
2024-10-03 10:05:18.928 INFO: Using foundation model for multiheads finetuning with ../../../transferability7k/training.xyz
2024-10-03 10:05:21.214 INFO: Training set [7642 configs, 7642 energy, 380589 forces] loaded from '../../../transferability7k/training.xyz'
2024-10-03 10:05:21.689 INFO: Validation set [1000 configs, 1000 energy, 46593 forces] loaded from '../../../transferability7k/validation.xyz'
2024-10-03 10:05:21.689 INFO: Total number of configurations: train=7642, valid=1000
2024-10-03 10:05:21.719 INFO: Atomic Numbers used: [74]
2024-10-03 10:05:21.720 INFO: Isolated Atomic Energies (E0s) not in training file, using command line argument
Traceback (most recent call last):
File "/home/mncui/software/miniconda3/envs/mace_foundation/bin/mace_run_train", line 8, in <module>
sys.exit(main())
File "/work/home/mncui/software/mace_main10_2024/mace/cli/run_train.py", line 63, in main
run(args)
File "/work/home/mncui/software/mace_main10_2024/mace/cli/run_train.py", line 356, in run
atomic_energies_dict[head_config.head_name] = {
File "/work/home/mncui/software/mace_main10_2024/mace/cli/run_train.py", line 357, in <dictcomp>
z: model_foundation.atomic_energies_fn.atomic_energies[
IndexError: invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to convert a 0-dim tensor to a number
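This is the standard PyTorch error for indexing a 0-dim tensor; presumably (an assumption from the message and the single-element dataset, not verified against the MACE source) the old single-head model stores its one E0 as a scalar tensor, so indexing it per element fails:

import torch

e0 = torch.tensor(-29.330717613489064)  # 0-dim tensor: a single, squeezed E0
try:
    e0[0]                                # indexing a 0-dim tensor raises
except IndexError as err:
    print(err)                           # "invalid index of a 0-dim tensor..."
print(e0.item())                         # the conversion the error message suggests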
MACE_model_newrun-2024_debug.log
Thanks again, and I hope this info helps.
Could you send your input script, a small sample of your data, and your model to ib467@cam.ac.uk so I can reproduce this myself? Also, how are you passing your E0s?
Hi, I hope the email reached you fine. The E0s were set inside the input script; there is only one element in the datasets, tungsten.
By the way, the E0s for the pretrained model are all set the same way in the input script, but they come from a DFTB calculation: {74: -29.330717613489064}. As you can see, there are dftb_-tagged energies and forces inside all the data as well.
The E0s need to be calculated with the same method as the data you are fitting.
Yes, that is the case: I have DFTB E0s for pretraining on DFTB data, and DFT E0s for finetuning on DFT data.
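To make that concrete: assuming the E0s are passed through the --E0s command-line argument (the log above says "not in training file, using command line argument"; the exact flag name is my assumption), each run would get its own method-consistent value, matching the per-head energies printed in the log:

# pretraining on DFTB-labelled data (pt_head in the log)
--E0s="{74: -29.330717613489064}"
# finetuning on DFT-labelled data (default head in the log)
--E0s="{74: -11.022250868182281}"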
@MengnanCui I should have fixed that in the main branch. Could you try it and tell me if it is indeed fixed?
Hi, Ilyes. I submitted a job, and it has worked very well so far. The bug is fixed. Thank you very much!
Describe the bug
Hi, I want to do multihead finetuning on a personal pre-trained model (ptbp_model.model, based on version 0.3.7, main branch). After setting these flags:
--foundation_model='../ptbp_model.model'
--multiheads_finetuning=True
--pt_train_file='../../../transferability7k/training.xyz'
--pt_valid_file='../../../transferability7k/validation.xyz'
I got the error messages shown in the attached log. Do you have any ideas about this problem? Do I have to pre-train a model with the latest code and then fine-tune it with the multihead approach? For your information, the pretrained model is based on the dftb_ key, and the finetuning is on the dft_ key. Here is the log file: MACE_model_run-2747_debug.log
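For context, a hedged sketch of the full invocation this corresponds to, combining the flags above with the data files from the log (the --energy_key/--forces_key names and the dft_energy/dft_forces values are my assumptions about this setup, not copied from the actual input script):

mace_run_train \
  --foundation_model='../ptbp_model.model' \
  --multiheads_finetuning=True \
  --pt_train_file='../../../transferability7k/training.xyz' \
  --pt_valid_file='../../../transferability7k/validation.xyz' \
  --train_file='training.xyz' \
  --valid_file='../../fixed_validation.xyz' \
  --test_file='../../fixed_test.xyz' \
  --energy_key='dft_energy' \
  --forces_key='dft_forces' \
  --E0s='{74: -11.022250868182281}'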