deepmodeling / deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics
https://docs.deepmodeling.com/projects/deepmd/
GNU Lesser General Public License v3.0

[BUG] Different results between `dp --pt test` and `DeepEval` for a fine-tuned multi-task model #3892

Closed SchrodingersCattt closed 1 week ago

SchrodingersCattt commented 1 week ago

Bug summary

I fine-tuned a model based on the 2024Q1 pre-trained model from aissq, using the Domains_OC2M branch. My new datasets were labeled with CP2K 2024.1 (units correctly converted to eV for energies and eV/Å for forces) for my experimentally observed structure. The fine-tuning command is:

nohup dp --pt train train.json --finetune OpenLAM_2.1.0_27heads_2024Q1.pt --model-branch Domains_OC2M > log 2>&1 && dp --pt freeze

Via the dp --pt test command, I got good consistency between the DFT labels and the DP predictions: [parity plot attached]

However, when using the DeepEval module through the Python API, I noticed a significant discrepancy between the relabeled data and the original DFT-labeled data: [parity plot attached]
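For context, the comparison shown in the parity plots boils down to a per-atom error between DFT labels and model predictions. The sketch below is my own illustration (the function name and the toy numbers are hypothetical, not taken from the attached scripts); it shows the kind of metric such a relabel comparison computes:

```python
import numpy as np

def compare_labels(e_dft, e_pred, natoms):
    """Return (per-atom energy RMSE, max abs per-atom deviation) in eV/atom."""
    e_dft = np.asarray(e_dft, dtype=float) / natoms
    e_pred = np.asarray(e_pred, dtype=float) / natoms
    diff = e_pred - e_dft
    return float(np.sqrt(np.mean(diff**2))), float(np.max(np.abs(diff)))

# Toy values (hypothetical): two frames of a 10-atom system.
rmse, max_dev = compare_labels([-100.0, -101.0], [-100.2, -100.9], natoms=10)
print(rmse, max_dev)
```

A large RMSE here, while `dp --pt test` reports a small one, is exactly the inconsistency reported above.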

I have attached my datasets, models, input files, and relevant scripts. Please review them if needed.

Thank you for your time!

DeePMD-kit Version

DeePMD-kit v3.0.0a1.dev82+geed7c8ac

Backend and its version

torch 2.0.0+cu118

How did you download the software?

Offline packages

Input Files, Running Commands, Error Log, etc.

In dataset.zip, the folders train and valid are DFT-labeled, while train_relabeled and valid_relabeled were relabeled by the fine-tuned model.

In scripts.zip, relabel.py was executed after freezing the model; plot_test.py was executed after dp --pt test; plot_relabel.py was executed after relabeling. (See "Steps to Reproduce" for details.)

train.json is my input.

My finetuned model can be accessed here (uploaded to my repo due to the size limit...): https://github.com/SchrodingersCattt/deepmd-issue-20240622/blob/main/frozen_model.pth

Steps to Reproduce

Result of dp --pt test

Run dp --pt test -m frozen_model.pth -s train -d test, then run python plot_test.py. This should generate an e_f.png.

Result of DeepEval

Run python relabel.py, then run python plot_relabel.py. This should generate a dft_vs_relabel.png.

Further Information, Files, and Links

No response

njzjz commented 1 week ago

Duplicate of #3884

SchrodingersCattt commented 1 week ago

I finally solved the problem and will share my experience here.

The atype of a model obtained from multi-task training must match the one used in pre-training, i.e. the full 118-element type_map, NOT the type_map written in input.json. That is why my scripts worked fine with single-task pre-trained models but failed here with multi-task pre-trained models!
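The fix described above amounts to remapping the dataset's type indices (which index the short type_map from input.json) into indices of the model's full 118-element type_map. A minimal, self-contained sketch of that remapping (function name and truncated model map are my own illustration, not the DeePMD-kit API):

```python
def remap_atype(atype, local_type_map, model_type_map):
    """Translate per-atom type indices from local_type_map
    to the corresponding indices in model_type_map."""
    # For each local type, find its position in the model's type_map.
    idx = [model_type_map.index(elem) for elem in local_type_map]
    return [idx[t] for t in atype]

# Toy example: a water-like system with local map ["O", "H"] versus a
# truncated stand-in for the model's 118-element periodic-table map.
model_map = ["H", "He", "Li", "Be", "B", "C", "N", "O"]  # first 8 of 118
local_map = ["O", "H"]
print(remap_atype([0, 1, 1], local_map, model_map))  # -> [7, 0, 0]
```

Passing the unmapped indices (0 and 1) straight to a multi-task model would make it treat the atoms as H and He, which explains why the DeepEval predictions disagreed with `dp --pt test` while the same script worked for single-task models.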