deepmodeling / deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics
https://docs.deepmodeling.com/projects/deepmd/
GNU Lesser General Public License v3.0

The torch backend compress error #4416

Closed: Jeremy1189 closed this issue 2 days ago

Jeremy1189 commented 4 days ago

Bug summary

I trained models from scratch with the se_atten_v2 and dpa2 descriptors using DeePMD-kit v3.0.0, and both models compressed successfully. However, after fine-tuning a dpa2 model, I distilled a student model from the fine-tuned model. The downloaded student model freezes properly, but it cannot be compressed with the command dp --pt compress when following the instructions in the tutorial (https://bohrium.dp.tech/notebooks/16449433825?utm_source=weixin&utm_medium=weixin&utm_campaign=article&utm_term=jc1124&test=aaa).

DeePMD-kit Version

3.0.0

Backend and its version

PyTorch

How did you download the software?

Offline packages

Input Files, Running Commands, Error Log, etc.

(base) root@bohrium-12166-1226011:/personal/dpa2_hea/version11/result_distillation/no_data_stat_nbatch/task.0000# dp --pt freeze
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-11-25 13:30:01,481] DEEPMD INFO DeePMD version: 3.0.0
[2024-11-25 13:30:01,733] DEEPMD WARNING The rcut goes beyond table upper boundary, performing extrapolation.
[2024-11-25 13:30:02,859] DEEPMD INFO Saved frozen model to frozen_model.pth
(base) root@bohrium-12166-1226011:/personal/dpa2_hea/version11/result_distillation/no_data_stat_nbatch/task.0000# dp --pt compress
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-11-25 13:30:11,496] DEEPMD INFO DeePMD version: 3.0.0
Traceback (most recent call last):
  File "/opt/deepmd-kit/bin/dp", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/deepmd-kit/lib/python3.12/site-packages/deepmd/main.py", line 927, in main
    deepmd_main(args)
  File "/opt/deepmd-kit/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/entrypoints/main.py", line 562, in main
    enable_compression(
  File "/opt/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/entrypoints/compress.py", line 41, in enable_compression
    model_def_script = json.loads(saved_model.model_def_script)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/deepmd-kit/lib/python3.12/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/deepmd-kit/lib/python3.12/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/deepmd-kit/lib/python3.12/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
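The last frame of the traceback is exactly what Python's json.loads raises when it is handed an empty string, which suggests (an inference, not something confirmed in this thread) that the frozen model's model_def_script attribute was empty. The sketch below reproduces the message using only the standard library. The helper load_model_def_script is hypothetical and is not part of DeePMD-kit; it only shows how an empty script could be reported with a clearer error.

```python
import json


def load_model_def_script(raw: str) -> dict:
    """Hypothetical helper (not a DeePMD-kit API): parse a model_def_script
    string, but fail with an explicit message when it is empty."""
    if not raw.strip():
        # An empty string would otherwise surface as the opaque
        # "Expecting value: line 1 column 1 (char 0)" JSONDecodeError.
        raise ValueError(
            "model_def_script is empty; the frozen model appears to carry "
            "no model definition to rebuild for compression"
        )
    return json.loads(raw)


# json.loads on an empty string reproduces the exact error in the traceback.
try:
    json.loads("")
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 1 column 1 (char 0)
```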

Steps to Reproduce

finetune->distill->freeze->compress

Further Information, Files, and Links

No response

njzjz commented 3 days ago

Please provide a reproducible example.

Jeremy1189 commented 3 days ago

Reproducible example: https://drive.google.com/drive/folders/1CAVaj0sDr0nFNLfAy2LWN3YYW5G6FPHR?usp=sharing. It runs on the Bohrium platform, using the image built by following the tutorial https://bohrium.dp.tech/notebooks/16449433825?utm_source=weixin&utm_medium=weixin&utm_campaign=article&utm_term=jc1124&test=aaa

Jeremy1189 commented 3 days ago

After applying https://github.com/deepmodeling/deepmd-kit/pull/4423, which adds model_def_script to the ZBL model, the decode error disappeared, but compression still fails with the following error:

(base) root@bohrium-12166-1226011:/personal/dpa2_hea/version11/result_distillation/test_torch_zbl/out/tmp/outputs/artifacts/model/task.0003# dp --pt compress
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-11-26 19:24:58,801] DEEPMD INFO DeePMD version: 3.0.0
[2024-11-26 19:24:59,119] DEEPMD WARNING The rcut goes beyond table upper boundary, performing extrapolation.
[2024-11-26 19:24:59,476] DEEPMD INFO training data with lower boundary: [[-0.48768258 -0. -0. -0. ]
 [-0.489019 -0. -0. -0. ]
 [-0.48695466 -0. -0. -0. ]
 [-0.49220949 -0. -0. -0. ]
 [-0.49063556 -0. -0. -0. ]
 [-0.48726004 -0. -0. -0. ]
 [-0.48860153 -0. -0. -0. ]]
[2024-11-26 19:24:59,476] DEEPMD INFO training data with upper boundary: [[5.19521029 8.84705745 8.84705745 8.84705745]
 [5.30036888 9.0080993 9.0080993 9.0080993 ]
 [5.15054662 8.77890949 8.77890949 8.77890949]
 [5.03038694 8.58215269 8.58215269 8.58215269]
 [5.05160577 8.61804444 8.61804444 8.61804444]
 [5.11572299 8.72410685 8.72410685 8.72410685]
 [5.04800198 8.61619421 8.61619421 8.61619421]]
Traceback (most recent call last):
  File "/opt/deepmd-kit/bin/dp", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/deepmd-kit/lib/python3.12/site-packages/deepmd/main.py", line 927, in main
    deepmd_main(args)
  File "/opt/deepmd-kit/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/entrypoints/main.py", line 562, in main
    enable_compression(
  File "/opt/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/entrypoints/compress.py", line 76, in enable_compression
    model.enable_compression(
  File "/opt/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/model/model/make_model.py", line 121, in enable_compression
    self.atomic_model.enable_compression(
  File "/opt/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/model/atomic_model/linear_atomic_model.py", line 210, in enable_compression
    model.enable_compression(
  File "/opt/deepmd-kit/lib/python3.12/site-packages/deepmd/dpmodel/atomic_model/make_base_atomic_model.py", line 174, in enable_compression
    raise NotImplementedError("This atomi model doesn't support compression!")
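The traceback shows the linear atomic model (linear_atomic_model.py, line 210) calling enable_compression on a sub-model, and that call landing in the default implementation in make_base_atomic_model.py, which raises NotImplementedError. A plausible reading, assumed here rather than confirmed in the thread, is that the linear model loops over its sub-models and that the ZBL pair-table part does not override the base method. The toy classes below are hypothetical stand-ins (not DeePMD-kit classes, and with a simplified, made-up signature) that reproduce that delegation pattern.

```python
class BaseAtomicModel:
    """Toy base class: compression is unsupported unless a subclass overrides it."""

    def enable_compression(self, min_nbor_dist: float) -> None:
        raise NotImplementedError("This atomic model doesn't support compression!")


class DescriptorAtomicModel(BaseAtomicModel):
    """Toy descriptor-based sub-model that does support compression."""

    def __init__(self) -> None:
        self.compressed = False

    def enable_compression(self, min_nbor_dist: float) -> None:
        # Stand-in for tabulating the embedding net; just records the call.
        self.compressed = True


class PairTableAtomicModel(BaseAtomicModel):
    """Toy ZBL/pair-table sub-model: no override, so it inherits the raising method."""


class LinearAtomicModel(BaseAtomicModel):
    """Toy linear combination of sub-models."""

    def __init__(self, models: list[BaseAtomicModel]) -> None:
        self.models = models

    def enable_compression(self, min_nbor_dist: float) -> None:
        # Delegates to every sub-model in order; a single unsupported
        # sub-model aborts the whole call with NotImplementedError.
        for model in self.models:
            model.enable_compression(min_nbor_dist)


if __name__ == "__main__":
    dp_part = DescriptorAtomicModel()
    linear = LinearAtomicModel([dp_part, PairTableAtomicModel()])
    try:
        linear.enable_compression(min_nbor_dist=0.5)
    except NotImplementedError as err:
        print(f"compression failed: {err}; descriptor part compressed: {dp_part.compressed}")
```

In this toy, the descriptor part is compressed before the pair-table part raises. This mirrors how a DP+ZBL model can freeze without problems yet still fail at the compression step.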