Closed Jeremy1189 closed 2 days ago
Please provide a reproducible example.
The files are available at https://drive.google.com/drive/folders/1CAVaj0sDr0nFNLfAy2LWN3YYW5G6FPHR?usp=sharing. The example runs on the Bohrium platform, and the image was built following the tutorial https://bohrium.dp.tech/notebooks/16449433825?utm_source=weixin&utm_medium=weixin&utm_campaign=article&utm_term=jc1124&test=aaa
After adding model_def_script to ZBL (https://github.com/deepmodeling/deepmd-kit/pull/4423), the decode error disappeared, but the compress step still fails with the following error:
(base) root@bohrium-12166-1226011:/personal/dpa2_hea/version11/result_distillation/test_torch_zbl/out/tmp/outputs/artifacts/model/task.0003# dp --pt compress
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-11-26 19:24:58,801] DEEPMD INFO DeePMD version: 3.0.0
[2024-11-26 19:24:59,119] DEEPMD WARNING The rcut goes beyond table upper boundary, performing extrapolation.
[2024-11-26 19:24:59,476] DEEPMD INFO training data with lower boundary: [[-0.48768258 -0. -0. -0. ]
[-0.489019 -0. -0. -0. ]
[-0.48695466 -0. -0. -0. ]
[-0.49220949 -0. -0. -0. ]
[-0.49063556 -0. -0. -0. ]
[-0.48726004 -0. -0. -0. ]
[-0.48860153 -0. -0. -0. ]]
[2024-11-26 19:24:59,476] DEEPMD INFO training data with upper boundary: [[5.19521029 8.84705745 8.84705745 8.84705745]
[5.30036888 9.0080993 9.0080993 9.0080993 ]
[5.15054662 8.77890949 8.77890949 8.77890949]
[5.03038694 8.58215269 8.58215269 8.58215269]
[5.05160577 8.61804444 8.61804444 8.61804444]
[5.11572299 8.72410685 8.72410685 8.72410685]
[5.04800198 8.61619421 8.61619421 8.61619421]]
Traceback (most recent call last):
File "/opt/deepmd-kit/bin/dp", line 10, in
Bug summary
I tested training from scratch with se_atten_v2 and dpa2 using DeePMD-kit v3.0.0, and both models compressed successfully. However, after fine-tuning the dpa2 descriptor, I distilled a student model from the fine-tuned model. The downloaded student model freezes properly, but it cannot be compressed with the command (dp --pt compress) given in the tutorial (https://bohrium.dp.tech/notebooks/16449433825?utm_source=weixin&utm_medium=weixin&utm_campaign=article&utm_term=jc1124&test=aaa).
DeePMD-kit Version
3.0.0
Backend and its version
pytorch
How did you download the software?
Offline packages
Input Files, Running Commands, Error Log, etc.
(base) root@bohrium-12166-1226011:/personal/dpa2_hea/version11/result_distillation/no_data_stat_nbatch/task.0000# dp --pt freeze
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-11-25 13:30:01,481] DEEPMD INFO DeePMD version: 3.0.0
[2024-11-25 13:30:01,733] DEEPMD WARNING The rcut goes beyond table upper boundary, performing extrapolation.
[2024-11-25 13:30:02,859] DEEPMD INFO Saved frozen model to frozen_model.pth
(base) root@bohrium-12166-1226011:/personal/dpa2_hea/version11/result_distillation/no_data_stat_nbatch/task.0000# dp --pt compress
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-11-25 13:30:11,496] DEEPMD INFO DeePMD version: 3.0.0
Traceback (most recent call last):
File "/opt/deepmd-kit/bin/dp", line 10, in
sys.exit(main())
^^^^^^
File "/opt/deepmd-kit/lib/python3.12/site-packages/deepmd/main.py", line 927, in main
deepmd_main(args)
File "/opt/deepmd-kit/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/entrypoints/main.py", line 562, in main
enable_compression(
File "/opt/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/entrypoints/compress.py", line 41, in enable_compression
model_def_script = json.loads(saved_model.model_def_script)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.12/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.12/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.12/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
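The traceback shows json.loads failing on saved_model.model_def_script at the very first character. An empty string is one input that produces exactly this message, so a quick check of the stored string before compressing may help narrow things down. The sketch below uses only the Python standard library. The helper name check_model_def_script is hypothetical and not part of DeePMD-kit, and whether the student model's model_def_script is actually empty is an assumption, not something confirmed in this report.

```python
import json


def check_model_def_script(raw):
    """Return (ok, message) describing whether `raw` parses as a JSON object.

    Hypothetical helper for illustration; `raw` stands in for the string
    that compress.py passes to json.loads in the traceback above.
    """
    if not isinstance(raw, str) or not raw.strip():
        return False, "model_def_script is empty or not a string"
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as err:
        return False, f"invalid JSON: {err}"
    if not isinstance(parsed, dict):
        return False, f"expected a JSON object, got {type(parsed).__name__}"
    return True, f"parsed {len(parsed)} top-level keys"


# An empty string reproduces the message seen in the traceback.
try:
    json.loads("")
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 1 column 1 (char 0)

print(check_model_def_script(""))
print(check_model_def_script('{"type_map": ["H", "O"]}'))
```

If the check fails on the string stored in the frozen student model, the problem is likely in how the model definition was saved during distillation or freezing, not in the compression code itself.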
Steps to Reproduce
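Fine-tune the dpa2 model, distill a student model from it, freeze the student model (dp --pt freeze), then compress it (dp --pt compress).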
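The last two steps can be scripted so the failure is easy to reproduce in one go. This is a minimal sketch: the two commands are copied from the log in this report, the fine-tune and distillation commands do not appear in the report and are left out, and run_steps and its runner parameter are hypothetical names introduced here for illustration.

```python
import subprocess

# Commands copied verbatim from the log in this report.
# Fine-tune and distillation commands are not shown there, so they are omitted.
STEPS = [
    ["dp", "--pt", "freeze"],
    ["dp", "--pt", "compress"],
]


def run_steps(workdir, runner=subprocess.run):
    """Run each step in `workdir`, stopping at the first failure.

    Returns a list of (command string, return code) pairs.
    `runner` is injectable so the flow can be exercised without DeePMD-kit.
    """
    results = []
    for cmd in STEPS:
        proc = runner(cmd, cwd=workdir, capture_output=True, text=True)
        results.append((" ".join(cmd), proc.returncode))
        if proc.returncode != 0:
            # Show the error output of the failing step, e.g. the traceback.
            print(proc.stderr)
            break
    return results
```

Calling run_steps("task.0000") from the directory that contains the task folder would run both commands there and print the stderr of the first step that fails.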
Further Information, Files, and Links
No response