deepmodeling / dpgen2

2nd generation of the Deep Potential GENerator
https://docs.deepmodeling.com/projects/dpgen2/
GNU Lesser General Public License v3.0

The dp compress command with the PyTorch backend cannot compress the model (finetune --> distillation, based on deepmd-kit v3.0.0b3 and dpgen2) #274

Open Jeremy1189 opened 1 week ago

Jeremy1189 commented 1 week ago

We use the PyTorch backend for the distillation process. We downloaded the model in /prep-run-train/output/models/task.0000 with the "dpgen2 download ..." command, which gives a model.ckpt.pt file, and froze it with "dp --pt freeze -o model.pth" (this needs a manually added checkpoint file), obtaining model.pth. However, this model.pth cannot be compressed with the PyTorch compression command "dp compress -i model.pth -o model-compress.pth"; it fails with the following error message:

root@bohrium-12166-1204587:/personal/dpa2_hea/version10/distill_stable/iter-000002/prep-run-train/output/models/task.0000# dp compress -i model.pth -o model_compress.pth
2024-11-12 14:48:51.537628: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-11-12 14:48:51.537679: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-11-12 14:48:51.537696: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-12 14:48:51.544735: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:From /opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:108: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
WARNING:tensorflow:disable_mixed_precision_graph_rewrite() called when mixed precision is already disabled.
Traceback (most recent call last):
  File "/opt/deepmd-kit-3.0.0b3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/main.py", line 923, in main
    deepmd_main(args)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/tf/entrypoints/main.py", line 81, in main
    compress(dictargs)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/tf/entrypoints/compress.py", line 98, in compress
    graph, = load_graph_def(input)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/tf/utils/graph.py", line 42, in load_graph_def
    graph_def.ParseFromString(f.read())
google.protobuf.message.DecodeError: Error parsing message with type 'tensorflow.GraphDef'
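The traceback shows where this goes wrong: without "--pt", "dp compress" is dispatched to the TensorFlow entry point (deepmd/tf/entrypoints/compress.py), which tries to parse the input as a serialized tensorflow.GraphDef protobuf. Assuming the PyTorch freeze step writes model.pth as a TorchScript archive via torch.jit.save, that file is a zip container rather than a protobuf, which would explain the DecodeError. The helper below, guess_model_container, is a hypothetical pre-flight check that only looks at the container type; it does not validate the model itself.

```python
import zipfile
from pathlib import Path


def guess_model_container(path: str) -> str:
    """Guess the container format of a frozen model file (container only, not contents).

    Assumption: TorchScript files written by torch.jit.save are zip archives,
    whereas a frozen TensorFlow graph (.pb) is a raw protobuf, not a zip.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(path)
    if zipfile.is_zipfile(str(p)):
        # Zip container: looks like TorchScript, so a TF GraphDef parser will reject it.
        return "torchscript"
    # Not a zip: could be a TF GraphDef, but this check cannot confirm that.
    return "maybe-tf-graphdef"
```

If the assumption holds, running it on the model.pth from this task directory should report "torchscript", i.e. a file the TensorFlow compression path cannot read.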

Jeremy1189 commented 1 week ago

"dp --pt compress -i model.pth -o compress_model.pth" does not work either:

root@bohrium-12166-1204587:/personal/dpa2_hea/version10/distill_stable/iter-000002/prep-run-train/output/models/task.0000# ls
checkpoint model.ckpt.pt model.pth
root@bohrium-12166-1204587:/personal/dpa2_hea/version10/distill_stable/iter-000002/prep-run-train/output/models/task.0000# dp --pt compress -i model.pth -o compress_model.pth
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2024-11-12 21:39:04,666] DEEPMD INFO DeePMD version: 3.0.0b3
Traceback (most recent call last):
  File "/opt/deepmd-kit-3.0.0b3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/main.py", line 923, in main
    deepmd_main(args)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/opt/deepmd-kit-3.0.0b3/lib/python3.10/site-packages/deepmd/pt/entrypoints/main.py", line 577, in main
    raise RuntimeError(f"Invalid command {FLAGS.command}!")
RuntimeError: Invalid command compress!
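The checkpoint file in the listing above is the one the first comment says had to be added by hand before "dp --pt freeze" would run. A minimal sketch of scripting that step, assuming the PyTorch freeze entry point reads a plain-text file named checkpoint from the working directory and treats its contents as the file name of the checkpoint to freeze. That behavior is an assumption about deepmd-kit v3 worth verifying for your version, and write_checkpoint_pointer is a hypothetical helper name.

```python
from pathlib import Path


def write_checkpoint_pointer(task_dir: str, ckpt_name: str = "model.ckpt.pt") -> Path:
    """Create the `checkpoint` pointer file next to a downloaded PyTorch checkpoint.

    Assumption: the freeze step reads <task_dir>/checkpoint and joins its
    contents onto task_dir, so the file holds only the checkpoint file name.
    """
    task = Path(task_dir)
    ckpt = task / ckpt_name
    if not ckpt.is_file():
        raise FileNotFoundError(f"checkpoint not found: {ckpt}")
    pointer = task / "checkpoint"
    # No trailing newline, in case the reader does not strip whitespace
    # before joining the name onto the directory path.
    pointer.write_text(ckpt_name)
    return pointer
```

After calling it on the downloaded task.0000 directory, "dp --pt freeze -o model.pth" is run in that directory as before.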

Jeremy1189 commented 1 week ago

The descriptor settings used during distillation (note "attn_layer": 0):

"descriptor": {
    "type": "se_atten_v2",
    "sel": 120,
    "rcut_smth": 0.50,
    "rcut": 6.00,
    "neuron": [ 25, 50, 100 ],
    "resnet_dt": false,
    "axis_neuron": 16,
    "seed": 1,
    "attn": 128,
    "attn_layer": 0,
    "attn_dotr": true,
    "attn_mask": false,
    "precision": "float64",
    "_comment2": " that's all"
}
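For context on why "attn_layer": 0 matters here: the deepmd-kit documentation describes compression of se_atten-type descriptors as supported only when there is no attention layer (attn_layer is 0), which this configuration satisfies. That rule is worth re-checking against the docs for the installed version. A minimal sketch of checking it before a compression step, assuming the descriptor block has been loaded as its own JSON object; attention_compression_blockers is a hypothetical name and other descriptor types are deliberately not checked.

```python
import json


def attention_compression_blockers(descriptor: dict) -> list[str]:
    """Return reasons an attention-type descriptor may not be compressible.

    Assumption: for "se_atten" / "se_atten_v2", compression requires attn_layer == 0.
    Other descriptor types are not checked here.
    """
    problems = []
    if descriptor.get("type") in ("se_atten", "se_atten_v2"):
        layers = descriptor.get("attn_layer")
        if layers is None:
            # A missing key falls back to the library default, which this sketch does not assume.
            problems.append("attn_layer not set explicitly; check the library default")
        elif layers != 0:
            problems.append(f"attn_layer is {layers}, expected 0")
    return problems


cfg = json.loads("""
{"type": "se_atten_v2", "sel": 120, "rcut_smth": 0.50, "rcut": 6.00,
 "neuron": [25, 50, 100], "attn": 128, "attn_layer": 0}
""")
print(attention_compression_blockers(cfg))  # → []
```

An empty list only means this one known blocker is absent; it says nothing about backend support, which is the actual issue in this thread.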

wanghan-iapcm commented 1 week ago

b3 only supports compression with the TensorFlow backend.
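Given that answer, a compression step in a v3.0.0b3 workflow has to start from a TensorFlow-frozen .pb model rather than the PyTorch .pth file. A hedged sketch of a hypothetical helper, build_compress_argv, that assembles the command line accordingly and refuses PyTorch models up front instead of failing inside the CLI. It relies only on the -i/-o flags and the default TensorFlow dispatch visible in the first traceback; the file names in the example are placeholders.

```python
import shlex
from pathlib import Path


def build_compress_argv(model_in: str, model_out: str) -> list[str]:
    """Assemble a `dp compress` command for deepmd-kit 3.0.0b3.

    Assumption (from the maintainer's reply): only the TensorFlow backend can
    compress in b3, so only frozen TF graphs (.pb) are accepted here.
    """
    suffix = Path(model_in).suffix
    if suffix == ".pth":
        raise ValueError(f"{model_in}: PyTorch models cannot be compressed with deepmd-kit 3.0.0b3")
    if suffix != ".pb":
        raise ValueError(f"{model_in}: expected a frozen TensorFlow graph (.pb)")
    # Without a backend flag, `dp` went to the TF entry point in the first traceback.
    return ["dp", "compress", "-i", model_in, "-o", model_out]


# Placeholder file names, for illustration only.
argv = build_compress_argv("frozen_model.pb", "frozen_model_compressed.pb")
print(shlex.join(argv))
# → dp compress -i frozen_model.pb -o frozen_model_compressed.pb
```

The returned list can be passed directly to subprocess.run; keeping it as a list avoids shell quoting issues with long task paths.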

Jeremy1189 commented 1 week ago

Does b4 support compression with the PyTorch backend?