Closed Vibsteamer closed 3 years ago
In fact, this is a misuse of the training script. I changed the parameter model[fitting_net][neuron] from 240 to 120 in input.json, and all problems were solved. Please check the parameter and rerun the program.
Thank you for your reply. But the fitting net has never been changed; it has always been [240, 240, 240].
Can you get a normal result by following the reproduction procedure, using the input files and code versions described above?
1). train a model for several steps with the original input and kit-1.3
2). convert it to new_model using kit-2.0 beta3
3). compress new_model using the manually_revised_for_strict_check input.json and kit-2.0 beta3
root ISSUE-846 $ dp -h
WARNING:tensorflow:From /root/dp-master/tensorflow_venv/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
usage: dp [-h] {transform,train,freeze,test,convert-to,doc-train-input} ...
DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics
optional arguments:
  -h, --help  show this help message and exit
Valid subcommands:
  {transform,train,freeze,test,convert-to,doc-train-input}
    transform        pass parameters to another model
    train            train a model
    freeze           freeze the model
    test             test the model
    convert-to       convert dp-1.3 model to higher model compatibility
    doc-train-input  print the documentation (in rst format) of input training parameters.
root ISSUE-846 $ dp train input_1.3.json
WARNING:tensorflow:From /root/dp-master/tensorflow_venv/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
- freeze the model by the 1.3 dp:
root ISSUE-846 $ dp freeze -o 1.3.pb
WARNING:tensorflow:From /root/dp-master/tensorflow_venv/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
The following nodes will be frozen: o_energy,o_force,o_virial,o_atom_energy,o_atom_virial,descrpt_attr/rcut,descrpt_attr/ntypes,fitting_attr/dfparam,fitting_attr/daparam,model_attr/tmap,model_attr/model_type
WARNING:tensorflow:From /root/dp-master/tensorflow_venv/lib/python3.6/site-packages/deepmd/freeze.py:79: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /root/dp-master/tensorflow_venv/lib/python3.6/site-packages/tensorflow/python/framework/convert_to_constants.py:856: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
1109 ops in the final graph.
- convert the model from 1.3 to 2.0 by the 2.0 dp:
root ISSUE-846 $ dp convert-from 1.3 -i 1.3.pb -o 2.0.pb
WARNING:tensorflow:From /root/dp-devel/tensorflow_venv/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:tensorflow:From /root/dp-devel/tensorflow_venv/lib/python3.6/site-packages/deepmd/utils/convert.py:24: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
DEEPMD WARNING From /root/dp-devel/tensorflow_venv/lib/python3.6/site-packages/deepmd/utils/convert.py:24: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
the converted output model (2.0 support) is saved in 2.0.pb
- and compress the converted model by the 2.0 dp:
root ISSUE-846 $ dp compress out.json -i 2.0.pb -o 2.0-compress.pb
WARNING:tensorflow:From /root/dp-devel/tensorflow_venv/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
DEEPMD INFO
DEEPMD INFO stage 1: train or refine the model with tabulation
DEEPMD INFO [DeePMD-kit ASCII-art banner]
DEEPMD INFO Please read and cite:
DEEPMD INFO Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
DEEPMD INFO installed to: /tmp/pip-req-build-cmjfgch8/_skbuild/linux-x86_64-3.6/cmake-install
DEEPMD INFO source : v1.2.2-770-g12b0bd6
DEEPMD INFO source brach: devel
DEEPMD INFO source commit: 12b0bd6
DEEPMD INFO source commit at: 2021-07-28 10:06:20 +0800
DEEPMD INFO build float prec: double
DEEPMD INFO build with tf inc: /root/dp-devel/tensorflow_venv/lib/python3.6/site-packages/tensorflow/include;/root/dp-devel/tensorflow_venv/lib/python3.6/site-packages/tensorflow/include
DEEPMD INFO build with tf lib:
DEEPMD INFO ---Summary of the training---------------------------------------
DEEPMD INFO running on: iZ2zeedzsx4jorjze9gyq7Z
DEEPMD INFO CUDA_VISIBLE_DEVICES: unset
DEEPMD INFO num_intra_threads: 0
DEEPMD INFO num_inter_threads: 0
DEEPMD INFO -----------------------------------------------------------------
DEEPMD INFO ---Summary of DataSystem: training -----------------------------------------------
DEEPMD INFO found 3 system(s):
DEEPMD INFO system                                      natoms  bch_sz  n_bch  prob   pbc
DEEPMD INFO -- bug-fix-related/ISSUE-846/data/init.997  32      1       10     0.333  T
DEEPMD INFO -- bug-fix-related/ISSUE-846/data/init.998  32      1       10     0.333  T
DEEPMD INFO -- bug-fix-related/ISSUE-846/data/init.999  32      1       10     0.333  T
DEEPMD INFO --------------------------------------------------------------------------------------
DEEPMD INFO training without frame parameter
DEEPMD INFO training data with min nbor dist: 3.092825818114446
DEEPMD INFO training data with max nbor size: [21, 12, 26]
DEEPMD INFO training data with lower boundary: -0.13109273996841408
DEEPMD INFO training data with upper boundary: 8.947001116142355
DEEPMD INFO built lr
DEEPMD INFO built network
DEEPMD INFO built training
DEEPMD INFO initialize model from scratch
DEEPMD INFO start training at lr 1.00e-03 (== 1.00e-03), decay_step 100, decay_rate 0.562341, final lr will be 1.00e-08
DEEPMD INFO batch 2000 training time 9.80 s, testing time 0.00 s
DEEPMD INFO saved checkpoint model.ckpt
DEEPMD INFO finished training
DEEPMD INFO wall time: 12.211 s
DEEPMD INFO
DEEPMD INFO stage 2: freeze the model
INFO:tensorflow:Restoring parameters from /root/dp-bug-fix-related/ISSUE-846/model.ckpt
DEEPMD INFO Restoring parameters from /root/dp-bug-fix-related/ISSUE-846/model.ckpt
The following nodes will be frozen: ['descrpt_attr/rcut', 'descrpt_attr/ntypes', 'model_attr/tmap', 'model_attr/model_type', 'model_attr/model_version', 'o_energy', 'o_force', 'o_virial', 'o_atom_energy', 'o_atom_virial', 'fitting_attr/dfparam', 'fitting_attr/daparam']
WARNING:tensorflow:From /root/dp-devel/tensorflow_venv/lib/python3.6/site-packages/deepmd/entrypoints/freeze.py:183: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
DEEPMD WARNING From /root/dp-devel/tensorflow_venv/lib/python3.6/site-packages/deepmd/entrypoints/freeze.py:183: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /root/dp-devel/tensorflow_venv/lib/python3.6/site-packages/tensorflow/python/framework/convert_to_constants.py:856: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
DEEPMD WARNING From /root/dp-devel/tensorflow_venv/lib/python3.6/site-packages/tensorflow/python/framework/convert_to_constants.py:856: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
792 ops in the final graph.
DEEPMD INFO
DEEPMD INFO stage 3: transfer the model
DEEPMD INFO 792 ops in the raw graph
DEEPMD INFO 1110 ops in the old graph
DEEPMD INFO descrpt_attr/t_avg is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO descrpt_attr/t_std is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_0_type_0/matrix is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_0_type_0/bias is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_1_type_0/matrix is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_1_type_0/bias is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_1_type_0/idt is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_2_type_0/matrix is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_2_type_0/bias is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_2_type_0/idt is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO final_layer_type_0/matrix is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO final_layer_type_0/bias is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_0_type_1/matrix is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_0_type_1/bias is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_1_type_1/matrix is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_1_type_1/bias is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_1_type_1/idt is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_2_type_1/matrix is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_2_type_1/bias is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_2_type_1/idt is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO final_layer_type_1/matrix is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO final_layer_type_1/bias is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_0_type_2/matrix is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_0_type_2/bias is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_1_type_2/matrix is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_1_type_2/bias is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_1_type_2/idt is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_2_type_2/matrix is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_2_type_2/bias is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO layer_2_type_2/idt is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO final_layer_type_2/matrix is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO final_layer_type_2/bias is passed from old graph(<class 'numpy.float64'>) to raw graph(<class 'numpy.float64'>)
DEEPMD INFO the output model is saved in 2.0-compress.pb
- compare the dp-test results of the two models:
root ISSUE-846 $ dp test -s data -m 2.0.pb
WARNING:tensorflow:From /root/dp-devel/tensorflow_venv/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
DEEPMD INFO # ---------------output of dp test---------------
DEEPMD INFO # testing system : data/init.998
DEEPMD INFO # number of test data : 10
DEEPMD INFO Energy RMSE : 5.110137e-01 eV
DEEPMD INFO Energy RMSE/Natoms : 1.596918e-02 eV
DEEPMD INFO Force RMSE : 1.223216e-02 eV/A
DEEPMD INFO Virial RMSE : 2.932328e+01 eV
DEEPMD INFO Virial RMSE/Natoms : 9.163526e-01 eV
DEEPMD INFO # -----------------------------------------------
DEEPMD INFO # ---------------output of dp test---------------
DEEPMD INFO # testing system : data/init.999
DEEPMD INFO # number of test data : 10
DEEPMD INFO Energy RMSE : 3.589722e-01 eV
DEEPMD INFO Energy RMSE/Natoms : 1.121788e-02 eV
DEEPMD INFO Force RMSE : 1.296819e-02 eV/A
DEEPMD INFO Virial RMSE : 2.883325e+01 eV
DEEPMD INFO Virial RMSE/Natoms : 9.010390e-01 eV
DEEPMD INFO # -----------------------------------------------
DEEPMD INFO # ---------------output of dp test---------------
DEEPMD INFO # testing system : data/init.997
DEEPMD INFO # number of test data : 10
DEEPMD INFO Energy RMSE : 5.086544e-01 eV
DEEPMD INFO Energy RMSE/Natoms : 1.589545e-02 eV
DEEPMD INFO Force RMSE : 1.187290e-02 eV/A
DEEPMD INFO Virial RMSE : 2.975450e+01 eV
DEEPMD INFO Virial RMSE/Natoms : 9.298282e-01 eV
DEEPMD INFO # -----------------------------------------------
DEEPMD INFO # ----------weighted average of errors-----------
DEEPMD INFO # number of systems : 3
DEEPMD INFO Energy RMSE/Natoms : 1.453181e-02 eV
DEEPMD INFO Force RMSE : 1.236616e-02 eV/A
DEEPMD INFO Virial RMSE/Natoms : 9.158155e-01 eV
DEEPMD INFO # -----------------------------------------------
root ISSUE-846 $ dp test -s data -m 2.0-compress.pb
WARNING:tensorflow:From /root/dp-devel/tensorflow_venv/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
DEEPMD INFO # ---------------output of dp test---------------
DEEPMD INFO # testing system : data/init.998
DEEPMD INFO # number of test data : 10
DEEPMD INFO Energy RMSE : 5.110137e-01 eV
DEEPMD INFO Energy RMSE/Natoms : 1.596918e-02 eV
DEEPMD INFO Force RMSE : 1.223216e-02 eV/A
DEEPMD INFO Virial RMSE : 2.932328e+01 eV
DEEPMD INFO Virial RMSE/Natoms : 9.163526e-01 eV
DEEPMD INFO # -----------------------------------------------
DEEPMD INFO # ---------------output of dp test---------------
DEEPMD INFO # testing system : data/init.999
DEEPMD INFO # number of test data : 10
DEEPMD INFO Energy RMSE : 3.589722e-01 eV
DEEPMD INFO Energy RMSE/Natoms : 1.121788e-02 eV
DEEPMD INFO Force RMSE : 1.296819e-02 eV/A
DEEPMD INFO Virial RMSE : 2.883325e+01 eV
DEEPMD INFO Virial RMSE/Natoms : 9.010390e-01 eV
DEEPMD INFO # -----------------------------------------------
DEEPMD INFO # ---------------output of dp test---------------
DEEPMD INFO # testing system : data/init.997
DEEPMD INFO # number of test data : 10
DEEPMD INFO Energy RMSE : 5.086544e-01 eV
DEEPMD INFO Energy RMSE/Natoms : 1.589545e-02 eV
DEEPMD INFO Force RMSE : 1.187290e-02 eV/A
DEEPMD INFO Virial RMSE : 2.975450e+01 eV
DEEPMD INFO Virial RMSE/Natoms : 9.298282e-01 eV
DEEPMD INFO # -----------------------------------------------
DEEPMD INFO # ----------weighted average of errors-----------
DEEPMD INFO # number of systems : 3
DEEPMD INFO Energy RMSE/Natoms : 1.453181e-02 eV
DEEPMD INFO Force RMSE : 1.236616e-02 eV/A
DEEPMD INFO Virial RMSE/Natoms : 9.158155e-01 eV
DEEPMD INFO # -----------------------------------------------
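The comparison of the two dp test runs can also be done mechanically. The following is a minimal sketch, not part of DeePMD-kit: the helper names `weighted_averages` and `max_abs_difference` are made up for illustration, and the parsing assumes the log layout shown in the two runs above (a "weighted average of errors" banner followed by `name : value unit` lines).

```python
import re

# Matches summary lines such as "DEEPMD INFO Force RMSE : 1.236616e-02 eV/A".
RMSE_LINE = re.compile(
    r"(Energy RMSE/Natoms|Force RMSE|Virial RMSE/Natoms)\s*:\s*([-+0-9.eE]+)"
)


def weighted_averages(log_text):
    """Return the metrics printed after the 'weighted average of errors' banner."""
    _, _, tail = log_text.partition("weighted average of errors")
    return {name: float(value) for name, value in RMSE_LINE.findall(tail)}


def max_abs_difference(first, second):
    """Largest absolute difference over the metrics shared by both runs."""
    if not first or first.keys() != second.keys():
        raise ValueError("the two logs do not report the same metrics")
    return max(abs(first[name] - second[name]) for name in first)


# Summary blocks copied from the two dp test runs above.
CONVERTED_LOG = """
DEEPMD INFO # ----------weighted average of errors-----------
DEEPMD INFO # number of systems : 3
DEEPMD INFO Energy RMSE/Natoms : 1.453181e-02 eV
DEEPMD INFO Force RMSE : 1.236616e-02 eV/A
DEEPMD INFO Virial RMSE/Natoms : 9.158155e-01 eV
"""
COMPRESSED_LOG = CONVERTED_LOG  # the compressed model printed identical numbers

converted = weighted_averages(CONVERTED_LOG)
compressed = weighted_averages(COMPRESSED_LOG)
print(max_abs_difference(converted, compressed))
```

With the numbers from this thread the difference is zero, which matches the observation that 2.0.pb and 2.0-compress.pb give the same test errors in this reproduction attempt.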
Here's the input_1.3.json:
{
"model": {
"descriptor": {
"type": "se_a",
"sel": [
300,
300,
300
],
"rcut_smth": 2.0,
"rcut": 6.0,
"neuron": [
25,
50,
100
],
"resnet_dt": false,
"axis_neuron": 12,
"type_one_side": true,
"seed": 2687978781
},
"fitting_net": {
"neuron": [
240,
240,
240
],
"resnet_dt": true,
"seed": 2706322555
},
"type_map": [
"Mg",
"Al",
"Cu"
]
},
"learning_rate": {
"type": "exp",
"start_lr": 0.001,
"decay_steps": 80000
},
"loss": {
"start_pref_e": 0.02,
"limit_pref_e": 2,
"start_pref_f": 1000,
"limit_pref_f": 1,
"start_pref_v": 0.0,
"limit_pref_v": 0.0
},
"training": {
"systems": [
"/root/dp-bug-fix-related/ISSUE-846/data/init.997",
"/root/dp-bug-fix-related/ISSUE-846/data/init.998",
"/root/dp-bug-fix-related/ISSUE-846/data/init.999"
],
"set_prefix": "set",
"stop_batch": 16000000,
"batch_size": "auto",
"seed": 1520843097,
"_comment": "that's all",
"disp_file": "lcurve.out",
"disp_freq": 2000,
"numb_test": 1,
"save_freq": 2000,
"save_ckpt": "model.ckpt",
"disp_training": true,
"time_training": true,
"profiling": false,
"profiling_file": "timeline.json"
}
}
and the out.json:
{
"model": {
"descriptor": {
"type": "se_e2_a",
"sel": [
300,
300,
300
],
"rcut_smth": 2.0,
"rcut": 6.0,
"neuron": [
25,
50,
100
],
"resnet_dt": false,
"axis_neuron": 12,
"type_one_side": true,
"seed": 2687978781,
"activation_function": "tanh",
"precision": "float64",
"trainable": true,
"exclude_types": [],
"set_davg_zero": false
},
"fitting_net": {
"neuron": [
240,
240,
240
],
"resnet_dt": true,
"seed": 2706322555,
"type": "ener",
"numb_fparam": 0,
"numb_aparam": 0,
"activation_function": "tanh",
"precision": "float64",
"trainable": true,
"rcond": 0.001,
"atom_ener": []
},
"type_map": [
"Mg",
"Al",
"Cu"
],
"data_stat_nbatch": 10,
"data_stat_protect": 0.01
},
"learning_rate": {
"type": "exp",
"start_lr": 0.001,
"decay_steps": 80000,
"stop_lr": 1e-08
},
"loss": {
"start_pref_e": 0.02,
"limit_pref_e": 2,
"start_pref_f": 1000,
"limit_pref_f": 1,
"start_pref_v": 0.0,
"limit_pref_v": 0.0,
"type": "ener",
"start_pref_ae": 0.0,
"limit_pref_ae": 0.0
},
"training": {
"seed": 1520843097,
"disp_file": "lcurve.out",
"disp_freq": 2000,
"numb_test": 1,
"save_freq": 2000,
"save_ckpt": "model.ckpt",
"disp_training": true,
"time_training": true,
"profiling": false,
"profiling_file": "timeline.json",
"training_data": {
"systems": [
"/root/dp-bug-fix-related/ISSUE-846/data/init.997",
"/root/dp-bug-fix-related/ISSUE-846/data/init.998",
"/root/dp-bug-fix-related/ISSUE-846/data/init.999"
],
"set_prefix": "set",
"batch_size": "auto",
"auto_prob": "prob_sys_size",
"sys_probs": null
},
"numb_steps": 16000000,
"validation_data": null,
"tensorboard": false,
"tensorboard_log_dir": "log"
}
}
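To see what changed between the two inputs above without reading them side by side, one can diff their key paths. This is a small sketch under stated assumptions: `key_paths` is a hypothetical helper, and the two dictionaries are abridged by hand from input_1.3.json and out.json rather than loaded from the real files.

```python
def key_paths(node, prefix=""):
    """Yield slash-separated paths to every leaf value in a nested dict."""
    for key, value in node.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict) and value:
            yield from key_paths(value, path)
        else:
            yield path


# Abridged by hand from input_1.3.json (illustration only).
old_input = {
    "model": {"fitting_net": {"neuron": [240, 240, 240], "resnet_dt": True}},
    "training": {
        "systems": ["data/init.997"],
        "set_prefix": "set",
        "stop_batch": 16000000,
        "batch_size": "auto",
    },
}

# Abridged by hand from out.json (illustration only).
new_input = {
    "model": {
        "fitting_net": {"neuron": [240, 240, 240], "resnet_dt": True, "type": "ener"}
    },
    "training": {
        "training_data": {
            "systems": ["data/init.997"],
            "set_prefix": "set",
            "batch_size": "auto",
        },
        "numb_steps": 16000000,
    },
}

only_old = sorted(set(key_paths(old_input)) - set(key_paths(new_input)))
only_new = sorted(set(key_paths(new_input)) - set(key_paths(old_input)))
print("only in 1.3 input:", only_old)
print("only in 2.0 input:", only_new)
```

On the abridged data this reports that the data keys moved under training/training_data and that stop_batch became numb_steps, while model/fitting_net/neuron is present and equal in both files.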
We may need #727 to avoid such things.
This is a bug (or an unexpected breaking change) in v1.3. model/fitting_net/n_neuron was not added as an alias of model/fitting_net/neuron in dargs, so the value falls back to the default [120,120,120]. @Vibsteamer, you may need to check all of your previous models trained with v1.3.
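The fallback described in this comment can be illustrated with a toy resolver. This is not the dargs implementation; `resolve` is a hypothetical function written only to show the effect of a missing alias, and the example assumes the affected input spelled the key as n_neuron.

```python
DEFAULT_NEURON = [120, 120, 120]  # default value quoted in the comment above


def resolve(section, name, aliases, default):
    """Return section[name], accepting any registered alias; otherwise the default."""
    for key in (name, *aliases):
        if key in section:
            return section[key]
    return default


# Assumed shape of the affected input: the legacy spelling of the key.
fitting_net = {"n_neuron": [240, 240, 240]}

# Alias not registered: the user's value is silently ignored.
without_alias = resolve(fitting_net, "neuron", aliases=(), default=DEFAULT_NEURON)

# Alias registered: the user's value is picked up.
with_alias = resolve(fitting_net, "neuron", aliases=("n_neuron",), default=DEFAULT_NEURON)

print(without_alias)  # [120, 120, 120]
print(with_alias)     # [240, 240, 240]
```

The first call shows why a model could be trained with a 120-wide fitting net even though the input asked for 240, which later conflicts with a 2.0 input that states 240 explicitly.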
Thank you for your kind reply and effort.
And, LOL, I will check those previous models.
It seems that using deepmd-kit v1.3 with dpgen may run into a similar issue. Some data sets generated this way may be more or less redundant, especially for complex systems involving multiple elements or phases.
Best,
Time is a big nougat.
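As a starting point for checking earlier v1.3 runs, a short script can flag input files that use the legacy key. This is only a sketch: `uses_legacy_fitting_key` and `check_input_file` are hypothetical helpers, and the check assumes the problem is limited to n_neuron appearing under model/fitting_net without neuron, as described in the reply above.

```python
import json


def uses_legacy_fitting_key(config):
    """True if model/fitting_net sets 'n_neuron' but not 'neuron'."""
    fitting = config.get("model", {}).get("fitting_net", {})
    return "n_neuron" in fitting and "neuron" not in fitting


def check_input_file(path):
    """Load one training input (JSON) and report whether it looks affected."""
    with open(path) as handle:
        return uses_legacy_fitting_key(json.load(handle))


# Inline examples, so the sketch runs without any files on disk.
affected = json.loads('{"model": {"fitting_net": {"n_neuron": [240, 240, 240]}}}')
unaffected = json.loads('{"model": {"fitting_net": {"neuron": [240, 240, 240]}}}')

print(uses_legacy_fitting_key(affected))    # True
print(uses_legacy_fitting_key(unaffected))  # False
```

A flagged input only indicates that the fitting net size may have fallen back to the default; the frozen model itself would still need to be inspected to confirm.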
Summary
Using 2.0 beta3 to convert a version-1.3 model and then compressing it with the original training script "input.json" (manually revised to pass the "strict" check in 2.0) raises a dimension mismatch between the old (converted) graph and the raw graph.
Deepmd-kit version, installation way, input file, running commands, error log, etc.
for training the version-1.3 model: (the image name on ALI is kit-1.3.0, though the source reports 1.2.2)
input file (the part related to net size):
for convert
convert command: both were tried, resulting in the same error message after compression
for compress:
command:
part of the input.json (it has to be manually revised from the original above to satisfy the "strict" format check in 2.0)
ERROR MESSAGE when compress:
platform: ALI
Steps to Reproduce
1). train a model for several steps with the original input and kit-1.3
2). convert it to new_model using kit-2.0 beta3
3). compress new_model using the manually_revised_for_strict_check input.json and kit-2.0 beta3
NOTE: training a model for several steps with the manually_revised_for_strict_check input.json and kit-2.0 beta3, and then compressing it, works fine.
Further Information, Files, and Links
input.json(1.3)
input.json(manually revised from 1.3 for the strict check in 2.0)
frozen_model.pb(1.3)
graph.000.new.pb(converted)
files.zip
URL for data: (please match the structure of directories with those written in two input.json, when reproducing) http://dplibrary.deepmd.net/#/project_details?project_id=202010.002
WARNING: Please keep only a small amount of training data (delete most of it and revise the input.json accordingly) when reproducing. Otherwise the complete data sets will consume a lot of your life when initializing the training, especially when using rotating disks.