deepmodeling / deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics
https://docs.deepmodeling.com/projects/deepmd/
GNU Lesser General Public License v3.0

Water training error #669

Closed Mskater closed 3 years ago

Mskater commented 3 years ago

Hello again, I'm a beginner grad student and ran into the following problem (CPU version, Ubuntu 18.04). I ran $ dp train water_se_a.json and this error occurred:
Thanks for any help!

~/deepmd_root/lib/python3.8/site-packages/deepmd/tests$ dp train water_se_a.json
WARNING:tensorflow:From /home/marie/miniconda3/lib/python3.8/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0

DEEPMD: [DeePMD-kit ASCII-art banner]

DEEPMD:

DEEPMD: Please read and cite:

DEEPMD: Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)

DEEPMD:

DEEPMD: ---Summary of the training---------------------------------------

DEEPMD: installed to: /tmp/pip-req-build-yjnc2t_a/_skbuild/linux-x86_64-3.8/cmake-install

DEEPMD: source : v1.2.1-3-g30922e7

DEEPMD: source brach: HEAD

DEEPMD: source commit: 30922e7

DEEPMD: source commit at: 2020-09-14 22:49:07 +0800

DEEPMD: build float prec: double

DEEPMD: build with tf inc: /home/marie/miniconda3/lib/python3.8/site-packages/tensorflow/include;/home/marie/miniconda3/lib/python3.8/site-packages/tensorflow/include

DEEPMD: build with tf lib:

DEEPMD: running on: marie-Precision-7530

DEEPMD: CUDA_VISIBLE_DEVICES: unset

DEEPMD: num_intra_threads: 0

DEEPMD: num_inter_threads: 0

DEEPMD: -----------------------------------------------------------------

DEEPMD:

2021-05-26 14:43:51.303428: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2021-05-26 14:43:51.328250: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2699905000 Hz
2021-05-26 14:43:51.328837: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5634e795a030 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-05-26 14:43:51.328889: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-05-26 14:43:51.328955: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Traceback (most recent call last):
  File "/home/marie/miniconda3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/home/marie/miniconda3/lib/python3.8/site-packages/deepmd/main.py", line 66, in main
    train(args)
  File "/home/marie/miniconda3/lib/python3.8/site-packages/deepmd/train.py", line 81, in train
    _do_work(jdata, run_opt)
  File "/home/marie/miniconda3/lib/python3.8/site-packages/deepmd/train.py", line 85, in _do_work
    model = NNPTrainer (jdata, run_opt = run_opt)
  File "/home/marie/miniconda3/lib/python3.8/site-packages/deepmd/Trainer.py", line 49, in __init__
    self._init_param(jdata)
  File "/home/marie/miniconda3/lib/python3.8/site-packages/deepmd/Trainer.py", line 115, in _init_param
    lr_param = j_must_have(jdata, 'learning_rate')
  File "/home/marie/miniconda3/lib/python3.8/site-packages/deepmd/common.py", line 149, in j_must_have
    raise RuntimeError ("json database must provide key " + key )
RuntimeError: json database must provide key learning_rate
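For readers following the traceback, here is a minimal sketch of the kind of check that produces this message. It is not the library's source: `must_have_key` is a hypothetical stand-in that only mirrors the behavior visible in the last two traceback frames, and `flat_input` is a cut-down illustration of an input whose learning-rate settings sit at the top level rather than inside a "learning_rate" section.

```python
def must_have_key(jdata, key):
    """Return jdata[key]; raise the error seen in the traceback if the key is absent.

    Hypothetical stand-in for the check named j_must_have in the traceback.
    """
    if key not in jdata:
        raise RuntimeError("json database must provide key " + key)
    return jdata[key]


# Illustrative input only: start_lr, decay_steps and decay_rate are top-level
# keys, and there is no "learning_rate" section.
flat_input = {
    "model": {"descriptor": {"type": "se_a"}},
    "start_lr": 0.005,
    "decay_steps": 5000,
    "decay_rate": 0.95,
}

try:
    must_have_key(flat_input, "learning_rate")
    message = None
except RuntimeError as err:
    message = str(err)

print(message)
```

Under these assumptions the lookup fails because the key is simply absent from the parsed JSON, which would point at the input file rather than at the installation.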

njzjz commented 3 years ago

Please provide your input file.


Mskater commented 3 years ago

Here is the input file. I have not changed anything in it, but it is the one from the "tests" directory. When I ran the input in the examples/water/train directory instead, it worked. Thanks.

{
    "_comment": " model parameters",
    "model" : {
        "descriptor" :{
            "type": "se_a",
            "sel": [46, 92],
            "rcut_smth": 5.80,
            "rcut": 6.00,
            "neuron": [25, 50, 100],
            "resnet_dt": false,
            "axis_neuron": 16,
            "seed": 1
        },
        "fitting_net" : {
            "neuron": [240, 240, 240],
            "resnet_dt": true,
            "seed": 1
        }
    },

    "_comment": " traing controls",
    "systems":      ["system"],
    "set_prefix":   "set",
    "stop_batch":   1000000,
    "batch_size":   1,
    "start_lr":     0.005,
    "decay_steps":  5000,
    "decay_rate":   0.95,

    "start_pref_e": 0.02,
    "limit_pref_e": 1,
    "start_pref_f": 1000,
    "limit_pref_f": 1,
    "start_pref_v": 0,
    "limit_pref_v": 0,

    "seed":     1,

    "_comment": " display and restart",
    "_comment": " frequencies counted in batch",
    "disp_file":    "lcurve.out",
    "disp_freq":    100,
    "numb_test":    1,
    "save_freq":    1000,
    "save_ckpt":    "model.ckpt",
    "load_ckpt":    "model.ckpt",
    "disp_training":    true,
    "time_training":    true,
    "profiling":    false,
    "profiling_file":   "timeline.json",

    "_comment":     "that's all"
}

njzjz commented 3 years ago

This file is not meant to be used for running training.
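The input posted in this thread has no top-level "learning_rate" key (it sets start_lr, decay_steps and decay_rate directly), which matches the RuntimeError in the original report. A quick pre-flight check along the following lines could catch this before calling dp train. This is only a sketch: `missing_sections` is a hypothetical helper, not part of deepmd-kit, and the only required section confirmed by this thread is "learning_rate"; any other names passed in `required` are the caller's assumption.

```python
import json


def missing_sections(config, required=("learning_rate",)):
    """Return the names in `required` that are not top-level keys of `config`."""
    return [name for name in required if name not in config]


# Cut-down illustration of a flat input; to check a real file, load it with
# json.load(open(path)) instead.
sample_text = """
{
    "model": {"descriptor": {"type": "se_a"}},
    "start_lr": 0.005,
    "decay_steps": 5000,
    "decay_rate": 0.95
}
"""
sample_config = json.loads(sample_text)

# "model" is present in the sample, "learning_rate" is not.
missing = missing_sections(sample_config, required=("model", "learning_rate"))
print(missing)
```

An empty list means every listed section is present. It does not validate the contents of those sections.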