Code for the paper "Attention-Based Deep Learning Framework for Human Activity Recognition with User Adaptation", Buffelli D., Vandin F., IEEE Sensors Journal, 2021.
ERROR:tensorflow:Model diverged with loss = NaN. #2
Hi,
I preprocessed the HHAR dataset with the scripts in https://github.com/DavideBuffelli/A-Deep-Learning-Model-for-Personalised-Human-Activity-Recognition/tree/master/pre-processing, but when I run test_trasend.py I get the error below.
I tried several ways to fix it, such as changing the learning rate, but none of them worked.
If I comment out the estimator training call at line 56, the script runs, but the F1-score is wrong (about 0.2).
Could you help me figure out what the problem is? Thank you.
----- Training and evaluating for User: g
WARNING:tensorflow:Estimator's model_fn (<function Model.get_model_function.<locals>.model_fn at 0x7fe828ef3e18>) includes params argument, but params are not passed to Estimator.
2023-06-20 01:11:37.119537: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2023-06-20 01:11:53.428380: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 13610 of 117590
2023-06-20 01:12:03.427897: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 27690 of 117590
2023-06-20 01:12:13.428012: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 41574 of 117590
2023-06-20 01:12:23.428087: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 55074 of 117590
2023-06-20 01:12:33.427806: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 68822 of 117590
2023-06-20 01:12:43.427861: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 82709 of 117590
2023-06-20 01:12:53.427973: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 96477 of 117590
2023-06-20 01:13:03.428104: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 110461 of 117590
2023-06-20 01:13:08.490735: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:135] Shuffle buffer filled.
ERROR:tensorflow:Model diverged with loss = NaN.
Traceback (most recent call last):
File "test_trasend.py", line 56, in <module>
trasend_estimator.train(training_input_function)
File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 376, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1145, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1173, in _train_model_default
saving_listeners)
File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1451, in _train_with_estimatorspec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 583, in run
run_metadata=run_metadata)
File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1059, in run
run_metadata=run_metadata)
File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1150, in run
raise six.reraise(original_exc_info)
File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/six.py", line 719, in reraise
raise value
File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1135, in run
return self._sess.run(args, **kwargs)
File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1215, in run
run_metadata=run_metadata))
File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 635, in after_run
raise NanLossDuringTrainingError
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.
Hi, unfortunately I am not able to replicate this error. Could I ask which version of TensorFlow you are using? Also, have you tried looking at the data, in particular to check whether it contains any NaNs?
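For the data check suggested above, something along these lines might help. It is only a minimal sketch using the Python standard library: the helper names `find_non_finite` and `scan_csv` are hypothetical (they are not part of this repository), and it assumes the preprocessed data can be read as CSV files with numeric columns only and no header row, which may not match what the pre-processing scripts actually write.

```python
import csv
import math


def find_non_finite(rows):
    """Return (row, column, raw_value) for every entry that is NaN,
    infinite, or cannot be parsed as a number at all."""
    bad = []
    for r, row in enumerate(rows):
        for c, raw in enumerate(row):
            try:
                value = float(raw)
            except (TypeError, ValueError):
                # Empty cells or stray text also break training input.
                bad.append((r, c, raw))
                continue
            if not math.isfinite(value):
                bad.append((r, c, raw))
    return bad


def scan_csv(path):
    """Scan one CSV file. Assumed layout: numeric columns, no header row."""
    with open(path, newline="") as f:
        return find_non_finite(csv.reader(f))
```

Calling `scan_csv` on each preprocessed file and printing the first few results would show whether any NaN, infinite, or non-numeric values reach the input function. An empty list for every file would point away from the data and towards the model or optimizer settings.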