DavideBuffelli / TrASenD

Code for the paper "Attention-Based Deep Learning Framework for Human Activity Recognition with User Adaptation", Buffelli D., Vandin F., IEEE Sensors Journal, 2021.

ERROR:tensorflow:Model diverged with loss = NaN. #2

Open Ch0pperX opened 1 year ago

Ch0pperX commented 1 year ago

Hi, I preprocessed the HHAR dataset with https://github.com/DavideBuffelli/A-Deep-Learning-Model-for-Personalised-Human-Activity-Recognition/tree/master/pre-processing. But when I ran test_trasend.py, I got the error below. I tried several ways to fix it, such as changing the learning rate, but none of them worked. When I comment out the estimator training call at line 56, the script runs, but the F1-score is not correct (about 0.2). Could you help me figure out what the problem is? Thank you.

----- Training and evaluating for User: g
WARNING:tensorflow:Estimator's model_fn (<function Model.get_model_function.<locals>.model_fn at 0x7fe828ef3e18>) includes params argument, but params are not passed to Estimator.
2023-06-20 01:11:37.119537: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2023-06-20 01:11:53.428380: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 13610 of 117590
2023-06-20 01:12:03.427897: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 27690 of 117590
2023-06-20 01:12:13.428012: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 41574 of 117590
2023-06-20 01:12:23.428087: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 55074 of 117590
2023-06-20 01:12:33.427806: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 68822 of 117590
2023-06-20 01:12:43.427861: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 82709 of 117590
2023-06-20 01:12:53.427973: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 96477 of 117590
2023-06-20 01:13:03.428104: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:97] Filling up shuffle buffer (this may take a while): 110461 of 117590
2023-06-20 01:13:08.490735: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:135] Shuffle buffer filled.
ERROR:tensorflow:Model diverged with loss = NaN.
Traceback (most recent call last):
  File "test_trasend.py", line 56, in <module>
    trasend_estimator.train(training_input_function)
  File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 376, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1145, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1173, in _train_model_default
    saving_listeners)
  File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1451, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 583, in run
    run_metadata=run_metadata)
  File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1059, in run
    run_metadata=run_metadata)
  File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1150, in run
    raise six.reraise(*original_exc_info)
  File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1135, in run
    return self._sess.run(*args, **kwargs)
  File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1215, in run
    run_metadata=run_metadata))
  File "/home/ting10030829/anaconda3/envs/tensorflow_1/lib/python3.6/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 635, in after_run
    raise NanLossDuringTrainingError
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.

DavideBuffelli commented 1 year ago

Hi, unfortunately I am not able to replicate this error. Could I ask which version of TensorFlow you are using? Also, have you tried looking at the data, in particular to check whether there are any NaNs in there?
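
One way to run the data check suggested above is sketched here. It is only an illustration: the helper names `count_non_finite` and `scan_csv` are hypothetical, and `scan_csv` assumes the preprocessed files are plain comma-separated numeric values with no header row, which may not match the actual output of the pre-processing scripts.

```python
import csv
import math


def count_non_finite(rows):
    """Count NaN and Inf values in an iterable of numeric rows.

    Returns (n_nan, n_inf, bad_row_indices). Values may be floats or
    strings that float() can parse (e.g. "nan", "inf", "1.5").
    """
    n_nan = 0
    n_inf = 0
    bad_rows = []
    for index, row in enumerate(rows):
        row_is_bad = False
        for value in row:
            x = float(value)
            if math.isnan(x):
                n_nan += 1
                row_is_bad = True
            elif math.isinf(x):
                n_inf += 1
                row_is_bad = True
        if row_is_bad:
            bad_rows.append(index)
    return n_nan, n_inf, bad_rows


def scan_csv(path):
    # Assumption: one sample per line, comma-separated numbers, no header.
    # Empty or non-numeric fields will raise ValueError in float().
    with open(path, newline="") as handle:
        return count_non_finite(csv.reader(handle))


if __name__ == "__main__":
    sample = [[0.1, 0.2], [float("nan"), 1.0], [2.0, float("inf")]]
    print(count_non_finite(sample))  # prints (1, 1, [1, 2])
```

If any file reports a non-zero count, the listed row indices show where to look first. The installed TensorFlow version can be printed with `python -c "import tensorflow as tf; print(tf.__version__)"`.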