cheesama / DIET-pytorch

Dual Intent Entity Classifier Pytorch version
MIT License

'The loss returned in `training_step` is nan or inf.' #1

Closed thuan1412 closed 4 years ago

thuan1412 commented 4 years ago

Error:

Epoch 1: 65%|███████████████████████████████████████████████████████████▏ | 13/20 [01:51<00:59, 8.55s/it, loss=0.168, v_num=6]
Traceback (most recent call last):
  File "trainer.py", line 45, in <module>
    train('nlu.md')
  File "trainer.py", line 43, in train
    trainer.fit(model)
  File "/home/thuandh/Code/DIET-pytorch/DIET/env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 732, in fit
    self.run_pretrain_routine(model)
  File "/home/thuandh/Code/DIET-pytorch/DIET/env/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 864, in run_pretrain_routine
    self.train()
  File "/home/thuandh/Code/DIET-pytorch/DIET/env/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 363, in train
    self.run_training_epoch()
  File "/home/thuandh/Code/DIET-pytorch/DIET/env/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 435, in run_training_epoch
    _outputs = self.run_training_batch(batch, batch_idx)
  File "/home/thuandh/Code/DIET-pytorch/DIET/env/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 607, in run_training_batch
    self.detect_nan_tensors(loss)
  File "/home/thuandh/Code/DIET-pytorch/DIET/env/lib/python3.6/site-packages/pytorch_lightning/trainer/training_tricks.py", line 63, in detect_nan_tensors
    'The loss returned in `training_step` is nan or inf.'
ValueError: The loss returned in `training_step` is nan or inf.
Exception ignored in: <object repr() failed>
Traceback (most recent call last):
  File "/home/thuandh/Code/DIET-pytorch/DIET/env/lib/python3.6/site-packages/tqdm/std.py", line 1084, in __del__
  File "/home/thuandh/Code/DIET-pytorch/DIET/env/lib/python3.6/site-packages/tqdm/std.py", line 1291, in close
  File "/home/thuandh/Code/DIET-pytorch/DIET/env/lib/python3.6/site-packages/tqdm/std.py", line 1469, in display
  File "/home/thuandh/Code/DIET-pytorch/DIET/env/lib/python3.6/site-packages/tqdm/std.py", line 1087, in __repr__
  File "/home/thuandh/Code/DIET-pytorch/DIET/env/lib/python3.6/site-packages/tqdm/std.py", line 1431, in format_dict
TypeError: 'NoneType' object is not iterable

This error occurs when I train on this file. Can you give me a sample training file?
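
For anyone trying to find which batch triggers this, here is a minimal sketch of the same kind of finiteness check that the traceback shows Lightning performing. It assumes the loss is available as a plain Python float (for a PyTorch tensor, `loss.item()` gives one); `check_loss` is a hypothetical helper for illustration, not part of PyTorch Lightning or this repository.

```python
import math


def check_loss(loss_value: float, batch_idx: int) -> None:
    """Raise ValueError if the loss is nan or +/-inf (hypothetical helper)."""
    if not math.isfinite(loss_value):
        raise ValueError(
            f"loss is {loss_value} at batch {batch_idx}; "
            "inspect the examples in this batch"
        )
```

Calling it with the loss and the batch index inside the training loop would point at the first offending batch, which can then be inspected by hand.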

cheesama commented 4 years ago

It follows the Rasa training data (Markdown) format.

You can check the format at the link below: https://rasa.com/docs/rasa/nlu/training-data-format/
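
As a rough illustration of that format, the sketch below holds a tiny sample in the Rasa Markdown style (`## intent:<name>` headers, `- ` example lines, and `[value](entity_type)` entity annotations) and parses it with a small hand-written function. The intents, sentences, and the `parse_nlu_markdown` helper are made up for this example; this is not the loader used by this repository or by Rasa.

```python
import re

# Hypothetical sample in the Rasa Markdown NLU format.
SAMPLE_NLU_MD = """\
## intent:greet
- hello
- good morning

## intent:book_flight
- book a flight to [Seoul](city)
- I want to fly from [Hanoi](city) to [Tokyo](city)
"""

# Matches entity annotations of the form [value](entity_type).
ENTITY_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_nlu_markdown(text):
    """Return a list of {intent, text, entities} dicts (illustrative only)."""
    examples = []
    intent = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("## intent:"):
            intent = line[len("## intent:"):].strip()
        elif line.startswith("- ") and intent is not None:
            raw = line[2:]
            parts, entities, cursor = [], [], 0
            for match in ENTITY_PATTERN.finditer(raw):
                parts.append(raw[cursor:match.start()])
                # Entity offsets refer to the text with annotations removed.
                start = sum(len(p) for p in parts)
                value = match.group(1)
                parts.append(value)
                entities.append({
                    "start": start,
                    "end": start + len(value),
                    "value": value,
                    "entity": match.group(2),
                })
                cursor = match.end()
            parts.append(raw[cursor:])
            examples.append({
                "intent": intent,
                "text": "".join(parts),
                "entities": entities,
            })
    return examples


examples = parse_nlu_markdown(SAMPLE_NLU_MD)
```

Saving the sample string to a file such as `nlu.md` gives a small training file in the expected shape; a real dataset needs many more examples per intent.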

I recently modified the trainer class so that the model can take the whole bunch of data as hparams (for resuming training or running inference).