Zero division error

kushalj001 commented 3 years ago

I am using the NLU data from https://github.com/RasaHQ/rasa-masterclass/blob/master/episode8/data/nlu.md. When I run the trainer, I get the following error:

 File "scratch.py", line 13, in <module>
    num_encoder_layers=2,
  File "/mnt/d/DIET-pytorch/DIET/trainer.py", line 104, in train
    trainer.fit(model)
  File "/mnt/d/DIET-pytorch/dietenv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 954, in fit
    self.run_pretrain_routine(model)
  File "/mnt/d/DIET-pytorch/dietenv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1093, in run_pretrain_routine
    self.train()
  File "/mnt/d/DIET-pytorch/dietenv/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 402, in train
    self.run_training_teardown()
  File "/mnt/d/DIET-pytorch/dietenv/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 713, in run_training_teardown
    self.on_train_end()
  File "/mnt/d/DIET-pytorch/dietenv/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py", line 72, in on_train_end
    callback.on_train_end(self, self.get_model())
  File "/mnt/d/DIET-pytorch/DIET/eval.py", line 41, in on_train_end
    show_entity_report(dataset, pl_module, tokenizer, file_name=entity_report_nm, output_dir=self.output_dir, cuda=self.cuda)
  File "/mnt/d/DIET-pytorch/DIET/metrics/evaluate.py", line 94, in show_entity_report
    report = show_entity_metrics(pred=preds, label=targets, file_name=file_name, output_dir=output_dir)
  File "/mnt/d/DIET-pytorch/DIET/metrics/metrics.py", line 194, in show_entity_metrics
    output = entity_metric.generate_report()
  File "/mnt/d/DIET-pytorch/DIET/metrics/metrics.py", line 331, in generate_report
    self.cal_micro_avg()
  File "/mnt/d/DIET-pytorch/DIET/metrics/metrics.py", line 277, in cal_micro_avg
    f1 = 2*(precision * recall / (precision + recall))
ZeroDivisionError: float division by zero

Can the code be run on English data without any changes, or does it only work for Korean by default? It would be great if you could suggest a way to get around this. Thank you.
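For readers following the traceback: the only division on the failing line is by `precision + recall`, so both values must have been 0.0 when it ran. A minimal sketch of that failure in plain Python, using hypothetical counts (the report does not include the actual ones) and a hypothetical helper name `f1_score`:

```python
def f1_score(precision: float, recall: float) -> float:
    # Same formula as the line shown at the bottom of the traceback.
    return 2 * (precision * recall / (precision + recall))


# Hypothetical counts for illustration: no true positives at all.
sum_tp, sum_fp, sum_fn = 0, 3, 5
precision = sum_tp / (sum_tp + sum_fp)  # 0 / 3 -> 0.0
recall = sum_tp / (sum_tp + sum_fn)     # 0 / 5 -> 0.0

try:
    f1_score(precision, recall)
    failed = False
except ZeroDivisionError:
    # precision + recall == 0.0, so the F1 formula divides by zero.
    failed = True

print(failed)  # True
```

So the error most likely says that the model produced no correct entity predictions on the evaluation data, rather than that the language is unsupported.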

karmalk commented 3 years ago

Did you solve this problem? I also encountered this bug. I hope you can help me.

kushalj001 commented 3 years ago

No, I did not.

karmalk commented 3 years ago

I think I can now run this code successfully, but there are still some warning messages, and the validation results seem to be wrong. In short, the zero-division error message is gone, and after training I ran inference and got a result normally.

Epoch 5:  78%|███████▊  | 7/9 [00:17<00:04,  2.45s/it, loss=1.164, v_num=0, val/loss=2.41, val/intent_acc=0.585, val/entity_acc=0.139, val/intent_f1=0.073]
Validating: 0it [00:00, ?it/s]entity_pred:  tensor([[0, 4, 4,  ..., 0, 0, 0],
        [0, 4, 4,  ..., 0, 0, 0],
        [0, 4, 4,  ..., 0, 0, 0],
        ...,
        [0, 4, 4,  ..., 0, 0, 0],
        [0, 4, 4,  ..., 0, 0, 0],
        [0, 4, 0,  ..., 0, 0, 0]])
entity_idx:  tensor([[0, 3, 4,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 2, 2,  ..., 0, 0, 0]])

Epoch 5:  89%|████████▉ | 8/9 [00:17<00:02,  2.19s/it, loss=1.164, v_num=0, val/loss=2.41, val/intent_acc=0.585, val/entity_acc=0.139, val/intent_f1=0.073]entity_pred:  tensor([[0, 4, 4,  ..., 0, 0, 0],
        [0, 4, 4,  ..., 0, 0, 0],
        [0, 4, 4,  ..., 0, 0, 0],
        ...,
        [0, 4, 4,  ..., 0, 0, 0],
        [0, 4, 4,  ..., 0, 0, 0],
        [0, 4, 4,  ..., 0, 0, 0]])
entity_idx:  tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 3, 4,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]])

Epoch 5: 100%|██████████| 9/9 [00:17<00:00,  1.97s/it, loss=1.164, v_num=0, val/loss=2.35, val/intent_acc=0.585, val/entity_acc=0.149, val/intent_f1=0.0732]
                                                         evaluate valid data

load intent dataset:   0%|          | 0/2 [00:00<?, ?it/s]
load intent dataset:  50%|█████     | 1/2 [00:00<00:00,  2.38it/s]
load intent dataset: 100%|██████████| 2/2 [00:00<00:00,  2.83it/s]
/Users/loukun/opt/anaconda3/envs/torch_test/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1248: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/Users/loukun/opt/anaconda3/envs/torch_test/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1248: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/Users/loukun/opt/anaconda3/envs/torch_test/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1248: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/Users/loukun/opt/anaconda3/envs/torch_test/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1248: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/Users/loukun/opt/anaconda3/envs/torch_test/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1248: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/Users/loukun/opt/anaconda3/envs/torch_test/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1248: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

load entity dataset:   0%|          | 0/2 [00:00<?, ?it/s]
load entity dataset:  50%|█████     | 1/2 [00:00<00:00,  2.14it/s]
load entity dataset: 100%|██████████| 2/2 [00:00<00:00,  2.77it/s]
/Volumes/Work/NLP_Learning/RASA_DialogueSystem/DIET_Model/DIET-pytorch/DIET/metrics/metrics.py:304: RuntimeWarning: invalid value encountered in double_scalars
  f1 = 2*(precision * recall / (precision + recall))
/Volumes/Work/NLP_Learning/RASA_DialogueSystem/DIET_Model/DIET-pytorch/DIET/metrics/metrics.py:330: RuntimeWarning: invalid value encountered in double_scalars
  f1 = 2*(precision * recall / (precision + recall))
dict_items([('facility_type', {'TP': 0, 'TN': 0, 'FP': 1, 'FN': 14}), ('location', {'TP': 0, 'TN': 0, 'FP': 14, 'FN': 23})])
sumTP: 0
sum_FP: 15
Zero Division Error occurred
Epoch 5: 100%|██████████| 9/9 [00:19<00:00,  2.14s/it, loss=1.164, v_num=0, val/loss=2.35, val/intent_acc=0.585, val/entity_acc=0.149, val/intent_f1=0.0732]

Here is the output of my inference run:

(torch_test) loukun@loukundeMacBook-Pro DIET-pytorch % python test_DIET_inferencer.py ./lightning_logs/version_0/checkpoints/epoch=5.ckpt "hello man"
Organizing Intent & Entity dictionary in NLU markdown file ...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 289/289 [00:00<00:00, 210333.83it/s]
Extracting Intent & Entity in NLU markdown files...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 289/289 [00:00<00:00, 29254.35it/s]
Intents: {'affirm': 0, 'deny': 1, 'goodbye': 2, 'greet': 3, 'inform': 4, 'mood_great': 5, 'mood_unhappy': 6, 'out_of_scope': 7, 'search_provider': 8, 'thanks': 9}
Entities: {'O': 0, 'facility_type_B': 1, 'facility_type_I': 2, 'location_B': 3, 'location_I': 4}

 infer result: 
{   'entities': [],
    'intent': {'confidence': 0.3663807809352875, 'name': 'inform'},
    'intent_ranking': [   {'confidence': 0.3663807809352875, 'name': 'inform'},
                          {   'confidence': 0.2510490417480469,
                              'name': 'search_provider'},
                          {   'confidence': 0.1037619486451149,
                              'name': 'out_of_scope'},
                          {'confidence': 0.04947172477841377, 'name': 'greet'},
                          {   'confidence': 0.043566226959228516,
                              'name': 'goodbye'}],
    'text': 'hello man'}

The data used comes from here

kushalj001 commented 3 years ago

Can you tell me what changes you made to the code or share the data that you used?

karmalk commented 3 years ago

I just added exception handling at the place where the program reported the error. The zero-division error occurs because a 0 appears in the denominator of a division, but I still don't know why the computed value is 0, so I simply catch the exception and skip ahead to the next operation. The data I used is the same as yours, no other changes have been made, and the code was modified only here:

        print("sumTP:",sum_TP,"sum_FP:",sum_FP,"sumFN",sum_FN)
        try:
            precision = sum_TP / (sum_TP + sum_FP)
            recall = sum_TP / (sum_TP + sum_FN)
            f1 = 2*(precision * recall / (precision + recall))
            print("precision:", precision)
            print("recall:", recall)
            print("f1:", f1)
        except ZeroDivisionError:
            print("Zero Division Error occurred")
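One caveat with the try/except above: if the exception fires on the `f1` line, `precision` and `recall` are already assigned but `f1` is not, so any later code that reads `f1` would fail. A minimal sketch of one way around that, assuming 0.0 is an acceptable fallback for an undefined score (the function name `micro_scores` and its arguments are hypothetical, not taken from the repository's `metrics.py`):

```python
def micro_scores(sum_tp: int, sum_fp: int, sum_fn: int):
    # Defaults stay in place for any score whose division is undefined.
    precision = recall = f1 = 0.0
    try:
        precision = sum_tp / (sum_tp + sum_fp)
        recall = sum_tp / (sum_tp + sum_fn)
        f1 = 2 * (precision * recall / (precision + recall))
    except ZeroDivisionError:
        # Keep whatever was computed so far; the rest remain 0.0.
        pass
    return precision, recall, f1


# Counts from the validation log earlier in this thread
# (FN summed over both entities: 14 + 23 = 37).
print(micro_scores(0, 15, 37))  # (0.0, 0.0, 0.0)
```

All three names are always bound when the function returns, whichever division failed.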

kushalj001 commented 3 years ago

Ah, alright. Thank you.

cheesama commented 3 years ago

What about just modifying it like this?

if sum_TP + sum_FP == 0:
    precision = 0
else:
    precision = sum_TP / (sum_TP + sum_FP)
...

karmalk commented 3 years ago

That also works, but the F1 score is calculated afterwards and needs to be taken into account too: when precision and recall are both 0, the denominator of the F1 formula is also 0, so a zero-division error will still occur. F1 should therefore be guarded as well.

precision = sum_TP / (sum_TP + sum_FP)
recall = sum_TP / (sum_TP + sum_FN)
f1 = 2*(precision * recall / (precision + recall))

This is the log output I printed:

dict_items([('facility_type', {'TP': 0, 'TN': 0, 'FP': 4, 'FN': 8}), ('location', {'TP': 0, 'TN': 0, 'FP': 7, 'FN': 17})])
sumTP: 0 sum_FP: 11 sumFN 25
Zero Division Error occurred
precision: 0.0
recall: 0.0
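To make this concrete, here is a minimal sketch of a micro-average that guards all three divisions, run on the counts from the log above. The function name `micro_average` and the 0.0 fallback are assumptions for illustration, not the repository's code. It uses the identity F1 = 2TP / (2TP + FP + FN), which equals 2PR / (P + R) whenever the latter is defined, so F1 only needs a guard when all three counts are zero.

```python
from typing import Dict, Tuple

Counts = Dict[str, Dict[str, int]]


def micro_average(per_entity: Counts) -> Tuple[float, float, float]:
    # Micro-averaging: pool the counts over all entity types first.
    tp = sum(c["TP"] for c in per_entity.values())
    fp = sum(c["FP"] for c in per_entity.values())
    fn = sum(c["FN"] for c in per_entity.values())

    # Fall back to 0.0 (an assumed convention) when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Count-based F1 avoids dividing by (precision + recall).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, recall, f1


# Per-entity counts copied from the log above.
logged = {
    "facility_type": {"TP": 0, "TN": 0, "FP": 4, "FN": 8},
    "location": {"TP": 0, "TN": 0, "FP": 7, "FN": 17},
}
print(micro_average(logged))  # (0.0, 0.0, 0.0)
```

With TP = 0, FP = 11 and FN = 25, all three scores come out as 0.0 without raising. Note that this only removes the crash; scores of zero still mean no entity was predicted correctly.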

cheesama commented 3 years ago

Oh, you're right. I will refactor the metric function using pycm. I think the current logic is too complicated. Thank you for your feedback.

cheesama / DIET-pytorch

Zero division error #11