chakki-works / seqeval

A Python framework for sequence labeling evaluation (named-entity recognition, POS tagging, etc.)
MIT License

ZeroDivisionError when y_true has only 'O' tokens #60

Closed by eloukas 4 years ago

eloukas commented 4 years ago

As the title says, when y_true contains only 'O' tokens, the classification_report method raises the following error:

ZeroDivisionError: Weights sum to zero, can't be normalized

This can be confusing when testing on a small, local dataset, where the chance of y_true containing only 'O' tokens is high.

Maybe there should be a try/except that raises a more informative message for this case.

How to reproduce the behaviour

from seqeval.metrics import classification_report
from seqeval.scheme import IOB2

y_pred = [['B-LOC', 'I-LOC', 'I-PER', 'B-PER']]
y_true = [['O', 'O', 'O', 'O']]

print(classification_report(y_true=y_true, y_pred=y_pred, output_dict=False, scheme=IOB2))
/home/kaslou/anaconda3/envs/my_env/lib/python3.8/site-packages/numpy/lib/function_base.py:380: RuntimeWarning: Mean of empty slice.
  avg = a.mean(axis)
/home/kaslou/anaconda3/envs/my_env/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/home/kaslou/.config/JetBrains/PyCharm2020.2/scratches/check-eval.py", line 9, in <module>
    print(classification_report(y_true=y_true, y_pred=y_pred, output_dict=False, scheme=IOB2))
  File "/home/kaslou/anaconda3/envs/my_env/lib/python3.8/site-packages/seqeval/metrics/sequence_labeling.py", line 425, in classification_report
    p = np.average(ps, weights=s)
  File "<__array_function__ internals>", line 5, in average
  File "/home/kaslou/anaconda3/envs/my_env/lib/python3.8/site-packages/numpy/lib/function_base.py", line 409, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized
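The failure can be reproduced in isolation: per the traceback, classification_report averages the per-class precisions with the class supports as weights, and when y_true contains no entities every support is zero. A minimal sketch of the failing call with a possible guard (the variable names follow the traceback; the 0.0 fallback is an assumption for illustration, not seqeval's actual fix):

```python
import numpy as np

ps = np.array([0.0, 0.0])  # per-class precision scores
s = np.array([0, 0])       # per-class support; all zero when y_true has no entities

# np.average(ps, weights=s) raises ZeroDivisionError("Weights sum to zero, ...")
# whenever the weights sum to zero. A defensive fallback:
p = np.average(ps, weights=s) if s.sum() > 0 else 0.0
print(p)  # 0.0
```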


Hironsan commented 4 years ago

Please specify mode='strict':

>>> from seqeval.metrics import classification_report
>>> from seqeval.scheme import IOB2
>>> y_pred = [['B-LOC', 'I-LOC', 'I-PER', 'B-PER']]
>>> y_true = [['O', 'O', 'O', 'O']]
>>> print(classification_report(y_true=y_true, y_pred=y_pred, scheme=IOB2, mode='strict'))
/Users/hironsan/PycharmProjects/seqeval/seqeval/metrics/v1.py:57: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/Users/hironsan/PycharmProjects/seqeval/seqeval/metrics/v1.py:57: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 due to no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
              precision    recall  f1-score   support
         LOC       0.00      0.00      0.00         0
         PER       0.00      0.00      0.00         0
   micro avg       0.00      0.00      0.00         0
   macro avg       0.00      0.00      0.00         0
weighted avg       0.00      0.00      0.00         0
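As a user-side workaround (independent of mode='strict'), the report can simply be skipped when y_true contains no labeled spans at all. A minimal sketch; has_entities is a hypothetical helper, not part of seqeval's API:

```python
def has_entities(sequences):
    """Return True if any tag in the nested label lists is not the outside tag 'O'."""
    return any(tag != 'O' for seq in sequences for tag in seq)

y_pred = [['B-LOC', 'I-LOC', 'I-PER', 'B-PER']]
y_true = [['O', 'O', 'O', 'O']]

if has_entities(y_true):
    # safe to call seqeval.metrics.classification_report here
    pass
else:
    print("y_true contains only 'O' tokens; skipping classification_report")
```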