Sotera / webpageclassifier

Categorizes a website, given its URL, into one of blog|wiki|news|forum|classified|shopping|undecided.
Apache License 2.0

Confusion Matrix labels are wrong #15

ctwardy closed this issue 7 years ago

ctwardy commented 7 years ago

When evaluate() calls metrics.confusion_matrix(), it doesn't pass a labels parameter, so the rows and columns of the result come back in sorted label order. But when we print it out, we just zip the rows with the classes_ list, which can be in any order, so rows end up paired with the wrong class names.

Fix by passing classes_ as the labels parameter.

ctwardy commented 7 years ago

Example: The precision report shows zeros for "UNCERTAIN" and "error", as expected, and reasonable results for "blog":

             precision    recall  f1-score   support
       blog       0.98      0.63      0.77        63
       wiki       0.96      0.72      0.82        65
       news       0.83      0.58      0.68        92
      forum       0.85      0.63      0.72        54
 classified       0.28      0.06      0.09        89
   shopping       0.36      0.62      0.46        56
  UNCERTAIN       0.00      0.00      0.00         0
      error       0.00      0.00      0.00         0
avg / total       0.69      0.51      0.57       419

But the confusion matrix shows zeros for "blog" and "forum":

                blog:    0,   0,   0,   0,   0,   0,   0,   0,   0
                wiki:   12,  40,   1,   1,   1,   3,   0,   5,   0
                news:   48,   0,   5,   7,   0,   0,   0,  28,   1
               forum:    0,   0,   0,   0,   0,   0,   0,   0,   0
          classified:   12,   0,   1,   1,  34,   2,   0,   4,   0
            shopping:   15,   1,   2,   3,   4,  53,   0,  14,   0
           UNCERTAIN:   46,   0,   5,  16,   0,   6,   0,   7,   1
               error:   15,   0,   4,   2,   0,   0,   0,  35,   0