Sotera / webpageclassifier

Categorizes a website given URL into one of blog|wiki|news|forum|classified|shopping|undecided.
Apache License 2.0

Finish integrating ERROR category into scores #14

Closed (ctwardy closed this issue 7 years ago)

ctwardy commented 7 years ago

The scikit-learn scores seem to ignore ERROR (i.e. 'error') in the results. Notice the zeros in the 'error' row of the report below, while the 'error' row of the confusion matrix shows plenty of counts:

             precision    recall  f1-score   support

  UNCERTAIN       0.00      0.00      0.00         0
       blog       0.82      0.54      0.65        69
 classified       0.44      0.28      0.34        75
      error       0.00      0.00      0.00       240
      forum       0.77      0.80      0.78       337
       news       0.86      0.44      0.59       151
   shopping       0.52      0.70      0.60       155
       wiki       0.84      0.85      0.84        79

avg / total       0.56      0.52      0.53      1106

Confusion Matrix:
           UNCERTAIN:    0,   0,   0,   0,   0,   0,   0,   0
                blog:   15,  37,   4,   0,   3,   1,   9,   0
          classified:   23,   0,  21,   0,   0,   0,  31,   0
               error:  133,   7,   1,   0,  68,   8,  16,   7
               forum:   48,   1,   4,   0, 271,   1,  12,   0
                news:   28,   0,  10,   0,  10,  67,  30,   6
            shopping:   37,   0,   8,   0,   0,   1, 109,   0
                wiki:    7,   0,   0,   0,   2,   0,   3,  67

   µ Info: 0.39
   Total #: 1106
   #Errors:    0    (   0 Bleached)
#Predicted: 1106
  Accuracy: 0.52
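
One way to check whether the zeros come from the scorer or from the predictions themselves is to recompute the per-class numbers straight from the confusion matrix above. The following is a minimal sketch, not code from this repository: it assumes rows are true labels and columns are predicted labels, in the same order as the row names, and `per_class_stats` is a hypothetical helper name.

```python
# Labels in the row order of the confusion matrix posted above.
# Assumption: the columns use the same order as the rows.
LABELS = ["UNCERTAIN", "blog", "classified", "error",
          "forum", "news", "shopping", "wiki"]

# Assumption: rows are true labels, columns are predicted labels.
MATRIX = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [15, 37, 4, 0, 3, 1, 9, 0],
    [23, 0, 21, 0, 0, 0, 31, 0],
    [133, 7, 1, 0, 68, 8, 16, 7],
    [48, 1, 4, 0, 271, 1, 12, 0],
    [28, 0, 10, 0, 10, 67, 30, 6],
    [37, 0, 8, 0, 0, 1, 109, 0],
    [7, 0, 0, 0, 2, 0, 3, 67],
]


def per_class_stats(labels, matrix):
    """Return {label: (support, predicted, precision, recall)}."""
    stats = {}
    for i, label in enumerate(labels):
        support = sum(matrix[i])                   # rows: how often label was the truth
        predicted = sum(row[i] for row in matrix)  # columns: how often label was predicted
        hits = matrix[i][i]                        # diagonal: correct predictions
        precision = hits / predicted if predicted else 0.0
        recall = hits / support if support else 0.0
        stats[label] = (support, predicted, precision, recall)
    return stats


STATS = per_class_stats(LABELS, MATRIX)

if __name__ == "__main__":
    for label, (support, predicted, precision, recall) in STATS.items():
        print(f"{label:>10}: support={support:4d} predicted={predicted:4d} "
              f"precision={precision:.2f} recall={recall:.2f}")
```

Under those assumptions the 'error' row sums to its support of 240, but the 'error' column sums to 0, so a precision and recall of 0.00 is what any scorer would report for that class.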

ctwardy commented 7 years ago

Wait, no, fixing #15 didn't fix this. The problem is that it's never predicting 'error'.

             precision    recall  f1-score   support
      error       0.00      0.00      0.00       240

Confusion Matrix:
               error:    7,   7,   8,  68,   1,  16, 133,   0   <-- Note the zero in the last column.

Most of the 'error' entries are showing up as UNCERTAIN (133) or as 'forum' (68), so the 'error' score is never rising above threshold. Needs investigation.
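
To make the threshold hypothesis concrete, here is a rough sketch of how a score-plus-threshold decision can starve one category. The project's actual scoring code is not shown in this thread, so everything here is an assumption for illustration: the function names `classify` and `error_margin`, the dict-of-scores input, and the sample score values are all hypothetical.

```python
UNCERTAIN = "UNCERTAIN"


def classify(scores, threshold):
    """Pick the top-scoring category, or UNCERTAIN if nothing reaches threshold.

    Hypothetical decision rule, not taken from the repository.
    """
    if not scores:
        return UNCERTAIN
    label, best = max(scores.items(), key=lambda kv: kv[1])
    return label if best >= threshold else UNCERTAIN


def error_margin(scores, threshold):
    """Gap between the 'error' score and what it must reach to be chosen.

    A negative value means 'error' loses to the threshold or to another category.
    """
    error_score = scores.get("error", 0.0)
    best_other = max((v for k, v in scores.items() if k != "error"), default=0.0)
    return error_score - max(threshold, best_other)


if __name__ == "__main__":
    # Made-up scores for two pages whose true label is 'error'.
    samples = [
        {"forum": 0.6, "error": 0.2, "blog": 0.1},  # another category wins
        {"error": 0.3, "news": 0.2},                # nothing reaches threshold
    ]
    for scores in samples:
        print(classify(scores, 0.5), round(error_margin(scores, 0.5), 2))
```

Logging something like `error_margin` for every page whose true label is 'error' would show whether those pages end up as UNCERTAIN because all scores are low, or as 'forum' because another score outranks 'error'.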