Using YOLO as an OCR - choice of the number of classes

YKritet commented 5 years ago

Hi,

I am trying to build an OCR system using Yolo_tiny with about 80 classes, but I can't get the loss to drop any further (it is stuck at 5). I have tried many of the approaches suggested in other issues, but none of them worked. So I started wondering whether the number of classes has something to do with this. My question is:

Should I reduce the number of classes (i.e., detect characters first and classify them afterwards), or do I just need to acquire more data (I have about 100,000 samples, i.e. sentences)?

Thanks in advance,

AlexeyAB commented 5 years ago

@YKritet Hi, What mAP did you get?

YKritet commented 5 years ago

Hi,

Thanks for the quick response. I used this command to calculate the mAP:

!./darknet detector map "obj.data" "obj.cfg" ../model_tiny/obj_last.weights

and I got the following output:

layer     filters    size              input                output
   0 conv     16  3 x 3 / 1   512 x 128 x   3   ->   512 x 128 x  16 0.057 BF
   1 max          2 x 2 / 2   512 x 128 x  16   ->   256 x  64 x  16 0.001 BF
   2 conv     32  3 x 3 / 1   256 x  64 x  16   ->   256 x  64 x  32 0.151 BF
   3 max          2 x 2 / 2   256 x  64 x  32   ->   128 x  32 x  32 0.001 BF
   4 conv     64  3 x 3 / 1   128 x  32 x  32   ->   128 x  32 x  64 0.151 BF
   5 max          2 x 2 / 2   128 x  32 x  64   ->    64 x  16 x  64 0.000 BF
   6 conv    128  3 x 3 / 1    64 x  16 x  64   ->    64 x  16 x 128 0.151 BF
   7 max          2 x 2 / 2    64 x  16 x 128   ->    32 x   8 x 128 0.000 BF
   8 conv    256  3 x 3 / 1    32 x   8 x 128   ->    32 x   8 x 256 0.151 BF
   9 max          2 x 2 / 2    32 x   8 x 256   ->    16 x   4 x 256 0.000 BF
  10 conv    512  3 x 3 / 1    16 x   4 x 256   ->    16 x   4 x 512 0.151 BF
  11 max          2 x 2 / 1    16 x   4 x 512   ->    16 x   4 x 512 0.000 BF
  12 conv   1024  3 x 3 / 1    16 x   4 x 512   ->    16 x   4 x1024 0.604 BF
  13 conv    256  1 x 1 / 1    16 x   4 x1024   ->    16 x   4 x 256 0.034 BF
  14 conv    512  3 x 3 / 1    16 x   4 x 256   ->    16 x   4 x 512 0.151 BF
  15 conv    228  1 x 1 / 1    16 x   4 x 512   ->    16 x   4 x 228 0.015 BF
  16 yolo
  17 route  13
  18 conv    128  1 x 1 / 1    16 x   4 x 256   ->    16 x   4 x 128 0.004 BF
  19 upsample            2x    16 x   4 x 128   ->    32 x   8 x 128
  20 route  19 8
  21 conv    256  3 x 3 / 1    32 x   8 x 384   ->    32 x   8 x 256 0.453 BF
  22 conv    228  1 x 1 / 1    32 x   8 x 256   ->    32 x   8 x 228 0.030 BF
  23 yolo
Total BFLOPS 2.104 
 Allocate additional workspace_size = 52.43 MB 
Loading weights from ../model_tiny/obj_last.weights...
 seen 64 
Done!

 calculation mAP (mean average precision)...
19908
 detections_count = 1631306, unique_truth_count = 491096  
class_id = 0, name = 0, ap = 88.01%      (TP = 10763, FP = 2604) 
class_id = 1, name = 1, ap = 84.60%      (TP = 12474, FP = 2782) 
class_id = 2, name = 2, ap = 88.52%      (TP = 10882, FP = 2576) 
class_id = 3, name = 3, ap = 89.76%      (TP = 8964, FP = 1388) 
class_id = 4, name = 4, ap = 90.20%      (TP = 9037, FP = 1351) 
class_id = 5, name = 5, ap = 89.75%      (TP = 9031, FP = 1774) 
class_id = 6, name = 6, ap = 90.46%      (TP = 9228, FP = 2163) 
class_id = 7, name = 7, ap = 90.80%      (TP = 9164, FP = 1300) 
class_id = 8, name = 8, ap = 89.13%      (TP = 8852, FP = 1750) 
class_id = 9, name = 9, ap = 90.08%      (TP = 9187, FP = 2374) 
class_id = 10, name = A, ap = 96.18%     (TP = 8857, FP = 799) 
class_id = 11, name = B, ap = 92.90%     (TP = 3509, FP = 645) 
class_id = 12, name = C, ap = 95.75%     (TP = 11683, FP = 1048) 
class_id = 13, name = D, ap = 95.55%     (TP = 6862, FP = 925) 
class_id = 14, name = E, ap = 94.38%     (TP = 14613, FP = 1830) 
class_id = 15, name = F, ap = 94.87%     (TP = 11230, FP = 1125) 
class_id = 16, name = G, ap = 94.19%     (TP = 3441, FP = 591) 
class_id = 17, name = H, ap = 94.32%     (TP = 3409, FP = 547) 
class_id = 18, name = I, ap = 68.47%     (TP = 3648, FP = 1908) 
class_id = 19, name = J, ap = 84.91%     (TP = 2919, FP = 632) 
class_id = 20, name = K, ap = 92.63%     (TP = 3375, FP = 617) 
class_id = 21, name = L, ap = 89.78%     (TP = 3505, FP = 632) 
class_id = 22, name = M, ap = 95.06%     (TP = 5334, FP = 756) 
class_id = 23, name = N, ap = 93.93%     (TP = 7937, FP = 1276) 
class_id = 24, name = O, ap = 94.15%     (TP = 7056, FP = 1498) 
class_id = 25, name = P, ap = 92.93%     (TP = 4115, FP = 657) 
class_id = 26, name = Q, ap = 93.55%     (TP = 3417, FP = 430) 
class_id = 27, name = R, ap = 95.46%     (TP = 11351, FP = 1464) 
class_id = 28, name = S, ap = 90.91%     (TP = 5468, FP = 1377) 
class_id = 29, name = T, ap = 94.67%     (TP = 9542, FP = 991) 
class_id = 30, name = U, ap = 95.78%     (TP = 9770, FP = 1040) 
class_id = 31, name = V, ap = 92.41%     (TP = 4122, FP = 628) 
class_id = 32, name = W, ap = 94.03%     (TP = 3389, FP = 648) 
class_id = 33, name = X, ap = 93.98%     (TP = 3473, FP = 368) 
class_id = 34, name = Y, ap = 94.07%     (TP = 3495, FP = 316) 
class_id = 35, name = Z, ap = 92.25%     (TP = 3462, FP = 534) 
class_id = 36, name = a, ap = 91.93%     (TP = 3156, FP = 388) 
class_id = 37, name = b, ap = 0.00%      (TP = 0, FP = 0) 
class_id = 38, name = c, ap = 88.78%     (TP = 5970, FP = 1101) 
class_id = 39, name = d, ap = 90.11%     (TP = 2162, FP = 478) 
class_id = 40, name = e, ap = 89.79%     (TP = 10971, FP = 2307) 
class_id = 41, name = f, ap = 85.57%     (TP = 2611, FP = 765) 
class_id = 42, name = g, ap = 0.00%      (TP = 0, FP = 0) 
class_id = 43, name = h, ap = 0.00%      (TP = 0, FP = 0) 
class_id = 44, name = i, ap = 67.42%     (TP = 1079, FP = 687) 
class_id = 45, name = j, ap = 0.00%      (TP = 0, FP = 0) 
class_id = 46, name = k, ap = 0.00%      (TP = 0, FP = 0) 
class_id = 47, name = l, ap = 45.92%     (TP = 647, FP = 716) 
class_id = 48, name = m, ap = 93.87%     (TP = 3247, FP = 684) 
class_id = 49, name = n, ap = 90.34%     (TP = 16368, FP = 5132) 
class_id = 50, name = o, ap = 92.52%     (TP = 7351, FP = 1546) 
class_id = 51, name = p, ap = 92.91%     (TP = 551, FP = 59) 
class_id = 52, name = q, ap = 0.00%      (TP = 0, FP = 0) 
class_id = 53, name = r, ap = 83.62%     (TP = 5122, FP = 1582) 
class_id = 54, name = s, ap = 83.49%     (TP = 1927, FP = 509) 
class_id = 55, name = t, ap = 81.32%     (TP = 3719, FP = 1227) 
class_id = 56, name = u, ap = 86.88%     (TP = 5434, FP = 1370) 
class_id = 57, name = v, ap = 83.48%     (TP = 141, FP = 37) 
class_id = 58, name = w, ap = 0.00%      (TP = 0, FP = 0) 
class_id = 59, name = x, ap = 0.00%      (TP = 0, FP = 0) 
class_id = 60, name = y, ap = 0.00%      (TP = 0, FP = 0) 
class_id = 61, name = z, ap = 0.00%      (TP = 0, FP = 0) 
class_id = 62, name = /, ap = 64.21%     (TP = 4641, FP = 2553) 
class_id = 63, name = ., ap = 48.08%     (TP = 922, FP = 732) 
class_id = 64, name = °, ap = 80.83%     (TP = 9572, FP = 4363) 
class_id = 65, name = |, ap = 42.89%     (TP = 170, FP = 126) 
class_id = 66, name = -, ap = 61.65%     (TP = 219, FP = 114) 
class_id = 67, name = :, ap = 63.15%     (TP = 4905, FP = 2532) 
class_id = 68, name = wo, ap = 93.09%        (TP = 62594, FP = 14826) 
class_id = 69, name = nf, ap = 92.03%        (TP = 19654, FP = 8292) 
class_id = 70, name = da, ap = 98.77%        (TP = 2925, FP = 278) 

 for thresh = 0.25, precision = 0.82, recall = 0.88, F1-score = 0.85 
 for thresh = 0.25, TP = 432622, FP = 95748, FN = 58474, average IoU = 64.58 % 

 IoU threshold = 50 %, used Area-Under-Curve for each unique Recall 
 mean average precision (mAP@0.50) = 0.745226, or 74.52 % 
Total Detection Time: 69.000000 Seconds

Set -points flag:
 `-points 101` for MS COCO 
 `-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data) 
 `-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
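
As a cross-check on the summary lines in the log above, the precision, recall, and F1-score reported for thresh = 0.25 can be recomputed from the printed TP, FP, and FN totals. This is a minimal Python sketch using the standard definitions; the function name `detection_metrics` is chosen here for illustration and is not part of darknet.

```python
from typing import Tuple


def detection_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from detection counts."""
    precision = tp / (tp + fp)  # share of detections that are correct
    recall = tp / (tp + fn)     # share of ground-truth objects that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Totals copied from the "for thresh = 0.25" line of the log.
precision, recall, f1 = detection_metrics(tp=432622, fp=95748, fn=58474)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
# prints: precision=0.82 recall=0.88 F1=0.85
```

Note that TP + FN = 491096, which equals the `unique_truth_count` printed in the log.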

HagegeR commented 5 years ago

Maybe a confusion matrix could help in this case, since some classes are never detected (AP = 0%), but I don't know how to plot one.
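
One common way to build a confusion matrix for a detector is to match each prediction to a ground-truth box by IoU and add an extra "background" row and column for unmatched boxes. The Python sketch below illustrates that idea for a single image. It is not a darknet feature: the function names, the `(class_id, box)` tuples, and the corner-format boxes are assumptions made for this example, and predictions are assumed to be sorted by descending confidence.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Item = Tuple[int, Box]                   # (class_id, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def confusion_matrix(truths: List[Item], preds: List[Item],
                     num_classes: int, iou_thresh: float = 0.5) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes.

    Index num_classes is 'background': its row collects false positives,
    its column collects missed ground-truth objects.
    """
    bg = num_classes
    m = [[0] * (num_classes + 1) for _ in range(num_classes + 1)]
    matched = set()
    for p_cls, p_box in preds:  # greedy matching, most confident first
        best_iou, best_j = 0.0, -1
        for j, (_, t_box) in enumerate(truths):
            if j in matched:
                continue
            v = iou(p_box, t_box)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j >= 0 and best_iou >= iou_thresh:
            matched.add(best_j)
            m[truths[best_j][0]][p_cls] += 1  # true class vs predicted class
        else:
            m[bg][p_cls] += 1                 # detection with no matching object
    for j, (t_cls, _) in enumerate(truths):
        if j not in matched:
            m[t_cls][bg] += 1                 # object that was never detected
    return m


truths = [(0, (0, 0, 10, 10)), (1, (20, 0, 30, 10))]
preds = [(0, (1, 0, 10, 10)), (2, (20, 0, 30, 10)), (1, (50, 50, 60, 60))]
matrix = confusion_matrix(truths, preds, num_classes=3)
print(matrix[0][0], matrix[1][2], matrix[3][1])  # prints: 1 1 1
```

The background/background cell is left at zero, since true negatives are not defined for object detection. Summing the per-image matrices over a validation set would show, for a class with AP = 0%, whether its objects are missed entirely or detected as another class.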

AlexeyAB commented 5 years ago

The mAP is good. Don't look at the loss.

Just check that you have enough training and validation examples containing the objects b, j, k, q, ... and the other classes with AP = 0%.
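
One way to run that check is to count how many labeled boxes each class has in the annotation files. The Python sketch below assumes the usual darknet label format (one `.txt` file per image, one `class_id x_center y_center width height` line per object) and assumes all label files for a split sit in a single directory. The function name and the directory path in the usage comment are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, List


def count_instances(label_dir: str, names: List[str]) -> Dict[str, int]:
    """Count labeled boxes per class name across all .txt files in label_dir."""
    counts: Counter = Counter()
    for txt in Path(label_dir).glob("*.txt"):
        for line in txt.read_text().splitlines():
            parts = line.split()
            if len(parts) == 5:  # skip blank or malformed lines
                counts[int(parts[0])] += 1
    # Report every class, including those that never occur.
    return {name: counts.get(i, 0) for i, name in enumerate(names)}


# Usage (hypothetical paths), with names read in the order of the .names file:
#   names = Path("obj.names").read_text().splitlines()
#   for name, n in count_instances("data/valid_labels", names).items():
#       if n == 0:
#           print("no examples for class", name)
```

Running it separately on the training and validation labels shows whether a class with AP = 0% is absent from one split or from both.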

YKritet commented 5 years ago

@HagegeR We need TN (true negatives) for a confusion matrix, which we don't have access to.

@AlexeyAB OK, thanks a lot. Broadly, the detection problems I am facing right now come from noisy images (I did try to simulate such images in my training data).

What about the cfg file: is there a specific configuration for such a situation?

AlexeyAB commented 5 years ago

Maybe just increase the network size to width=608 height=608 in the cfg file if you want to detect small objects.
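
To see what a change of network size means for this model, the following Python sketch computes the grid size of each yolo layer from the input resolution, assuming the two output strides (32 and 16) that the layer listing earlier in the thread implies for this tiny model. It also checks the `filters` value of the convolutional layer before each yolo layer using the darknet rule filters = (classes + 5) x 3. The function names are illustrative only.

```python
from typing import List, Tuple


def yolo_grid_sizes(width: int, height: int,
                    strides: Tuple[int, ...] = (32, 16)) -> List[Tuple[int, int]]:
    """Grid size (columns, rows) of each yolo layer for a given network input size."""
    if width % 32 or height % 32:
        raise ValueError("width and height should be multiples of 32")
    return [(width // s, height // s) for s in strides]


def yolo_filters(classes: int, anchors_per_layer: int = 3) -> int:
    """filters for the conv layer before each yolo layer: (classes + 5) * anchors."""
    return (classes + 5) * anchors_per_layer


print(yolo_grid_sizes(512, 128))  # prints: [(16, 4), (32, 8)]
print(yolo_grid_sizes(608, 608))  # prints: [(19, 19), (38, 38)]
print(yolo_filters(71))           # prints: 228
```

The first and last results match the log: the yolo layers operate on 16 x 4 and 32 x 8 grids, and the 71 classes (ids 0 to 70) give the 228 filters shown for layers 15 and 22.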

YKritet commented 5 years ago

Hi,

I decided to keep the aspect ratio using the -letter_box flag, and I wanted to ask about the differences in performance (not necessarily the mAP, more the specific characteristics) between four cfg files:

- yolov3_5l.cfg
- yolov3-spp.cfg
- yolov3.cfg
- darknet53.cfg

Thank you very much
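
For reference, keeping the aspect ratio by letterboxing means scaling the image uniformly until it fits inside the network input and padding the remainder. The Python sketch below shows the generic geometry only. It is not darknet's own implementation, which may round differently, and the 1024 x 256 image size in the example is a made-up value.

```python
from typing import Tuple


def letterbox_geometry(img_w: int, img_h: int,
                       net_w: int, net_h: int) -> Tuple[int, int, int, int]:
    """Return (new_w, new_h, pad_x, pad_y) for an aspect-preserving resize."""
    scale = min(net_w / img_w, net_h / img_h)  # largest uniform scale that fits
    new_w = round(img_w * scale)
    new_h = round(img_h * scale)
    pad_x = (net_w - new_w) // 2               # padding on the left side
    pad_y = (net_h - new_h) // 2               # padding on the top side
    return new_w, new_h, pad_x, pad_y


# A wide text-line image (hypothetical size) placed into a square 608 x 608 input.
print(letterbox_geometry(1024, 256, 608, 608))  # prints: (608, 152, 0, 228)
```

In this example only 152 of the 608 input rows carry image content, which is one reason a wide, non-square network size can suit text lines better than a square one.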

AlexeyAB / darknet

Using YOLO as an OCR - choice of the number of classes #3943