AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

avg loss oscillates in multi class training #4601

Open barzan-hayati opened 4 years ago

barzan-hayati commented 4 years ago

Hi @AlexeyAB. I have a dataset for Persian licence plate recognition, and I want to solve it with YOLO object detection using your repository. Here is a sample from my dataset (139x29 pixels):

[Image: TTCC-Car-569]

As a first step, and for simplicity, I want to detect only the digits (1 to 9), so this is a 9-class object detection problem. I think the dataset is reasonably good, since every digit has at least 2000 objects across all images.

First problem: the training process.

CFG:

[net]
# Testing
batch=128
subdivisions=1
# Training
# batch=32
# subdivisions=1
width=224
height=224
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.0001
burn_in=1000
max_batches = 1500000
policy=steps
steps   = 1100000
scales =.1
#steps  = 124100,250000,400000,450000
#scales = 10    ,.1    ,.1    ,.1
#250001,10    ,
[convolutional]
batch_normalize=0
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=0
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=0
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=0
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=0
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

###########

[convolutional]
batch_normalize=0
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=70
activation=linear

[region]
anchors =  1.08,1.19,  3.42,4.41,  6.63,11.38,  9.42,5.11,  16.62,10.52
bias_match=1
classes=9
coords=4
num=5
softmax=1
jitter=.2
rescore=0

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .5
random=1
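As a quick sanity check on the cfg above, `filters=70` in the last convolutional layer is consistent with the usual sizing rule for a `[region]` layer, where every anchor predicts the box coordinates, one objectness score, and the class scores. The sketch below only illustrates that arithmetic; the function name is made up for this example and is not part of darknet.

```python
def region_filters(num_anchors: int, classes: int, coords: int = 4) -> int:
    """Channels expected in the conv layer feeding a [region] layer.

    Per anchor: `coords` box values + 1 objectness score + `classes` scores.
    (Illustrative helper, not a darknet function.)
    """
    return num_anchors * (coords + 1 + classes)


# Values taken from the cfg above: num=5, classes=9, coords=4
print(region_filters(num_anchors=5, classes=9, coords=4))  # 70
```

So the output layer size matches `num=5` and `classes=9`, and the oscillating loss is unlikely to come from a filters mismatch.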

Training started from iteration 123700. The charts below show mAP at several stages (results before 280000 are not shown):

[Images: chart_Config (4th copy), chart, chart-3]

As you can see, avg loss oscillates and mAP no longer improves. The maximum mAP was ~73 and avg loss never goes below 0.3. So I need to test the results for the best mAP state.

 calculation mAP (mean average precision)...
460
 detections_count = 4889, unique_truth_count = 3076  
class_id = 0, name = 1, ap = 65.61%      (TP = 377, FP = 228) 
class_id = 1, name = 2, ap = 73.55%      (TP = 325, FP = 164) 
class_id = 2, name = 3, ap = 68.86%      (TP = 267, FP = 140) 
class_id = 3, name = 4, ap = 68.89%      (TP = 240, FP = 147) 
class_id = 4, name = 5, ap = 70.33%      (TP = 241, FP = 133) 
class_id = 5, name = 6, ap = 71.49%      (TP = 186, FP = 109) 
class_id = 6, name = 7, ap = 71.89%      (TP = 176, FP = 97) 
class_id = 7, name = 8, ap = 81.50%      (TP = 220, FP = 90) 
class_id = 8, name = 9, ap = 71.84%      (TP = 162, FP = 95) 

 for conf_thresh = 0.25, precision = 0.65, recall = 0.71, F1-score = 0.68 
 for conf_thresh = 0.25, TP = 2194, FP = 1203, FN = 882, average IoU = 51.81 % 

 IoU threshold = 50 %, used Area-Under-Curve for each unique Recall 
 mean average precision (mAP@0.50) = 0.715529, or 71.55 % 
Total Detection Time: 1.000000 Seconds

Set -points flag:
 `-points 101` for MS COCO 
 `-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data) 
 `-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset

The low average IoU (51.81 %) worries me, and the FP counts are high.

Second problem: detection results for the input plate:
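To make the summary line in the log easier to read, here is a minimal Python sketch that recomputes precision, recall, and F1 from the per-class TP/FP counts printed above. The function name is illustrative; the formulas are the standard definitions, not code taken from darknet.

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Standard precision / recall / F1 from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Per-class counts copied from the log above (classes 1..9)
tp_per_class = [377, 325, 267, 240, 241, 186, 176, 220, 162]
fp_per_class = [228, 164, 140, 147, 133, 109, 97, 90, 95]

tp = sum(tp_per_class)   # 2194, matches the log
fp = sum(fp_per_class)   # 1203, matches the log
fn = 3076 - tp           # unique_truth_count - TP = 882

p, r, f1 = detection_metrics(tp, fp, fn)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# prints: precision=0.65 recall=0.71 F1=0.68
```

Roughly one in three detections at conf_thresh = 0.25 is a false positive, which is what the 0.65 precision expresses.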

[Image: predictions]

 Allocate additional workspace_size = 3.28 MB 
Loading weights from LPR_Deep_Learning_Test/OCR_Plates_Car_Truck_Yolo_Lite/Backup_Plates/Config_best.weights...
 seen 64 
Done! Loaded 13 layers from weights-file 
testdata/Plates-TTCC-TabaTozin-Modares/TTCC-Car-569.jpg: Predicted in 1.115000 milli-seconds.
4: 54%
7: 46%
3: 100%
4: 100%
9: 73%
9: 54%
2: 46%
2: 100%

As you can see, **9** has been detected twice, and if I set threshold=0.6 I'll miss **6,7,2**.

Main question:

Is it possible to reach an avg loss below 0.1? Should I continue training?
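The effect of the threshold can be reproduced from the log alone. The sketch below simply filters the printed detections by confidence; it is not darknet's own `-thresh` code, and the confidences are the rounded percentages from the output above.

```python
# (label, confidence) pairs copied from the detection log above
detections = [
    ("4", 0.54), ("7", 0.46), ("3", 1.00), ("4", 1.00),
    ("9", 0.73), ("9", 0.54), ("2", 0.46), ("2", 1.00),
]


def split_by_threshold(dets, thresh):
    """Return (kept, dropped) lists for a given confidence threshold."""
    kept = [d for d in dets if d[1] >= thresh]
    dropped = [d for d in dets if d[1] < thresh]
    return kept, dropped


kept, dropped = split_by_threshold(detections, 0.6)
print("kept:   ", [label for label, _ in kept])     # ['3', '4', '9', '2']
print("dropped:", [label for label, _ in dropped])  # ['4', '7', '9', '2']
```

At 0.6 the duplicate 9 disappears, but so do three other low-confidence boxes, so a single global threshold cannot fix both problems at once.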

Thanks in advance.

AlexeyAB commented 4 years ago

Try to train this model https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3-tiny-prn.cfg with

[net]
width=160
height=32

Also, why did you use batch_normalize=0?

barzan-hayati commented 4 years ago

Thanks a lot. I used YOLO-Lite, a very simple and fast version of Yolo-TinyV2. In this network batch_normalize=0, and after training I should convert the network to a TRT engine via Deep-Stream. Deep-Stream needs a square grid such as 7x7, 9x9 and so on (in order to convert to TRT engines). I think with

width=160
height=32

it's impossible to have a square grid. Am I right?
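The grid shape follows directly from the input size and the network's total downsampling factor. The sketch below assumes a factor of 32, which matches the five stride-2 maxpool layers in the cfg above (2^5 = 32); other models or output layers may use a different factor, and the function name is only illustrative.

```python
def grid_size(width: int, height: int, stride: int = 32):
    """Grid (columns, rows) for an input size and total downsampling factor.

    stride=32 is an assumption matching five stride-2 maxpool layers.
    """
    if width % stride or height % stride:
        raise ValueError("width and height should be multiples of the stride")
    return width // stride, height // stride


print(grid_size(224, 224))  # (7, 7)  square, as in the current cfg
print(grid_size(160, 32))   # (5, 1)  not square
print(grid_size(160, 160))  # (5, 5)  square again
```

Under that assumption a 160x32 input gives a 5x1 grid, so a square grid needs equal width and height.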

AlexeyAB commented 4 years ago

In this case try to use this model https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3-tiny.cfg with

[net]
width=160
height=160

Yolo is natively supported in DeepStream 4.0 (TRT engine via Deep-Stream): https://news.developer.nvidia.com/deepstream-sdk-4-now-available/

batch_normalize=1 will be fused into the convolutional layer by TRT, so it will not reduce inference speed.
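Why fusing costs nothing at inference time can be shown with a little arithmetic: batch normalization with fixed statistics is an affine transform, so it can be folded into the weights and bias of the preceding convolution. The sketch below demonstrates the identity on a single output value in plain Python; it illustrates the general technique and is not a description of TensorRT internals. All names and numbers are made up for the example.

```python
import math


def conv(x, weights, bias):
    """One output value of a convolution: dot product plus bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch norm with fixed running statistics."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch norm into the conv: returns (new_weights, new_bias)."""
    scale = gamma / math.sqrt(var + eps)
    return [w * scale for w in weights], beta + (bias - mean) * scale


# Arbitrary example values
x = [0.5, -1.0, 2.0]
w, b = [0.2, -0.4, 0.1], 0.3
gamma, beta, mean, var = 1.5, -0.2, 0.1, 0.8

reference = batchnorm(conv(x, w, b), gamma, beta, mean, var)
fused_w, fused_b = fold_batchnorm(w, b, gamma, beta, mean, var)
fused = conv(x, fused_w, fused_b)

# Both paths give the same result, with one layer instead of two
assert math.isclose(reference, fused, rel_tol=1e-9)
```

After folding, the network runs a single convolution per layer, which is why training with batch_normalize=1 does not have to slow down the deployed engine.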

barzan-hayati commented 4 years ago

I'll train the YoloV3-tiny network and report the results here, so I'll keep this issue open. Thanks for your guidance.

mrhosseini commented 4 years ago

@barzan-hayati

As you are dealing with small objects, one possible cause, if you have not checked it yet, is the problem mentioned in #4404.

barzan-hayati commented 4 years ago

Yes. My objects are small, but they are not too small relative to the size of the plate. I want to find characters in plates, and the size of a plate in an image varies from 80x20 to 160x30. Thanks.

barzan-hayati commented 4 years ago

I think objects that are near the margin of the image can also be detected correctly. Thanks.

barzan-hayati commented 4 years ago

YoloV3-Tiny gives very good results, but it's too heavy for the final solution: it has 24 layers and more than 10 times the BFLOPS of Yolo-Lite. I need to run 3 networks in real time, so I should use a lighter version. If I increase the input resolution (from 224 to 288 or more), will I get better results?
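Whether accuracy improves at a higher resolution has to be measured, but the cost side can be estimated in advance. For a fully convolutional network the work per layer grows roughly with the number of input pixels, so the sketch below uses the squared size ratio as a rough estimate. This is an approximation under that assumption, not a measured BFLOPS figure, and the function name is illustrative.

```python
def relative_conv_cost(new_size: int, old_size: int) -> float:
    """Rough compute ratio for a square input resized from old_size to new_size.

    Assumes cost is proportional to the number of input pixels.
    """
    return (new_size / old_size) ** 2


ratio = relative_conv_cost(288, 224)
print(f"{ratio:.2f}x")  # 1.65x
```

Going from 224 to 288 would therefore cost roughly 65% more compute per image, which matters when three networks have to run in real time.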