AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

fluctuation in loss for training yolov3 on 30 classes #3039

Open cyrineee opened 5 years ago

cyrineee commented 5 years ago

I'm training YOLOv3 on 30 classes (3000 images per class) and got this result during training (chart attached). Is it normal?

cyrineee commented 5 years ago

My config file:

[net]
# Testing
# batch=64
# subdivisions=64
# Training
batch=64
subdivisions=16
width=608 height=608 channels=3 momentum=0.9 decay=0.0005 angle=0 saturation = 1.5 exposure = 1.5 hue=.1
learning_rate=0.001 burn_in=1000 max_batches = 60000 policy=steps steps=48000,54000 scales=.1,.1

[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=leaky

# Downsample

[convolutional] batch_normalize=1 filters=64 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=32 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

# Downsample

[convolutional] batch_normalize=1 filters=128 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

# Downsample

[convolutional] batch_normalize=1 filters=256 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

# Downsample

[convolutional] batch_normalize=1 filters=512 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

# Downsample

[convolutional] batch_normalize=1 filters=1024 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

######################

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=105 activation=linear

[yolo] mask = 6,7,8 anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 classes=30 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 random=1

[route] layers = -4

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[upsample] stride=2

[route] layers = -1, 61

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=105 activation=linear

[yolo] mask = 3,4,5 anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 classes=30 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 random=1

[route] layers = -4

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[upsample] stride=2

[route] layers = -1, 36

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=105 activation=linear

[yolo] mask = 0,1,2 anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 classes=30 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 random=1 max=200
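
Note that filters=105 in the convolutional layer before each [yolo] layer follows from the YOLOv3 head layout: each head predicts 3 of the 9 anchors (selected by its mask), and each anchor needs classes + 5 values (4 box coordinates, 1 objectness score, 30 class scores). A quick check:

    classes = 30
    anchors_per_head = 3                        # e.g. mask = 6,7,8 selects 3 of the 9 anchors
    filters = anchors_per_head * (classes + 5)  # 4 box coords + 1 objectness + 30 classes
    print(filters)                              # 105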

AlexeyAB commented 5 years ago

I train YOLOv3 for 30 classes (3000 per class) and I see this result for the training, is it normal?

It's normal.

Train while Loss decreases and mAP increases.

cyrineee commented 5 years ago

The problem is that the mAP is decreasing (chart attached). Perhaps because there are too many objects at different scales?

cyrineee commented 5 years ago

Or because I have only 8727 images in the test set while the training set has 78544 images? In addition, I didn't add negative samples as you mentioned (images with empty label files).

AlexeyAB commented 5 years ago

There are almost always accuracy fluctuations during training. Just wait until the Loss stops decreasing and the mAP stops increasing.

Also if Training and Test/Validation images are very different, then there can be overfitting: https://github.com/AlexeyAB/darknet#when-should-i-stop-training

cyrineee commented 5 years ago

Okay, thanks for your response. And for the negative samples, how should I proceed if I have 30 classes?
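
In this repo's README, negative samples are background images that contain none of the target classes; each one gets an empty .txt label file and is listed in train.txt like any other image, regardless of how many classes you have. A minimal sketch, with a hypothetical folder name:

    import glob, os

    negatives_dir = 'data/negatives'   # hypothetical folder of background images

    with open('train.txt', 'a') as file_train:
        for img_path in glob.glob(os.path.join(negatives_dir, '*.jpg')):
            # an empty label file marks the image as a negative sample
            open(os.path.splitext(img_path)[0] + '.txt', 'a').close()
            file_train.write(img_path + '\n')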

cyrineee commented 5 years ago

I have a problem with the mAP: it's too low while the loss is decreasing. What can the problem be? The training set is 66% and the validation set is 33%, I have an average of 2000 samples for every class, and batch=64 with subdivisions=64.

I also want to know what ignore_thresh and truth_thresh are. Do they have a relation to mAP? Thanks in advance.

(chart attached)

AlexeyAB commented 5 years ago

The training set is 66% and the validation set is 33%.

Did you divide it randomly? What cfg-file do you use?

cyrineee commented 5 years ago

cfg-file: yolo_obj.cfg.txt

To split, I used this script:
import glob, os

# Directory containing the images
current_dir = '/OIDv4_ToolKit/train/'

# Percentage of images to be used for the test set
percentage_test = 30

# Create and/or truncate train.txt and test.txt
file_train = open('train.txt', 'w')
file_test = open('test.txt', 'w')

# Populate train.txt and test.txt: every index_test-th image goes to the test set
counter = 1
index_test = round(100 / percentage_test)
for pathAndFilename in glob.iglob(os.path.join(current_dir, "*.jpg")):
    title, ext = os.path.splitext(os.path.basename(pathAndFilename))
    if counter == index_test:
        counter = 1
        file_test.write(os.path.join(current_dir, title + '.jpg') + "\n")
    else:
        file_train.write(os.path.join(current_dir, title + '.jpg') + "\n")
        counter = counter + 1

file_train.close()
file_test.close()

AlexeyAB commented 5 years ago

Try to split it randomly
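
The script above sends every N-th file to the test set in glob order, so the split is deterministic and can group similar files together. A minimal random split, assuming the same directory layout as the script above:

    import glob, os, random

    current_dir = '/OIDv4_ToolKit/train/'
    percentage_test = 30

    images = glob.glob(os.path.join(current_dir, '*.jpg'))
    random.shuffle(images)   # randomize the order before splitting

    n_test = round(len(images) * percentage_test / 100)
    with open('test.txt', 'w') as f:
        f.writelines(p + '\n' for p in images[:n_test])
    with open('train.txt', 'w') as f:
        f.writelines(p + '\n' for p in images[n_test:])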

cyrineee commented 5 years ago

@AlexeyAB I want to know what ignore_thresh and truth_thresh are. Do they have a relation to mAP?

AlexeyAB commented 5 years ago

@cyrineee It's used there: https://github.com/AlexeyAB/darknet/blob/cce34712f6928495f1fbc5d69332162fc23491b9/src/yolo_layer.c#L255-L268

ignore_thresh=0.5 means that training will be optimized for mAP@0.5 (for IoU_threshold=0.5)
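
In that code, every predicted box is scored by its best IoU over all ground-truth boxes and compared against the two thresholds. A simplified sketch of the logic in Python (a paraphrase of the linked C code, not a literal translation):

    def objectness_target(best_iou, ignore_thresh=0.7, truth_thresh=1.0):
        # best_iou: highest IoU of this predicted box over all ground-truth boxes
        if best_iou > truth_thresh:
            return 1.0    # trained as a positive (unreachable with truth_thresh=1, the usual setting)
        if best_iou > ignore_thresh:
            return None   # overlap is good enough: the loss ignores this box
        return 0.0        # trained as background (objectness pushed toward 0)

    print(objectness_target(0.4), objectness_target(0.8))   # 0.0 None

So raising ignore_thresh means fewer near-miss boxes are ignored and more are penalized as background, which interacts with the IoU threshold that mAP is computed at.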

cyrineee commented 5 years ago

(chart attached) Now it's fixed. I changed subdivisions to 16; I get the same loss, but the mAP is increasing now. In addition, I split the data randomly and increased the size of the validation set, making sure that every class is represented. I want to ask a question: can I use the .py script (and not the C one) to test the model I trained?
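
For reference, this repo ships a darknet.py wrapper that loads the compiled library via ctypes, so testing from Python is possible. A rough sketch, assuming the performDetect helper present in the repo's darknet.py at the time (names and signatures vary between versions, so check your copy):

    from darknet import performDetect   # ctypes wrapper shipped with the repo

    # hypothetical paths for your trained model
    detections = performDetect(
        imagePath="data/example.jpg",
        thresh=0.25,
        configPath="cfg/yolo_obj.cfg",
        weightPath="backup/yolo_obj_last.weights",
        metaPath="data/obj.data")
    print(detections)   # list of (class_name, confidence, (x, y, w, h))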

cyrineee commented 5 years ago

[For training with a large number of objects in each image, add the parameter max=200 or a higher value in the last [yolo]-layer or [region]-layer in your cfg-file (the global maximum number of objects that can be detected by YOLOv3 is 0.0615234375*(width*height), where width and height are parameters from the [net] section in the cfg-file).] If I add max in the config file, can I get better detection?

AlexeyAB commented 5 years ago

If I add max in the config file, can I get better detection?

In some cases yes, just try.
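
As a side note, the constant in the quoted formula is just the number of boxes YOLOv3 predicts per input pixel: 3 anchors at each of the three output scales (strides 8, 16, 32), i.e. 3*(1/8^2 + 1/16^2 + 1/32^2) = 63/1024 = 0.0615234375. A quick check for this 608x608 cfg:

    width = height = 608
    boxes = sum(3 * (width // s) * (height // s) for s in (8, 16, 32))
    print(boxes)                              # 22743 predicted boxes
    print(0.0615234375 * width * height)      # 22743.0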

cyrineee commented 5 years ago

@AlexeyAB I used max=200 and continued the training from the 7k-iteration weights, but the mAP is decreasing while the loss is also decreasing (I think the model is overfitting). I got 67% mAP; is this good precision? I want to raise the mAP to 80%, what can you advise? For some classes YOLO is doing very well, but for some other classes it is not. Should I add images only for the classes with low detection and continue from the 7k-iteration weights? I'd really appreciate good advice. Thanks in advance.

cyrineee commented 5 years ago

I'm using some classes of the Open Images V4 database. If I start with the Open Images weights instead of darknet53.conv.74, can I get better results?

AlexeyAB commented 5 years ago

@cyrineee

Should I add images only for the classes with low detection and continue from the 7k-iteration weights?

Yes, you can.

You can try to do ./darknet partial cfg/yolov3-openimages.cfg yolov3-openimages.weights yolov3-openimages.conv.81 81

And then use the yolov3-openimages.conv.81 pretrained weights file instead of darknet53.conv.74.
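
(For context: darknet's partial command saves only the weights of the first N layers of the model, 81 here, producing a truncated file that can be used for transfer learning in the same way as darknet53.conv.74.)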

cyrineee commented 5 years ago

The problem is that I can't use yolov3-openimages.cfg because I run out of memory at 608 resolution. Can I keep it at 416 with these new weights?



AlexeyAB commented 5 years ago

@cyrineee You can use yolov3-openimages.cfg with width=416 height=416
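
(Darknet network width and height must be multiples of 32, so 416 is a valid choice; a smaller input resolution also cuts GPU memory use roughly in proportion to the pixel count.)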

cyrineee commented 5 years ago

@AlexeyAB What did you mean by this command: ./darknet partial cfg/yolov3-openimages.cfg yolov3-openimages.weights yolov3-openimages.conv.81 81? (Did you mean tiny-YOLO?) What's the difference between YOLO and tiny-YOLO?