AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Negative and Positive Samples in Same Sample #1400

Open silvernine209 opened 6 years ago

silvernine209 commented 6 years ago

If I have "pictureA.jpg" with dog(class 0) and person(class 1), I will have "pictureA.txt" with something like :

0 0.716797 0.395833 0.216406 0.147222 1 0.687109 0.379167 0.255469 0.158333

Now, can I include cat (class 2) without any box coordinates, to let the training know that it is a negative sample (class)? Something like this:

0 0.716797 0.395833 0.216406 0.147222 1 0.687109 0.379167 0.255469 0.158333 2

I'm trying to improve training by following the recommendation below, but I'm not sure how best to execute it.

desirable that your training dataset include images with non-labeled objects that you do not want to detect - negative samples without bounded box (empty .txt files) - use as many images of negative samples as there are images with objects

AlexeyAB commented 6 years ago

Now, can I include cat (class 2) without any box coordinates, to let the training know that it is a negative sample (class)? Something like this:

0 0.716797 0.395833 0.216406 0.147222 1 0.687109 0.379167 0.255469 0.158333 2

You shouldn't do it.

Negative samples are images without any objects, each with an empty label txt-file.
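As an illustration (not part of the original reply), a minimal sketch in Python that creates the empty label .txt files for a folder of negative images; the negatives/ directory name is an assumption:

import pathlib

# Every image that contains no objects to detect gets an empty .txt label file,
# so darknet treats it as a negative sample during training.
negatives_dir = pathlib.Path("negatives")   # hypothetical folder of negative images
for img in negatives_dir.glob("*.jpg"):
    label = img.with_suffix(".txt")
    if not label.exists():
        label.touch()   # empty file = no labeled objects in this image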

silvernine209 commented 6 years ago

@AlexeyAB Thank you for the prompt clarification!

I was looking into ways to improve and speed up training, since I have 1.6 million images and 500 classes, which would take about 500 * 2,000 = 1,000,000 iterations for decent performance.

I was initially doing transfer learning, but I estimated about a month of training on my GTX 1070, so I switched to fine-tuning. I'm currently at 30,000 iterations, and below is the general progress:

I obtained new anchors as you recommended, and the annotation boxes, which were drawn by professional annotators, are correct, since they were provided by Google for a Kaggle competition.

Given the above, does the progress look normal to you for this size of dataset and number of classes? All parameters in the .cfg are at defaults for the fine-tuning process. Do you recommend any tweaks (maybe to the learning rate) to speed up the process? I will be satisfied even with mAP around 30%.

Thank you for your time.

AlexeyAB commented 6 years ago

I was looking into ways to improve and speed training ...

I switched to fine tuning.

From 20,000 to 30,000 iterations, IoU is ~20% and mAP increased from ~2% to ~3%.

silvernine209 commented 6 years ago
  1. For my case, fine-tuning seems to be doing great so far:

[screenshot: training progress chart]

  1. I added stopbackward=1 at line 548, above the ####### marker, as recommended.
  2. I'm using yolov3.cfg.
  3. I will double check using Yolo_mark, but I think all annotations were converted correctly. All annotation bboxes were given as x_min, y_min, x_max, and y_max, with the zero coordinate at the bottom-left of the image. I obtained x_center, y_center, width, and height as: x_center = x_min + width/2, y_center = y_min + height/2, width = abs(x_max - x_min), height = abs(y_max - y_min) (a conversion sketch follows this list).
  4. I'm currently at 23,100 iterations of fine-tuning, at 23.22% IoU and 5.34% mAP.
  5. New anchors = 19,26, 46,82, 66,173, 137,94, 94,301, 180,210, 353,184, 212,355, 381,374
  6. random=1 was used. The resolution has not been changed yet for detection; I will do so at the end of training. Anchors were recalculated. I wasn't able to verify that every object is labeled, since the dataset has 1.6 million images, but Google did a very good job annotating as much as possible. Nothing else has been done.
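A minimal sketch of the conversion described in item 3, assuming corner coordinates already normalized to 0..1 and a top-left origin (which is what darknet label files expect); the function name and example values are illustrative:

def to_yolo_box(x_min, y_min, x_max, y_max):
    # Corner coordinates are assumed normalized to 0..1 with the origin at the TOP-left.
    width = abs(x_max - x_min)
    height = abs(y_max - y_min)
    x_center = x_min + width / 2
    y_center = y_min + height / 2
    return x_center, y_center, width, height

# Example: write one object per line of the darknet label file.
class_id, box = 0, (0.60, 0.32, 0.82, 0.47)   # illustrative values
x, y, w, h = to_yolo_box(*box)
print(f"{class_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")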

Here is the .cfg file if you would like to take a look:

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=32
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 100000
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

# Downsample

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=256
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=512
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear
stopbackward=1
######################

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=1515
activation=linear

[yolo]
mask = 6,7,8
anchors = 19,26,  46,82,  66,173,  137,94,  94,301,  180,210,  353,184,  212,355,  381,374
classes=500
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 61

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=1515
activation=linear

[yolo]
mask = 3,4,5
anchors = 19,26,  46,82,  66,173,  137,94,  94,301,  180,210,  353,184,  212,355,  381,374
classes=500
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 36

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=1515
activation=linear

[yolo]
mask = 0,1,2
anchors = 19,26,  46,82,  66,173,  137,94,  94,301,  180,210,  353,184,  212,355,  381,374
classes=500
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
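One sanity check on the cfg above: the filters value of the convolutional layer directly before each [yolo] layer must equal (classes + 5) multiplied by the number of anchor masks per layer, which is why the three final layers all use filters=1515:

# filters before each [yolo] layer = (classes + 5) * masks per layer
classes = 500
masks_per_layer = 3          # each [yolo] layer uses 3 of the 9 anchors
assert (classes + 5) * masks_per_layer == 1515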

silvernine209 commented 6 years ago

@AlexeyAB I think I found what I was doing wrong. What a dummy! For y_center, I did y_min + height/2 on a bottom-left zero coordinate system. To correct this, I just need to do y_center = 1 - (y_min + height/2). I wasted days! Thank you so much, Alexey, for helping me realize this. I'm sure training will go a lot better after I fix y_center in all files.

AlexeyAB commented 6 years ago

I will double check using Yolo_mark, but I think all annotations were converted correctly. All annotation bboxes were given as x_min, y_min, x_max, and y_max, with the zero coordinate at the bottom-left of the image. I obtained x_center, y_center, width, and height as: x_center = x_min + width/2, y_center = y_min + height/2, width = abs(x_max - x_min), height = abs(y_max - y_min).

@AlexeyAB I think I found what I was doing wrong. What a dummy! For y_center, I did y_min + height/2 on a bottom-left zero coordinate system. To correct this, I just need to do y_center = 1 - (y_min + height/2). I wasted days! Thank you so much, Alexey, for helping me realize this. I'm sure training will go a lot better after I fix y_center in all files.

Do you mean that in this Google dataset the coords [x_min, y_min] are the bottom-left corner instead of the top-left corner?

Did you check y_center = y_min + height/2 versus y_center = 1 - (y_min + height/2) by using Yolo_mark?

silvernine209 commented 6 years ago

Yes, in the Google dataset, the coords [x_min, y_min] are the bottom-left corner instead of the top-left corner.

I compiled Yolo_mark but couldn't get it to work. However, using the airplane and bird examples in the Yolo_mark repo, I verified that all of my y_center values were wrong, since I had used y_center = y_min + height/2, and indeed y_center = 1 - (y_min + height/2) is correct.
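A one-line sketch of that correction, assuming normalized coordinates measured from a bottom-left origin (the function name is illustrative):

def y_center_top_origin(y_min, height):
    # y_min and height are normalized to 0..1 and measured from the bottom-left origin;
    # darknet expects y_center measured from the top of the image, hence the flip.
    return 1.0 - (y_min + height / 2)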

I'm updating all the text files for the training and validation images, and will try to get some results tonight. I will report on the progress, but I expect a night-and-day difference compared to:

[screenshot: earlier training progress chart]

lp-094 commented 5 years ago

Should the txt file contain nothing (i.e., be completely empty)? Thank you.