yolov3 - train COCO 2014 from scratch

ggenny commented 4 years ago

I would like to improve the identification in yolov3.weights by adding new images to some of the categories.

Before starting that, I am trying to replicate the original result.

My configuration is (COCO downloaded with script/get_coco...):

yolov3.cfg:

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

... standard yolo with random=1

coco.data:

classes= 80
train  = /home/develop/coco/trainvalno5k.txt
valid  = /home/develop/coco/5k.txt
#valid  = coco_testdev
#valid = data/coco_val_5k.list
names = data/coco.names
backup = /home/develop/backup/
eval=coco

Execution for the first 1000 iterations:
./darknet detector train cfg/coco.data cfg/yolov3.cfg weights/darknet53.conv.74

Execution from iteration 1000 onward:
./darknet detector train cfg/coco.data cfg/yolov3.cfg backup/yolov3_1000.weights -gpus 0,1

After 40k iterations I see this:

I think I'm doing something wrong

AlexeyAB commented 4 years ago

Everything is OK.

ggenny commented 4 years ago

Hi, thank you for your time.

How many iterations do I need? I expected to reach ~55 mAP at 160000 iterations (2000 * 80 classes).

AlexeyAB commented 4 years ago

According to your cfg file, you need to train for about 500 000 iterations to reach 55 mAP@0.5 with yolov3.cfg at width=416 height=416. It may get there earlier.

max_batches = 500200
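To make the link between the cfg values and the training length concrete, here is a minimal sketch of how the learning rate evolves under `policy=steps` with the values from the cfg above. The function name `darknet_lr` is hypothetical, and the burn-in exponent of 4 is an assumption based on my understanding of Darknet's default `power` setting; check the Darknet source before relying on it.

```python
def darknet_lr(iteration, base_lr=0.001, burn_in=1000, power=4.0,
               steps=(400000, 450000), scales=(0.1, 0.1)):
    """Approximate learning rate at `iteration` for policy=steps.

    Assumption: during burn-in the rate ramps up as
    base_lr * (iteration / burn_in) ** power, with power=4.
    """
    if iteration < burn_in:
        return base_lr * (iteration / burn_in) ** power
    lr = base_lr
    # Each step that has been reached multiplies the rate by its scale.
    for step, scale in zip(steps, scales):
        if iteration < step:
            break
        lr *= scale
    return lr


if __name__ == "__main__":
    for it in (500, 1000, 40000, 400000, 450000, 500200):
        print(f"iteration {it:>6}: learning rate {darknet_lr(it):.8f}")
```

Under these assumptions the rate stays at its base value from the end of burn-in until iteration 400000, so the drops at `steps` only come late in training.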

ggenny commented 4 years ago

I don't understand but it doesn't seem to converge. It seems strange.

AlexeyAB commented 4 years ago

@ggenny Since you use 2 GPUs, try setting learning_rate=0.0005 and continue training: https://github.com/AlexeyAB/darknet#how-to-train-with-multi-gpu
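The suggested value matches dividing the single-GPU learning rate by the number of GPUs (0.001 / 2 = 0.0005). A small sketch of that arithmetic follows. The helper name `multi_gpu_settings` is hypothetical, and scaling `burn_in` up by the GPU count is an assumption based on my recollection of the linked README section, not something stated in this thread.

```python
def multi_gpu_settings(base_lr, base_burn_in, num_gpus):
    """Return cfg values adjusted for multi-GPU training.

    learning_rate is divided by the GPU count, as in the advice above.
    Assumption: burn_in is multiplied by the GPU count as well.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return {
        "learning_rate": base_lr / num_gpus,
        "burn_in": base_burn_in * num_gpus,
    }


if __name__ == "__main__":
    # Values from the cfg above, for the 2 GPUs used in this thread.
    print(multi_gpu_settings(base_lr=0.001, base_burn_in=1000, num_gpus=2))
```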

AlexeyAB commented 4 years ago

@ggenny Also try to download the latest version of Darknet.

@WongKinYiu There was a bug in data augmentation for Detector training, which forced blur, from 26 Oct to 4 Dec: https://github.com/AlexeyAB/darknet/blame/5d0352f961f4dc3db8ccad0570481c69305c0143/src/data.c#L884

WongKinYiu commented 4 years ago

@AlexeyAB Oh... maybe I have to retrain many models...

AlexeyAB commented 4 years ago

@WongKinYiu Sorry. Bug was only for Detector, not for Classifier.

WongKinYiu commented 4 years ago

@AlexeyAB Fine, I have to talk to my partners.

And the good news is... I have to stop training the detectors, so I can check the anti-aliasing and mosaic first.

ggenny commented 4 years ago

@AlexeyAB learning_rate=0.0005 does not seem to improve things.

Should I get the latest version and start from scratch, or resume from the current point (learning_rate = 0.0005)?

AlexeyAB commented 4 years ago

> Should I get the latest version and start from scratch, or resume from the current point (learning_rate = 0.0005)?

Just resume from the current point. Maybe set:

max_batches = 700200
steps=600000,650000
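If you prefer to apply such changes with a script rather than by hand, the following is a minimal sketch. The helper `set_cfg_values` is hypothetical and not part of Darknet. It assumes each key appears as a plain `key=value` line, changes only the first occurrence of each key, and leaves commented-out lines untouched.

```python
import re


def set_cfg_values(cfg_text, updates):
    """Return cfg_text with the first `key=value` line of each key replaced."""
    remaining = dict(updates)
    lines = cfg_text.splitlines()
    for i, line in enumerate(lines):
        # Lines starting with '#' do not match, so comments are left alone.
        match = re.match(r"\s*([A-Za-z_]+)\s*=", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            lines[i] = f"{key}={remaining.pop(key)}"
    if remaining:
        raise KeyError(f"keys not found in cfg: {sorted(remaining)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = (
        "learning_rate=0.0005\n"
        "max_batches = 500200\n"
        "policy=steps\n"
        "steps=400000,450000\n"
    )
    print(set_cfg_values(sample, {"max_batches": "700200",
                                  "steps": "600000,650000"}))
```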

ggenny commented 4 years ago

Little progress (51%)

AlexeyAB commented 4 years ago

You will see further improvement after the learning-rate drops at steps=600000,650000, up to ~54.5% mAP@0.5.
Also, the 55.3% mAP@0.5 for YOLOv3-416 was achieved on test-dev rather than on the validation dataset. You should validate on the test-dev images (http://images.cocodataset.org/zips/test2017.zip) with the command
./darknet detector valid coco.data my.cfg my.weights
and upload the JSON result to the evaluation server: http://cocodataset.org/#upload
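Before uploading, it can help to sanity-check the JSON result. The sketch below assumes the file follows the standard COCO detection results format: a list of entries with `image_id`, `category_id`, `bbox` as `[x, y, width, height]`, and `score`. The function `check_coco_results` and the sample values are hypothetical. To check a real file, load it with `json.load` and pass the resulting list.

```python
import json

REQUIRED_KEYS = {"image_id", "category_id", "bbox", "score"}


def check_coco_results(detections):
    """Return a list of problems found in COCO-style detection results."""
    problems = []
    for index, det in enumerate(detections):
        missing = REQUIRED_KEYS.difference(det)
        if missing:
            problems.append(f"entry {index}: missing {sorted(missing)}")
            continue
        bbox = det["bbox"]
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(
                f"entry {index}: bbox must be [x, y, width, height] "
                "with positive width and height")
        if not isinstance(det["score"], (int, float)):
            problems.append(f"entry {index}: score must be a number")
    return problems


if __name__ == "__main__":
    # Made-up sample entry, parsed from a JSON string for illustration only.
    sample = json.loads(
        '[{"image_id": 42, "category_id": 18, '
        '"bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}]')
    print(check_coco_results(sample) or "no problems found")
```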

ggenny commented 4 years ago

I have almost reached 600000 iterations, but the mAP is decreasing. Should I set the learning rate to 0.001?

AlexeyAB commented 4 years ago

> I have almost reached 600000 iterations, but the mAP is decreasing. Should I set the learning rate to 0.001?

No, just set steps=500000,650000 if you don't want to wait for 600 000.

AlexeyAB / darknet

yolov3 - train COCO 2014 from scratch #4416