AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

training wasn't converging with visdrone2019 dataset #3267

Open gameliee opened 5 years ago

gameliee commented 5 years ago

I've tried to train the v3-tiny model with the visdrone2019 dataset. It doesn't seem to be converging so far. Could you kindly give me some advice? Thanks a lot.

Data: The objects in this dataset are quite small. When calculating anchors for a network size of 416x416, the result was anchors = 3, 6, 6, 13, 14, 11, 13, 27, 28, 31, 49, 64

What I've done: recalculated the anchors, verified that the annotations are correct, set saturation = 1.8, exposure = 1.8, jitter = .8, and changed the learning rate a bit.

The chart: [image: chart]

The console output:

(next mAP calculation at 1808 iterations)
1592: nan, nan avg loss, 0.001000 rate, 0.903414 seconds, 101888 images
Loaded: 0.000038 seconds
Region 16 Avg IOU: nan, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 42
Region 23 Avg IOU: nan, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 104
OpenCV can't augment image: 480 x 480
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000000, .5R: -nan, .75R: -nan, count: 0
Region 23 Avg IOU: nan, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 83
OpenCV can't augment image: 480 x 480
OpenCV can't augment image: 480 x 480
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000000, .5R: -nan, .75R: -nan, count: 0
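The "0.001000 rate" reported at iteration 1592 can be cross-checked against the [net] values posted in this issue (learning_rate=0.01, burn_in=1000, steps=1000,40000,80000, scales=.1,.1,.1). The sketch below is an approximation of how Darknet's policy=steps schedule with burn-in is commonly described, not a copy of its source; in particular, the burn-in exponent power=4 is an assumed default, and the function name darknet_lr is made up for illustration.

```python
def darknet_lr(iteration, base_lr=0.01, burn_in=1000, power=4.0,
               steps=(1000, 40000, 80000), scales=(0.1, 0.1, 0.1)):
    """Approximate learning rate for policy=steps with burn-in.

    Assumption: during burn-in the rate ramps up as
    base_lr * (iteration / burn_in) ** power, with power=4 assumed.
    """
    if iteration < burn_in:
        return base_lr * (iteration / burn_in) ** power
    rate = base_lr
    for step, scale in zip(steps, scales):
        if step > iteration:
            break
        # Every step that has already been reached multiplies the rate.
        rate *= scale
    return rate


for it in (100, 500, 999, 1000, 1592, 40000):
    print(it, f"{darknet_lr(it):.6f}")
```

Under these assumptions the first step (1000) coincides with the end of burn-in, so the rate only approaches 0.01 at the very last burn-in iteration and then stays at 0.001 until iteration 40000, which matches the value in the log.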

Here is the config:

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.8
exposure = 1.8
hue=.1

learning_rate=0.01
burn_in=1000
max_batches = 100000
policy=steps
steps=1000,40000,80000
scales=.1,.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=51
activation=linear

[yolo]
mask = 3,4,5
anchors = 3,  6,   6, 13,  14, 11,  13, 27,  28, 31,  49, 64
classes=12
num=6
jitter=.8
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2
# stride=2

[route]
layers = -1, 8
# layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=51
activation=linear

[yolo]
mask = 0,1,2
anchors = 3,  6,   6, 13,  14, 11,  13, 27,  28, 31,  49, 64
classes=12
num=6
jitter=.8
ignore_thresh = .7
truth_thresh = 1
random=1
max=200
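
One thing that can be ruled out quickly in a custom cfg is a mismatch between classes and the filters of the convolutional layer directly before each [yolo] layer, which should be (classes + 5) * number of masks. The following is a minimal sketch with a hand-written parser (parse_cfg and check_yolo_filters are hypothetical helper names, not part of Darknet); it assumes the cfg text is available as a string.

```python
def parse_cfg(text):
    """Parse Darknet cfg text into a list of (section_name, options) pairs."""
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_yolo_filters(sections):
    """Return (section_index, expected_filters, actual_filters) mismatches."""
    problems = []
    for i, (name, opts) in enumerate(sections):
        if name != "yolo" or i == 0:
            continue
        n_masks = len(opts["mask"].split(","))
        expected = (int(opts["classes"]) + 5) * n_masks
        prev_name, prev_opts = sections[i - 1]
        actual = int(prev_opts.get("filters", -1))
        if prev_name != "convolutional" or actual != expected:
            problems.append((i, expected, actual))
    return problems


sample = """
[convolutional]
size=1
stride=1
pad=1
filters=51
activation=linear

[yolo]
mask = 3,4,5
anchors = 3,  6,   6, 13,  14, 11,  13, 27,  28, 31,  49, 64
classes=12
num=6
"""
print(check_yolo_filters(parse_cfg(sample)))
```

For the excerpt above, taken from the posted config, the check finds no mismatch: (12 + 5) * 3 = 51.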

AlexeyAB commented 5 years ago

@ntd94 Hi,

OpenCV can't augment image: 480 x 480 OpenCV can't augment image: 480 x 480

It means that some of your images are broken.

Can you show the contents of the files bad.list and bad_label.list?


  1. First, try to train using the default model https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3-tiny_3l.cfg with the default params.

  2. Also run training with the -show_imgs flag. Do you see correct labels on the images?
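
As a complement to the visual check with -show_imgs, the label files can also be checked offline. The sketch below assumes the usual Darknet label layout of one object per line, `<class> <x_center> <y_center> <width> <height>`, with coordinates relative to the image size; find_bad_label_lines is a hypothetical helper, not a Darknet function, and it only catches format errors, not boxes that are well-formed but drawn in the wrong place.

```python
def find_bad_label_lines(label_text, num_classes):
    """Return (line_number, reason) for every suspicious line in a label file."""
    problems = []
    for lineno, line in enumerate(label_text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if len(parts) != 5:
            problems.append((lineno, "expected 5 fields"))
            continue
        try:
            cls = int(parts[0])
            x, y, w, h = map(float, parts[1:])
        except ValueError:
            problems.append((lineno, "non-numeric field"))
            continue
        if not 0 <= cls < num_classes:
            problems.append((lineno, "class id out of range"))
        elif not (0 < w <= 1 and 0 < h <= 1):
            problems.append((lineno, "width/height outside (0, 1]"))
        elif not (0 <= x <= 1 and 0 <= y <= 1):
            problems.append((lineno, "center outside [0, 1]"))
    return problems


example = "0 0.5 0.5 0.1 0.1\n12 0.5 0.5 0.1 0.1\n1 0.5 0.5 0 0.1\n"
for lineno, reason in find_bad_label_lines(example, num_classes=12):
    print(lineno, reason)
```

With classes=12, valid class ids run from 0 to 11, so the second example line is flagged, as is the third because of its zero width.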

gameliee commented 5 years ago

Hi @AlexeyAB, thanks for your reply. I've done the tasks you recommended.

  1. I checked the bad_label.list file and deleted all the corresponding images. There was no bad.list file.
  2. Trained with the yolov3-tiny_3l config. After 4 days of training, the result still doesn't look promising. [image: Screenshot from 2019-06-06 08-22-03]
  3. When training with the -show_imgs flag, the console gets stuck here: [image: Screenshot from 2019-06-06 08-38-37]

What should I do now?

AlexeyAB commented 5 years ago

@ntd94

It seems that you should train with a higher resolution.
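
To illustrate what a higher resolution means for the anchors reported at 416x416, here is a rough sketch. It assumes that width and height in the cfg must be multiples of 32 and that anchors are expressed in network-input pixels, so they scale linearly with the network size. scale_anchors is a hypothetical helper; recalculating the anchors on the dataset at the new size would be the more faithful route, and linear scaling is only an approximation of that.

```python
def scale_anchors(anchors, old_size, new_size):
    """Linearly rescale (w, h) anchors from one square network size to another."""
    if new_size % 32 != 0:
        raise ValueError("network size is assumed to be a multiple of 32")
    factor = new_size / old_size
    return [(round(w * factor), round(h * factor)) for w, h in anchors]


# Anchors reported in this issue for a 416x416 network.
anchors_416 = [(3, 6), (6, 13), (14, 11), (13, 27), (28, 31), (49, 64)]

for size in (416, 608, 832):
    print(size, scale_anchors(anchors_416, 416, size))
```

In the posted tiny config the finest detection layer works on a grid with 16-pixel cells, so the smallest anchor (3x6 at 416) covers only a small fraction of one cell; at 832 the same objects are twice as large in network pixels.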

gameliee commented 5 years ago

@AlexeyAB

AlexeyAB commented 5 years ago

@ntd94

With TRT you currently can't use PAN, Trident, or LSTM networks.

Try to use the default SPP model https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3-spp.cfg and https://pjreddie.com/media/files/yolov3-spp.weights with https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/master/yolo/README.md. Does it work successfully?