AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Training with VOC data and darknet19_448.conv.23 #71

Closed zenvendof closed 7 years ago

zenvendof commented 7 years ago

Hi Alex,

Thanks for providing this build. I followed your instructions and used VS2015, CUDA 8, cuDNN 5.1 and OpenCV 2.4.9 to make sure the build matches your description on GitHub. There are loads of warnings, but that seems to be normal because of double-to-float conversions. Here is my build log:

https://pastebin.com/EQdZJm1V

I followed your instructions at https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data, downloaded the VOC 2007 and 2012 data sets, ran the Python conversion script using Anaconda Python 3.6, and concatenated the results into a single txt file.

I have configured the yolo-voc.cfg to this: https://pastebin.com/BUWEn4Qu

Running the training using:

darknet.exe detector train data/voc.data yolo-voc.cfg darknet19_448.conv.23

I completed 3000 cycles and I am running it again:

3003: 0.850337, 0.502447 avg, 0.001000 rate, 17.004000 seconds, 192192 images
Loaded: 0.000000 seconds
Region Avg IOU: 0.627269, Class: 0.830725, Obj: 0.000000, No Obj: 0.000002, Avg Recall: 0.727273, count: 11
Region Avg IOU: 0.698189, Class: 0.824391, Obj: 0.000001, No Obj: 0.000006, Avg Recall: 1.000000, count: 11
Region Avg IOU: 0.703497, Class: 0.806307, Obj: 0.000001, No Obj: 0.000003, Avg Recall: 0.947368, count: 19
Region Avg IOU: 0.619614, Class: 0.782578, Obj: 0.000001, No Obj: 0.000005, Avg Recall: 0.739130, count: 23
Region Avg IOU: 0.657017, Class: 0.737596, Obj: 0.000003, No Obj: 0.000005, Avg Recall: 0.785714, count: 14
Region Avg IOU: 0.654929, Class: 0.752741, Obj: 0.000000, No Obj: 0.000003, Avg Recall: 0.928571, count: 14
Region Avg IOU: 0.659402, Class: 0.894229, Obj: 0.000005, No Obj: 0.000005, Avg Recall: 0.923077, count: 13
Region Avg IOU: 0.766720, Class: 0.915278, Obj: 0.000000, No Obj: 0.000003, Avg Recall: 1.000000, count: 12

Is the avg heading in the right direction? The retraining example has 20 object classes; at 3000 epochs, is no detection possible yet? For example, since the darknet19 model was trained on 1000 classes and 1 million images, when I run it before the retrain it doesn't work. Is that supposed to be the case? After how many cycles should I start seeing detections?

Thanks, Zen
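A note on reading the log above: the summary line can be pulled apart with a short script. This is only a sketch. The field names (`iteration`, `loss`, `avg_loss` and so on) are my own labels inferred from the text of the line, not names taken from Darknet.

```python
import re

# Summary line copied from the log above. Field names below are assumptions
# inferred from the labels in the line itself ("avg", "rate", "seconds", "images").
LINE = ("3003: 0.850337, 0.502447 avg, 0.001000 rate, "
        "17.004000 seconds, 192192 images")

PATTERN = re.compile(
    r"(?P<iteration>\d+): (?P<loss>[\d.]+), (?P<avg_loss>[\d.]+) avg, "
    r"(?P<rate>[\d.]+) rate, (?P<seconds>[\d.]+) seconds, (?P<images>\d+) images"
)


def parse_summary(line):
    """Return the numeric fields of one summary line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "iteration": int(fields["iteration"]),
        "loss": float(fields["loss"]),
        "avg_loss": float(fields["avg_loss"]),
        "rate": float(fields["rate"]),
        "seconds": float(fields["seconds"]),
        "images": int(fields["images"]),
    }


summary = parse_summary(LINE)
# Images seen per counted step: 192192 / 3003
images_per_step = summary["images"] // summary["iteration"]
print(summary["iteration"], summary["avg_loss"], images_per_step)
```

Dividing the image count by the leading number gives exactly 64 here, which suggests the leading number counts batches (iterations) rather than full passes over the dataset. The "avg" value is the one discussed in the rest of this thread.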

AlexeyAB commented 7 years ago

@zenvendof Hi,


For example, since the darknet19 model was trained on 1000 classes and 1 million images, when I run it before the retrain it doesn't work. Is that supposed to be the case?

How did you run darknet19 model?

The darknet19_448.conv.23 file contains only convolutional layers, without the last layer for object detection, so it can't be used for detection on its own.

mursalal commented 7 years ago

Hello Alex! Please help me. Why do I have several classes in the output?

` darknet git:(master) ✗ ./darknet detector test data/flickr27.data cfg/yolo-flickr27.cfg yolo-flickr27_4000.weights 1200px-BMW.svg.png -thresh 0.001
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
25 route 16
26 reorg / 2 26 x 26 x 512 -> 13 x 13 x2048
27 route 26 24
28 conv 1024 3 x 3 / 1 13 x 13 x3072 -> 13 x 13 x1024
29 conv 160 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 160
30 detection
Loading weights from yolo-flickr27_4000.weights...Done!
1200px-BMW.svg.png: Predicted in 0.067362 seconds.
vodafone: 0%
citroen: 0%
bmw: 0%
bmw: 95%
bmw: 0%
vodafone: 0%
vodafone: 0%
vodafone: 0%
vodafone: 0%
vodafone: 0%
vodafone: 0%

(predictions:89803): Gtk-WARNING **: cannot open display:`

AlexeyAB commented 7 years ago

@mursalal Because the threshold you use is too small: -thresh 0.001.

mursalal commented 7 years ago

I mean so many duplicates of each class in output. Not just citroen, bmw and vodafone, but citroen, 3 bmws, 7 vodafones. Is there any reason for concern?

AlexeyAB commented 7 years ago

@mursalal Don't worry, Yolo makes 845 assumptions for each class on each image.

Just train for (2000*classes) iterations and set -thresh 0.2
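For anyone wondering where 845 comes from, here is a small sketch. It assumes the 13 x 13 output grid shown in the layer listing above and 5 anchor boxes per cell (num=5, as in the cfg posted later in this thread). The score list is copied from the printed output; the printed percentages are rounded, so "0%" only means "very low".

```python
GRID_W, GRID_H = 13, 13  # final feature map size from the layer listing (416 / 32)
NUM_ANCHORS = 5          # assumption: num=5 in the [region] section

# Number of candidate boxes the network proposes per image.
candidates = GRID_W * GRID_H * NUM_ANCHORS

# (class name, printed confidence in percent), as shown in the output above.
printed = [
    ("vodafone", 0), ("citroen", 0), ("bmw", 0), ("bmw", 95), ("bmw", 0),
    ("vodafone", 0), ("vodafone", 0), ("vodafone", 0),
    ("vodafone", 0), ("vodafone", 0), ("vodafone", 0),
]


def keep(detections, thresh):
    """Keep only detections whose confidence exceeds thresh (0..1 scale)."""
    return [(name, pct) for name, pct in detections if pct / 100.0 > thresh]


print(candidates)
print(keep(printed, 0.2))
```

With -thresh 0.001 almost any candidate box gets printed, which is why the duplicates show up. At 0.2 only the 95% bmw box from this output would remain.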

mursalal commented 7 years ago

Thank you for the response. Sorry for so many questions, but can you tell me about small object detection? I trained on the BelgaLogos dataset, but the results are bad: IOU < 0.2, Recall stays below 0.1 and does not increase. When I do this: https://groups.google.com/d/msg/darknet/MumMJ2D8H9Y/QFdfkMjECwAJ, I get an out-of-memory error:

CUDA Error: out of memory
darknet: ./src/cuda.c:36: check_error: Assertion `0' failed.
[1] 90389 abort (core dumped) ./darknet detector train data/belgalogos.data cfg/yolo-belgalogos.cfg

AlexeyAB commented 7 years ago

@mursalal

mursalal commented 7 years ago

four NVIDIA Corporation GK104GL [GRID K520]

[net] batch=64 subdivisions=64 height=1088 width=1088 channels=3 momentum=0.9 decay=0.0005 angle=0 saturation = 1.5 exposure = 1.5 hue=.1

learning_rate=0.0001 max_batches = 45000 policy=steps steps=100,25000,35000 scales=10,.1,.1

[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=leaky

[maxpool] size=2 stride=2

[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=leaky

[maxpool] size=2 stride=2

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky

[maxpool] size=2 stride=2

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[maxpool] size=2 stride=2

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[maxpool] size=2 stride=2

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

#######

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[route] layers=-9

[reorg] stride=2

[route] layers=-1,-3

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=210 activation=linear

[region] anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52 bias_match=1 classes=37 coords=4 num=5 softmax=1 jitter=.2 rescore=1

object_scale=5 noobject_scale=1 class_scale=1 coord_scale=1

absolute=1 thresh = .6 random=0
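A quick sanity check of the numbers in the cfg above, as a sketch. It uses the usual YOLOv2 rule that the last convolutional layer needs num * (classes + coords + 1) filters, and it assumes a total downsampling factor of 32 (five stride-2 maxpool layers). The comparison against 416 x 416 is only a rough proxy for activation memory, not a measurement.

```python
def expected_filters(classes, coords, num):
    """Filters required in the last [convolutional] layer before [region]."""
    return num * (classes + coords + 1)


def grid_size(width, height, downsample=32):
    """Output grid of the network; width and height must be multiples of downsample."""
    if width % downsample or height % downsample:
        raise ValueError("network size must be a multiple of %d" % downsample)
    return width // downsample, height // downsample


# Values taken from the cfg above.
classes, coords, num = 37, 4, 5
width = height = 1088
batch, subdivisions = 64, 64

filters = expected_filters(classes, coords, num)
grid = grid_size(width, height)
images_per_forward_pass = batch // subdivisions

# Rough proxy: input pixels relative to the default 416 x 416 resolution.
pixel_ratio = (width * height) / (416 * 416)

print(filters, grid, images_per_forward_pass, round(pixel_ratio, 2))
```

So filters=210 is consistent with classes=37. With batch=64 and subdivisions=64 only one image is processed at a time, which means raising subdivisions further cannot reduce memory use. Each 1088 x 1088 image still carries roughly 6.8 times as many pixels as a 416 x 416 one.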

mursalal commented 7 years ago

I use the same file for training and detection.

AlexeyAB commented 7 years ago

@mursalal So does each GPU have 4 GB of GPU RAM? And what is the size of the small objects in pixels, and what is the size of the images?

mursalal commented 7 years ago

For the first 1000 iterations I use one GPU, as you said in your "How to train with multi-GPU" guide. Object size is 30x30 pixels, or 100x50. Image sizes: 800x500, 800x1200, 800x1247 and so on. For example, several images: 07585294, 07602751

mursalal commented 7 years ago

objects: umbro and puma

AlexeyAB commented 7 years ago

@mursalal


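If it doesn't help, then train again: set width=800 and height=800 (network size should be equal to or less than image size), set subdivisions=32, and train again for (2000*classes) iterations.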

zenvendof commented 7 years ago

@AlexeyAB Here is what I used to start the training:

darknet.exe detector train data/voc.data yolo-voc.cfg darknet19_448.conv.23

I tested with your commands (e.g. -thresh 0.001) on a 6000-epoch model and there wasn't any detection, so I am probably doing it wrong!

I will use your yolo2 cfg to try again and retrain from epoch 1... :-P I tested at epoch 100 and it seems to be detecting dog.jpg with a low threshold of 0.001, with multiple bounding boxes on the dog. Will keep you updated.

Thanks for your help.

Z

mursalal commented 7 years ago

@AlexeyAB

If it doesn't help, then train again: set width=800 and height=800 (network size should be equal to or less than image size), set subdivisions=32, and train again for (2000*classes) iterations.

Same error: out of memory. Is it possible to start training with all GPUs at once?

AlexeyAB commented 7 years ago

@mursalal

Is it possible to start training with all GPUs at once?

This is not recommended. But you can.

Also you have very few images per class: about 72 (2650/37). It is recommended to use 1000-2000 images per class, otherwise detection accuracy will be poor.
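The arithmetic behind that estimate, together with the (2000*classes) rule of thumb quoted earlier in the thread, as a small sketch. The variable names are mine.

```python
images_total = 2650   # from the comment above
classes = 37          # classes=37 in the posted cfg

images_per_class = images_total / classes
recommended_min_per_class = 1000              # lower end of the 1000-2000 range
suggested_iterations = 2000 * classes         # rule of thumb from this thread

# How many times larger the dataset would need to be to reach the lower bound.
shortfall_factor = recommended_min_per_class / images_per_class

print(f"{images_per_class:.1f} images per class")
print(f"{suggested_iterations} iterations suggested")
print(f"about {shortfall_factor:.0f}x more images needed for 1000 per class")
```

This assumes the images are spread evenly across classes, which a real dataset rarely is, so some classes may be even further from the recommended range.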

mursalal commented 7 years ago

Thank you for the responses @AlexeyAB. Training on several GPUs helped.

zenvendof commented 7 years ago

@mursalal Just out of curiosity I checked the GK104GL: it has 4 GB. Since the model training is just distributed across the 4 GPUs and they are probably not sharing memory with each other, you may have a memory limitation here. I suggest you use GPU-Z or a similar tool to see whether you are hitting memory limits. I am using the standard training resolution and I am seeing 3 GB usage. The GTX 1060/1070/1080 should have enough memory for Darknet training.

mursalal commented 7 years ago

@zenvendof Yeah, you are right. It is definitely a lack of memory.

zenvendof commented 7 years ago

@AlexeyAB Thanks for your help. I checked the latest training based on the Yolo2 cfg: at 6000 epochs it detected the 3 objects in dog.jpg nicely. I have let it continue as I am trying to bring the avg down. At the moment it is at:

7766: 0.340575, 0.485455 avg, 0.001000 rate, 9.980000 seconds, 497024 images
Loaded: 0.001000 seconds
Region Avg IOU: 0.494297, Class: 0.916284, Obj: 0.380216, No Obj: 0.008471, Avg Recall: 0.545455, count: 22
Region Avg IOU: 0.757582, Class: 0.934223, Obj: 0.686376, No Obj: 0.009964, Avg Recall: 0.888889, count: 18
Region Avg IOU: 0.686125, Class: 0.995181, Obj: 0.674135, No Obj: 0.008466, Avg Recall: 0.866667, count: 15
Region Avg IOU: 0.713554, Class: 0.995739, Obj: 0.631079, No Obj: 0.009759, Avg Recall: 0.875000, count: 16
Region Avg IOU: 0.781846, Class: 0.982140, Obj: 0.669157, No Obj: 0.008633, Avg Recall: 1.000000, count: 13
Region Avg IOU: 0.464987, Class: 0.957074, Obj: 0.364721, No Obj: 0.008558, Avg Recall: 0.419355, count: 31
Region Avg IOU: 0.771378, Class: 0.987586, Obj: 0.503316, No Obj: 0.009284, Avg Recall: 0.947368, count: 19
Region Avg IOU: 0.771566, Class: 0.982732, Obj: 0.685755, No Obj: 0.007306, Avg Recall: 1.000000, count: 12
7767: 0.364381, 0.473347 avg, 0.001000 rate, 10.011000 seconds, 497088 images
Loaded: 0.001000 seconds

This is at epoch 7700+. What is the target avg, is it 0.0006xxx? When it reaches 45000, would it be able to get there? Thanks!

By the way, are there any scripts to calculate the mAP?

Cheers, Zen

AlexeyAB commented 7 years ago

@zenvendof

There is no specific target value for the avg loss. It depends on the dataset, many parameters and random variables. Just wait until you see that the average loss (0.xxxx avg) no longer decreases; then you should stop training.
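One way to turn "no longer decreases" into a check is sketched below. This is a hypothetical heuristic, not something Darknet does itself: it compares the mean avg loss over the most recent window of iterations with the window before it, and the window size and 1% tolerance are arbitrary choices.

```python
def has_plateaued(avg_losses, window=1000, min_improvement=0.01):
    """Return True when the mean of the last `window` values improved by less
    than `min_improvement` (as a fraction) relative to the previous window."""
    if len(avg_losses) < 2 * window:
        return False  # not enough history to compare two windows
    previous = sum(avg_losses[-2 * window:-window]) / window
    recent = sum(avg_losses[-window:]) / window
    return (previous - recent) < min_improvement * previous


# Synthetic examples (not real training data): one still falling, one flat.
still_falling = [1.0 - 0.001 * i for i in range(200)]
flat = [0.48] * 200

print(has_plateaued(still_falling, window=100))
print(has_plateaued(flat, window=100))
```

The avg values would come from the training log, one per iteration. Because the loss is noisy, a larger window gives a steadier answer at the cost of reacting later.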

To calculate mAP you should use your own Python script, such as reval.py from: https://github.com/rbgirshick/py-faster-rcnn/tree/master/tools

More detail: https://github.com/AlexeyAB/darknet/issues/16
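For orientation, the core of an mAP calculation can be sketched in a few lines. This is not the reval.py implementation. It is a minimal version that assumes boxes are given as (x1, y1, x2, y2), that each detection has already been marked true or false positive (for example by an IoU threshold of 0.5 against ground truth), and that all-point interpolation is used. The 11-point variant used in some VOC evaluations gives slightly different numbers.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def average_precision(detections, num_gt):
    """AP for one class.

    detections: list of (confidence, is_true_positive)
    num_gt: number of ground-truth boxes of this class
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing from right to left (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the stepwise precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP is the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


example_ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
print(round(example_ap, 4))
```

A full evaluation also has to match detections to ground-truth boxes per image and handle duplicates and "difficult" objects, which this sketch leaves out.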

meedddhhhhaaaaa commented 7 years ago

@AlexeyAB You said: "darknet19_448.conv.23-file contains only convolutional layers, without last layer for object detection, so it can't be used for detection". So if I want to train a model for object detection, which weights do I use to train? I tried training with darknet19_448.weights but did not get any bounding box on the output image. Should I train with another weights file?

zenvendof commented 7 years ago

@medhasn

Make sure you are using:

https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolo-voc.2.0.cfg for your training and remember to reset the parameters:

[net] batch=1 subdivisions=1 height=416 width=416

This is highlighted here as a step to take before you run your trained model. As I reported back to @AlexeyAB in my previous comments, at epoch 100 with a threshold of 0.001 you should start getting detections on dog.jpg, with the dog bounded by a few boxes. After training, the detection layer will be added to your weights and you will see the model grow to 266 MB+ in size.

Zen
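Switching a cfg between training and testing values by hand is easy to get wrong, so here is a small sketch of doing it with a script. The helper `set_net_options` is hypothetical (it is not part of Darknet). It only touches key=value lines inside the [net] section and leaves every other section alone. The sample cfg text is a shortened stand-in, not the real yolo-voc.2.0.cfg.

```python
def set_net_options(cfg_text, **options):
    """Return cfg_text with the given keys in the [net] section set to new values."""
    out = []
    section = None
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped  # entering a new section such as [net] or [region]
        elif section == "[net]" and "=" in stripped and not stripped.startswith("#"):
            key = stripped.split("=", 1)[0].strip()
            if key in options:
                line = f"{key}={options[key]}"
        out.append(line)
    return "\n".join(out)


# Shortened stand-in for a training cfg (values here are illustrative only).
sample = """[net]
batch=64
subdivisions=8
height=416
width=416

[convolutional]
batch_normalize=1
filters=32
"""

test_cfg = set_net_options(sample, batch=1, subdivisions=1)
print(test_cfg)
```

Note that `batch_normalize=1` in the [convolutional] section is untouched, since only the [net] section is rewritten.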

meedddhhhhaaaaa commented 7 years ago

@zenvendof thanks! will try......

hyy1111 commented 7 years ago

I have also encountered the same problem. I am using my own dataset on the Windows version of yolo, and it cannot be trained with yolo-voc.cfg (after training there is no detection), but it can be trained with yolo-voc.2.0.cfg. Why is there such a problem?

I found that the Linux version of yolo (the original author's) can be trained with yolo-voc.cfg. The difference between yolo-voc.cfg and yolo-voc.2.0.cfg is small: the [reorg] and [route] layers and the anchors are different. Will it greatly influence the final precision? Thanks in advance.

AlexeyAB commented 7 years ago

@hyy1111 This will have little effect on accuracy. In the main Linux fork, additional logic is added, which is not yet fully debugged (especially on non-square network resolutions there can be some artefacts).

zenvendof commented 7 years ago

Closing this now that I can train on VOC data. Going to open another one on Custom Data :) :) :)