AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Raspberry Pi YOLO Training #289

Closed WTeichert closed 6 years ago

WTeichert commented 6 years ago

Greetings everyone,

I am in the middle of my student research project. For it I am creating an object detection and classification system that fits on a Pi. I am using YAD2K running on the Pi, because it has lower computational demands. I plan to train my network on VOC with different training cfgs.

I am asking you for any advice, tips, or tricks I can use.

So far I will change:

I also have a few questions: What does activation: leaky or linear do? saturation/exposure are always the same - what do they do?

Thank you for all inspiration! :)

AlexeyAB commented 6 years ago

Hi,

height and width (to a minimum of 224), possible? What should I do with the anchors, divide them by 2? Do I have to add "resize_network(nets + i, nets[i].w, nets[i].h);" in detector.c lines 40-41?

Change those lines to these, for resolution ~224x224:

```
int dim = (rand() % 5 + 5) * 32;
if (get_current_batch(net)+100 > net.max_batches) dim = 224;
```
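To see which resolutions those two lines actually produce, here is a small Python sketch (the function name is mine) of the same logic:

```python
import random

def random_dim(current_batch, max_batches):
    """Mimic darknet's multi-scale training pick for ~224x224:
    a width/height from {160, 192, 224, 256, 288},
    fixed to 224 for the last ~100 iterations."""
    dim = (random.randrange(5) + 5) * 32  # (rand() % 5 + 5) * 32
    if current_batch + 100 > max_batches:
        dim = 224
    return dim

# early in training: any of the 5 multiples of 32 around 224
print(sorted({random_dim(0, 45000) for _ in range(1000)}))
# near the end of training: always 224
print(random_dim(44950, 45000))  # 224
```

So the network still trains at varying resolutions, just centred on 224 instead of the default 416.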

What does activation: leaky or linear do?
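A minimal Python sketch of what these two activations compute, assuming darknet's usual leaky slope of 0.1 (function bodies are illustrative, not darknet's C code):

```python
def linear(x):
    # linear: output equals input (identity); used for the final
    # detection layer so raw coordinates/scores are not squashed
    return x

def leaky(x):
    # leaky ReLU: positives pass through, negatives are scaled by 0.1,
    # so units keep a small gradient instead of going completely dead
    return x if x > 0 else 0.1 * x

print(leaky(2.0), leaky(-2.0), linear(-2.0))  # 2.0 -0.2 -2.0
```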


saturation/exposure are always the same, what do they do?

saturation, exposure, and hue are ranges for random changes to the colours of images during training (data augmentation parameters), in terms of HSV: https://en.wikipedia.org/wiki/HSL_and_HSV The larger the value, the more invariant the neural network becomes to changes in the lighting and colour of the objects. More: https://github.com/AlexeyAB/darknet/issues/279#issuecomment-347002399
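As an illustration only (not darknet's actual code), a Python sketch of this kind of HSV jitter on a single RGB pixel, with saturation/exposure factors drawn from [1/1.5, 1.5] and the hue shifted by up to ±0.1:

```python
import colorsys
import random

def rand_scale(s):
    """Pick a factor in [1/s, s], darknet-augmentation style:
    half the time scale up, half the time scale down."""
    scale = random.uniform(1.0, s)
    return scale if random.random() < 0.5 else 1.0 / scale

def jitter_hsv(r, g, b, saturation=1.5, exposure=1.5, hue=0.1):
    """Randomly shift hue and scale saturation/value of one RGB pixel
    (all channels in [0, 1]); clamp back into the valid range."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + random.uniform(-hue, hue)) % 1.0   # hue wraps around
    s = min(s * rand_scale(saturation), 1.0)
    v = min(v * rand_scale(exposure), 1.0)
    return colorsys.hsv_to_rgb(h, s, v)

print(jitter_hsv(0.2, 0.6, 0.4))  # a slightly recoloured pixel
```

The larger the cfg values, the wider these random ranges, and the more colour variation the network sees during training.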

WTeichert commented 6 years ago

Thank you a lot!

Of course I watched the training first. There I came to the point that Darknet19 448x448 should be used. You've written that "This model performs significantly better but is slower since the whole image is larger.". Since I need to speed up and tighten the whole algorithm, I want to use darknet19 in its basic configuration.

Now my question: where can I get these darknet19.conv.xx for training? Maybe I can use yolo-voc-tiny.weights as my base, like backup training? (But my cfg changed in a few lines.)

And one more question: I got access to a computation centre where I have 4 CPUs and 2 GPUs I can use. As I've read on your page, YOLOv2 is not made for multi-CPU. Have there been any changes? Is it helpful that TensorFlow is configured for multi-CPU?

AlexeyAB commented 6 years ago
WTeichert commented 6 years ago

Thank you again ^^ This partial training sounds interesting! I take a trained weight and use it as my pre-trained base? Where does the 13 come from (= number of layers - last "class" layer)? Can I use my own cfg, or do I need to use tiny-yolo-voc.cfg? Wouldn't I have problems when they are different?
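For reference, a cut-down weights file like this is usually produced with this repo's partial command, where the number at the end is the layer at which the weights are cut (the filenames here are just examples):

```
darknet.exe partial cfg/tiny-yolo-voc.cfg tiny-yolo-voc.weights tiny-yolo-voc.conv.13 13
```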

A little misunderstanding:

I am looking for darknet19 not trained on 448x448, i.e. the previous version of it! I want to use it all, so 4 CPUs + 2 GPUs for training.

For detection I have to look further to get the best out of a Raspberry Pi.

AlexeyAB commented 6 years ago
WTeichert commented 6 years ago

Hey, first of all, thank you for your time. I am now done with the training runs (had some other stuff to do), but it didn't work out like I thought.

I tried to train on Pascal VOC and followed your instructions; it all went fine. Not sure if it matters: I chose the pre-trained model darknet19_448.conv.23 instead of darknet53.conv.74 (I think this was changed by you?). You can see my cfg1 below. After 45,000 iterations it mostly detects chairs, no matter whether the object is a person or a dog or whatever. For cfg4 I just changed width+height to 608 and multiplied the anchors by 4 -> there are no detections at all, and IOU and recall are 0 when I try to validate the weights.

Did I miss something, or is it just a network conflict, in that the parameters don't fit the dataset? An overview of all the cfgs I trained:

cfg_overview.pdf

```
[net]
batch=64
subdivisions=64
width=224
height=224
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
max_batches = 45000
policy=steps
steps=100,25000,35000
scales=.1,.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=125
activation=linear

[region]
anchors = 0.54,0.60, 1.71,2.2, 3.32,5.69, 4.71,2.55, 8.31,5.26
bias_match=1
classes=20
coords=4
num=5
softmax=1
jitter=.2
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=0
```
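One consistency check worth noting on a cfg like this: for a Yolo v2 [region] head, the filters= of the last convolutional layer must equal num * (classes + coords + 1). A small Python sketch (values taken from the cfg above; the helper name is mine):

```python
def region_filters(classes, num, coords=4):
    # last-layer filter count for a Yolo v2 [region] head:
    # per anchor box: 4 box coords + 1 objectness + one score per class
    return num * (classes + coords + 1)

print(region_filters(classes=20, num=5))  # 125, matching the cfg above
print(region_filters(classes=11, num=5))  # 80, for an 11-class VOC subset
```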

AlexeyAB commented 6 years ago

@WTeichert

  1. darknet53.conv.74 should be used only if your cfg-file is based on yolov3.cfg (only for Yolo v3). But if your cfg-file is based on tiny-yolo-voc.cfg, yolov2-tiny-voc.cfg, yolo-voc.2.0.cfg, or yolov2-voc.cfg (Yolo v2), then you should use darknet19_448.conv.23

  2. On what cfg-file did you base your cfg-file?

  3. Did you try to train yolo on the CPU?

  4. Did you get any good results, or are the results of all the training runs bad?

  5. How many iterations did you train?

WTeichert commented 6 years ago
  1. Then I was right
  2. Based on tiny-yolo-voc.cfg
  3. Trained on GPU; the CPU was way too slow (50 min for 1k iterations)
  4. No, all are trash
  5. 45k-60k, see max_batches. Also, I trained on Windows
AlexeyAB commented 6 years ago
WTeichert commented 6 years ago

For the next days I have no more access to this data, so further data can be sent on Monday.

AlexeyAB commented 6 years ago

Use this line (with your files: data, cfg, weights): darknet.exe detector map data/obj.data cfg/yolo_obj.cfg yolo-obj.weights

And run it.

Also, what command do you use for training?

WTeichert commented 6 years ago

I've tried both ways: first with my data and cfg, and second with your tiny-voc and voc. It didn't work.

The command for training is written in train.cmd: darknet.exe detector train data/cfg1.data cfg/cfg1.cfg darknet19_448.conv.23

AlexeyAB commented 6 years ago

@WTeichert This command: darknet.exe detector map data/cfg1.data cfg/cfg1.cfg cfg1_40000.weights can't give this error, because there is nothing from Python in it.

What error does this command give?

WTeichert commented 6 years ago

Ahh, a little misunderstanding. I tried using map, but nothing happened (as far as I can see), so I chose calc_mAP_voc_py.cmd and changed line 8 to my files and line 9 to my VOC dir.

The error was at voc_eval_py3.py, line 157.

Could it be that I chose the learning rate too small, so the network doesn't learn anything new from the new input size? Or could the change in anchors have led to this mistake?

AlexeyAB commented 6 years ago

Attach a screenshot of the "nothing happened" that happens after this command (you should wait, sometimes 10 minutes, while mAP is calculated): darknet.exe detector map data/cfg1.data cfg/cfg1.cfg cfg1_40000.weights

I don't know if there is any mistake. I can't say anything without mAP. What repo did you use for training?

WTeichert commented 6 years ago

"Nothing happens" means nothing I can see directly. I am not sure which repo I am using, but since the introduction says "If you use another GitHub repository, then use darknet.exe detector recall... instead of darknet.exe detector map", I tried both; map did not give me any visual output - the cmd just ended and the console was waiting for the next command; nothing happened (that is what I meant; there is no screenshot of it). So I chose recall to check IOU and recall, and I got a maximum IOU of 28%. I cloned this GitHub repo we're talking on and followed the instructions for Windows, so I thought it should be this repo... When I use darknet.exe detector valid, it creates a lot of blank class files in results. With yolo-voc, by contrast, they were full of entries and detections. That's all I can say about mAP.

The problem is that I am not into C programming, only a little Python, so I understand detector.c - the compilation and the functions in C - only at a very high level.

I am not at the office these days, so I can try once more on Tuesday, but I don't think it will change the results.

WTeichert commented 6 years ago

(image) The first command was with recall instead of map; the second doesn't give any result as far as I can see.

(image) The first command was with valid instead of recall and created the files in results.

AlexeyAB commented 6 years ago

@WTeichert Try to update your code from this repo.

WTeichert commented 6 years ago

Done, but same error. Was something changed in the detector? I cannot update the darknet version, since my MSVS license expired.

AlexeyAB commented 6 years ago

You should recompile the code in MSVS after your repo is updated. Yes, Yolo v3 was added, plus fused batch_norm (+7% speedup), anchor calculation and mAP, AVX on CPU (+20% speedup), and many other things... You can install the free MSVS2015 Community that I use: https://go.microsoft.com/fwlink/?LinkId=532606&clcid=0x409

WTeichert commented 6 years ago

Finally map works! It needed a few tries with CUDA 9.1, 9.0, 8.0 and their cuDNN libraries because this error occurred (image: errorcuda81). I solved it by creating a new repo instead of updating.

(image) That was the cfg which only gave me chairs as output.

(image) That was the cfg with 0 IOU and recall and no detections.

The difference between them: width/height was 224 in the first cfg and 608 in the second.

WTeichert commented 6 years ago

I just checked again the difference between my cfg and tiny-yolo-voc. I changed anchors, width, and height, deleted comments, and there are these lines: steps=-1,100,20000,30000 scales=.1,10,.1,.1 Instead I chose: steps=100,25000,35000 scales=.1,.1,.1

because I did not understand the -1.

AlexeyAB commented 6 years ago

@WTeichert It's a bad mAP result. Check your dataset using Yolo_mark. And use these lines:

learning_rate=0.0001
max_batches = 45000
policy=steps
steps=100,25000,35000
scales=10,.1,.1
WTeichert commented 6 years ago

Ah, found it, but I used the Pascal VOC dataset; do I need to mark bounding boxes?

@AlexeyAB I've done the check. The labels are not correctly assigned; e.g. persons are chairs, cats are boats. This should be connected to the voc.names list, am I right?

But the bounding boxes are all right!

And that doesn't explain why detection doesn't work. Should I train a new network with these lines? learning_rate=0.0001 max_batches = 45000 policy=steps steps=100,25000,35000 scales=10,.1,.1

That would be sad... not to know why it doesn't work and to just try again...

WTeichert commented 6 years ago

(image) These are the results of the yolov2-tiny-voc weights... there should be an error somewhere else. How does mAP depend on the names list? Could the order of names cause this error?

If I compare the labels folder of the VOC labels with the voc.names file, there are differences: 5 and 8 should be dog and person, while in voc.names those are bus and chair.

But when I validate tiny-voc with some example pictures, it is pretty good.
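A small Python sketch of why the order matters: the label files store only class indices, so the names file is just a lookup table. If the indices were written in one order and voc.names lists names in another, an index silently changes meaning (the class lists here are shortened and rearranged purely for illustration):

```python
# order used when the label files were generated (e.g. voc_label.py's class list)
train_order = ["aeroplane", "bicycle", "bird", "boat", "bottle",
               "dog", "cat", "horse", "person"]
# a differently ordered names file used at detection time
names_order = ["aeroplane", "bicycle", "bird", "boat", "bottle",
               "bus", "car", "chair", "dog"]

idx = train_order.index("dog")  # the label files say "5" for dog
print(names_order[idx])         # displayed as "bus" - wrong name,
                                # even though the boxes themselves are correct
```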

AlexeyAB commented 6 years ago
WTeichert commented 6 years ago

@AlexeyAB

Btw., why do I get completely different IOU and recall with the commands ... detector map ... and ... detector ... recall? (image)

WTeichert commented 6 years ago

@AlexeyAB I tried to train with the changed learning rate and the anchors set back to standard. But the result is again 0 detections, and mAP is 0 too. Did you use the Windows method to train tiny-yolo-voc.cfg? Is your version different from the repo? How many iterations did you train? Which command did you use for training?

AlexeyAB commented 6 years ago

@WTeichert I have trained many models on both Windows and Linux using this repo. It works fine.

WTeichert commented 6 years ago

@AlexeyAB I found a mistake in my train.txt. Now I get 56.21% mAP for yolov2-tiny-voc.

I tried to train with 11 classes of VOC, so I shortened the class list in voc_label.py and voc.names. I also set the number of classes to 11 in voc.data and the cfg, and the last filters to 80. Again I get 0 mAP.

What was your average loss in training? I am always around 0.5, which seems pretty high.

AlexeyAB commented 6 years ago

@WTeichert About ~0.5

WTeichert commented 6 years ago

Ok, I found the problems. It was some mess in the voc.data and label .txt files.

But I am still wondering why the cfg of tiny-yolo-voc starts steps with -1. If you can explain that to me, I won't ask anything anymore :D

AlexeyAB commented 6 years ago

A step of -1 means that the 1st scale 0.1 is applied immediately. It was left in just for some experiments.

This: https://github.com/AlexeyAB/darknet/blob/5e3dcb6f34868e341466b57b13ad63f86b337250/cfg/yolov2-tiny-voc.cfg#L18-L22 is the same as a reduced learning_rate with the 1st step/scale removed:

learning_rate=0.0001
max_batches = 40200
policy=steps
steps=100,20000,30000
scales=10,.1,.1

Because net.steps[0] = -1 is never greater than batch_num (i.e. -1 > 0 is false), the loop does not return early and the 1st scale is applied immediately: https://github.com/AlexeyAB/darknet/blob/5e3dcb6f34868e341466b57b13ad63f86b337250/src/network.c#L94-L101
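The whole policy=steps logic fits in a few lines; a Python sketch with the same semantics as that loop in network.c (names are mine):

```python
def current_rate(base_rate, steps, scales, batch_num):
    """policy=steps: multiply the base rate by every scale whose
    step has already been reached; stop at the first future step."""
    rate = base_rate
    for step, scale in zip(steps, scales):
        if step > batch_num:
            return rate
        rate *= scale
    return rate

# tiny-yolo-voc schedule: steps=-1,100,20000,30000  scales=.1,10,.1,.1
sched = lambda b: current_rate(0.001, [-1, 100, 20000, 30000],
                               [0.1, 10, 0.1, 0.1], b)
print(sched(0))      # 0.0001 - the -1 step fires immediately
print(sched(500))    # 0.001  - back up after batch 100
print(sched(25000))  # 0.0001 - first decay step
print(sched(35000))  # ~1e-05 - second decay step
```

Which is why starting with learning_rate=0.0001 and scales=10,.1,.1 gives exactly the same schedule without the -1 step.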

WTeichert commented 6 years ago

Thank you so much for your help! My research is done and it all went well.

Changing the number of filters per layer and reducing the resolution brings the best performance out of a Pi! Models, classes, and greyscale are not that easy to manipulate; it depends on the dataset. random should be enabled - the performance decrease is minimal.

AlexeyAB commented 6 years ago

@WTeichert Can you attach your resulting cfg-file?

WTeichert commented 6 years ago

Sorry, I lost the originals in a system reset and have only the converted h5 files...

Those were the results I found. Performance on the Pi increased from 4 s to 1 s per picture with a pretty good mAP. The main focus of my work was on frame rate. I hope this information can help.

Here is the h5 file with the changed number of layers, based on COCO: COCOh5.zip

Here is the h5 file with the changed number of filters per layer, based on COCO: COCOh5_2.zip