Closed WTeichert closed 6 years ago
Hi,
height and width (down to a minimum of 224, is that possible? What should I do with the anchors, divide them by 2? Do I have to add "resize_network(nets + i, nets[i].w, nets[i].h);" in detector.c, lines 40-41?)
You can set width=224 height=224 in your cfg-file. If you use random=1, then you should change these two lines: https://github.com/AlexeyAB/darknet/blob/75c39f57507ed81f33571271b939de400a69adf0/src/detector.c#L98-L99 to these, for a resolution of ~224x224:
int dim = (rand() % 5 + 5) * 32;
if (get_current_batch(net)+100 > net.max_batches) dim = 224;
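To see which input sizes those two C lines can produce, here is a minimal Python sketch of the same arithmetic. The function name `pick_train_dim` and its parameters are hypothetical; only the formula and the 100-batch cutoff come from the snippet above.

```python
import random

def pick_train_dim(rng, current_batch, max_batches, final_dim=224):
    """Mirror of the two C lines: choose a square input size for one step."""
    # rand() % 5 + 5 is an integer in 5..9, so dim is a multiple of 32
    # between 160 and 288.
    dim = (rng.randrange(5) + 5) * 32
    # Within the last 100 batches the size is pinned to the final resolution.
    if current_batch + 100 > max_batches:
        dim = final_dim
    return dim

rng = random.Random(0)
sizes = sorted({pick_train_dim(rng, 0, 45000) for _ in range(1000)})
print(sizes)
```

The sampled sizes are centred on 224, which is why this range suits a ~224x224 network.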
random=1 gives you about +1% mAP.

What does activation: leaky or linear do?
linear: y = x
leaky: if (x > 0) { y = x; } else { y = x/10; }
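The same two activations as a small Python sketch. The function names and the `slope` parameter are illustrative, not Darknet identifiers; the slope of 0.1 matches the `x/10` above.

```python
def linear(x):
    # Identity: the output equals the input.
    return x

def leaky(x, slope=0.1):
    # Positive inputs pass through; negative inputs are scaled down by `slope`.
    return x if x > 0 else x * slope

for value in (-2.0, 0.0, 3.0):
    print(value, linear(value), leaky(value))
```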
saturation/exposure are always the same, what do they do?
saturation, exposure and hue values are the ranges for random changes of image colours during training (params for data augmentation), in terms of HSV: https://en.wikipedia.org/wiki/HSL_and_HSV The larger the value, the more invariant the neural network becomes to changes in lighting and colour of the objects. More: https://github.com/AlexeyAB/darknet/issues/279#issuecomment-347002399
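A rough Python sketch of what such HSV jitter could look like. This is an approximation under stated assumptions, not Darknet's implementation: it assumes hue is shifted by a value in [-hue, hue] and that saturation and exposure are multiplied by a factor drawn between 1/s and s. All function names are hypothetical.

```python
import colorsys
import random

def rand_scale(rng, s):
    # Assumed sampling: a factor in [1, s], inverted half of the time.
    scale = rng.uniform(1.0, s)
    return scale if rng.random() < 0.5 else 1.0 / scale

def sample_jitter(rng, hue=0.1, saturation=1.5, exposure=1.5):
    # One draw per image: (hue shift, saturation factor, exposure factor).
    return (rng.uniform(-hue, hue),
            rand_scale(rng, saturation),
            rand_scale(rng, exposure))

def jitter_pixel(rgb, params):
    # rgb components are floats in [0, 1].
    dhue, dsat, dexp = params
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + dhue) % 1.0
    s = min(1.0, s * dsat)
    v = min(1.0, v * dexp)
    return colorsys.hsv_to_rgb(h, s, v)

rng = random.Random(0)
params = sample_jitter(rng)
print(jitter_pixel((0.8, 0.4, 0.2), params))
```

With hue=0 and saturation=exposure=1 the sketch leaves pixels unchanged, which matches the idea that larger values mean stronger augmentation.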
Thank you a lot!
Of course, I looked at the training first. There I got to the point where Darknet19 448x448 should be used. You've written that "This model performs significantly better but is slower since the whole image is larger.". Since I need to speed up and tighten the whole algorithm, I want to use darknet19 in its basic configuration.
Now my question: where can I get these darknet19.conv.xx files for training? Can I perhaps use yolo-voc-tiny.weights as my base, as when training from a backup? (My cfg is changed in a few lines, though.)
And one more question: I got access to a computation centre where I have 4 CPUs and 2 GPUs I can use. As I've read on your page, YOLOv2 is not made for multi-CPU. Have there been any changes? Does it help that tensorflow is configured for multi-CPU?
Here is the path to darknet19_448.conv.23: http://pjreddie.com/media/files/darknet19_448.conv.23
You can find this path here: https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data
You can do darknet.exe partial tiny-yolo-voc.cfg tiny-yolo-voc.weights tiny-yolo-vo.conv.13 13, so you will get the pre-trained file tiny-yolo-vo.conv.13, which you can then use for training.
You can use multi-GPU for training: https://github.com/AlexeyAB/darknet#how-to-train-with-multi-gpu But for detection you can use only one GPU.
If you don't want to use a GPU, then you can use multi-CPU (many cores in one CPU and many CPUs on one motherboard - ccNUMA), but it is quite slow:
On Windows, use build\darknet\darknet_no_gpu.sln: https://github.com/AlexeyAB/darknet#how-to-compile-on-windows
On Linux, set OPENMP=1 in the Makefile and run make: https://github.com/AlexeyAB/darknet#how-to-compile-on-linux

Thank you again ^^ This partial training sounds interesting! So I take trained weights and make them my pre-trained base? Where does the 13 come from (= number of layers minus the last "class" layer)? Can I use my own cfg, or do I need to use tiny-yolo-voc.cfg? Wouldn't I have problems when they are different?
A little misunderstanding:
I am looking for darknet19 not trained on 448x448, so the previous version of it! I want to use everything, so 4 CPUs + 2 GPUs, for training.
For detection I have to look ahead, to get the best out of a Raspberry Pi.
Hey, first of all, thank you for your time. I am now done with the trainings (I had some other things to do), but it doesn't work out like I thought.
I tried to train on Pascal VOC and followed your instructions; it all went fine. Not sure if that matters, but I chose the pre-trained model Darknet19_448.conv.23 instead of darknet53.conv.74 (I think this was changed by you?). My cfg1 is shown below. After 45 000 iterations it mostly detects chairs, no matter whether it is a person, a dog or whatever. For cfg4 I just changed width and height to 608 and multiplied the anchors by 4 -> there is no detection at all, and IOU and Recall are 0 when I try to validate the weights.
Did I miss something, or is it just a network conflict, in that the parameters don't fit the dataset? Overview of all the cfgs I trained:
```
[net]
batch=64
subdivisions=64
width=224
height=224
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
max_batches = 45000
policy=steps
steps=100,25000,35000
scales=.1,.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=125
activation=linear

[region]
anchors = 0.54,0.60, 1.71,2.2, 3.32,5.69, 4.71,2.55, 8.31,5.26
bias_match=1
classes=20
coords=4
num=5
softmax=1
jitter=.2
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=0
```
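On the anchor question, a small arithmetic sketch may help. It rests on an assumption that is not confirmed anywhere in this thread: that the [region] anchors are measured in cells of the final feature map (input size / 32), so they would scale linearly with the input resolution. The helper `scale_anchors` is hypothetical.

```python
def scale_anchors(anchors, old_size, new_size):
    # Assumption: anchors are in grid cells, so they grow in proportion
    # to the network input size.
    factor = new_size / old_size
    return [(round(w * factor, 2), round(h * factor, 2)) for w, h in anchors]

# Anchors from the 224x224 cfg above.
anchors_224 = [(0.54, 0.60), (1.71, 2.2), (3.32, 5.69), (4.71, 2.55), (8.31, 5.26)]

anchors_608 = scale_anchors(anchors_224, 224, 608)
print(608 / 224)
print(anchors_608)
```

Under that assumption, going from 224 to 608 would call for a factor of about 2.71 rather than 4.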
@WTeichert darknet53.conv.74 should be used only if your cfg-file is based on yolov3.cfg (only for Yolo v3). But if your cfg-file is based on tiny-yolo-voc.cfg, yolov2-tiny-voc.cfg, yolo-voc.2.0.cfg or yolov2-voc.cfg (Yolo v2), then you should use darknet19_448.conv.23
On what cfg-file did you base your cfg-file?
Are you trying to train yolo on CPU?
Do you get any good results, or are the results of all the trainings bad?
How many iterations did you train?
darknet.exe detector map voc.data your.cfg your_40000.weights
I don't know.
mAP doesn't work, I get this error... File "...\YOLOv2\darknet-master\build\darknet\x64\voc_eval_py3.py", line 157, in voc_eval R = class_recs[image_ids[d]] KeyError: '003028'
For the next few days I have no access to this data, so further data can be sent on Monday.
To this line (with your files: data, cfg, weights):
darknet.exe detector map data/obj.data cfg/yolo_obj.cfg yolo-obj.weights
And run it.
Also, what command do you use for training?
I have tried both ways: first with my data and cfg, and second with your tiny-voc and voc. Neither worked.
The command for training is written in train.cmd
darknet.exe detector train data/cfg1.data cfg/cfg1.cfg darknet19_448.conv.23
@WTeichert This command:
darknet.exe detector map data/cfg1.data cfg/cfg1.cfg cfg1_40000.weights
can't give this error, because it does not involve any Python.
What error does this command give?
Ahh, a little misunderstanding. I tried using map, but nothing happened (as far as I can see). So I chose calc_mAP_voc_py.cmd and changed line 8 to my files and line 9 to my voc dir.
voc_eval_py3.py", line 157 was the error
Could it be that I chose the learning rate too low, so the network doesn't learn anything new from the new input size? Or could it be that the change in anchors led to this mistake?
Attach a screenshot of the "nothing happened" that you get after this command (you should wait, sometimes 10 minutes, while mAP is calculated):
darknet.exe detector map data/cfg1.data cfg/cfg1.cfg cfg1_40000.weights
I don't know whether there is any mistake. I can't say anything without mAP. What repo did you use for training?
"Nothing happens" means nothing I can see directly. I am not sure which repo I am using (repo?), but the introduction says "If you use another GitHub repository, then use darknet.exe detector recall... instead of darknet.exe detector map". I tried both, and map did not give me any visible output. The command just ended and the console was waiting for the next command; nothing happened (that is what I meant, there is no screenshot for it), so I chose recall to check IOU and recall. For IOU I got a maximum of 28%. I copied the GitHub repo we are talking on and followed the instructions for Windows, so I thought it should be this repo... When I use darknet.exe detector valid, it creates a lot of blank class files in results. With yolo-voc, by contrast, they were full of notes and detections. That is all I can say about mAP.
The problem is that I am not into C programming, I only know a little Python, so I understand detector.c, compiling and the functions in C only at the very top level.
I am not at the office these days, so I can try once more on Tuesday, but I don't think it will change the results.
The first command was with recall instead of map; the second doesn't give any result as far as I can see.
The first command was with valid instead of recall and created the files in results.
@WTeichert Try to update your code from this repo.
Done, but same error. Was something changed in the detector? I cannot update the darknet version, since my MSVS license expired.
You should recompile the code in MSVS after your repo is updated. Yes, Yolo v3, fused batch_norm (+7% speedup), calc anchors and mAP, AVX on CPU (+20% speedup) and many other things were added... You can install the free MSVS2015 Community that I use: https://go.microsoft.com/fwlink/?LinkId=532606&clcid=0x409
Finally map works! I needed a few tries with CUDA 9.1, 9.0 and 8.0 and their cuDNN libraries, because an error occurred. I solved it by creating a new repo instead of updating.
It was the cfg which only gave me chairs as output
It was the cfg with 0 IOU and Recall and no detections.
The difference between them was: height and width = 224 in the first cfg and 608 in the second.
I just checked the difference between my cfg and tiny yolo voc again. I changed anchors, width and height, deleted comments, and there are these lines: steps= -1,100,20000,30000 scales=.1,10,.1,.1 Instead I chose: steps=100,25000,35000 scales=.1,.1,.1
Because I did not understand the -1.
@WTeichert That is a bad mAP result. Check your dataset using Yolo_mark. And use these lines:
learning_rate=0.0001
max_batches = 45000
policy=steps
steps=100,25000,35000
scales=10,.1,.1
Ah found it, but I used PascalVoc Dataset, do I need to mark bounding boxes?
@AlexeyAB I have done the check. The labels are not correctly assigned: persons are chairs, cats are boats. This should be connected to the voc.names list, am I right?
But the bounding boxes are all right!
And that doesn't explain why detection doesn't work. Should I train a new network with these lines? learning_rate=0.0001 max_batches = 45000 policy=steps steps=100,25000,35000 scales=10,.1,.1
That would be sad... not to know why it doesn't work and just try again...
These are the results of the yolov2-tiny-voc weights ... there should be an error somewhere else. How does mAP depend on the names list? Could the order of the names cause this error?
If I compare the labels folder of the voc labels with the voc.names file, there are differences. 5 and 8 should be dog and person, while in voc.names those are bus and chair.
But when I validate tiny-voc with some example pictures, it is pretty good.
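The mismatch described here can be illustrated with a short sketch of how a class id in a label file resolves to a name. It assumes label lines of the form `<class_id> <x> <y> <w> <h>` and that the names file is read in order, one class per line. `SHORT_NAMES` is a hypothetical shortened list chosen only so that ids 5 and 8 mean dog and person; the helper `class_name` is also hypothetical.

```python
# The 20 Pascal VOC classes in their usual order.
VOC_NAMES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# Hypothetical list that the labels might have been generated with.
SHORT_NAMES = [
    "bicycle", "bird", "boat", "bottle", "car",
    "dog", "cat", "cow", "person", "horse", "sheep",
]

def class_name(label_line, names):
    # The first field of a label line is the class index into the names list.
    class_id = int(label_line.split()[0])
    return names[class_id]

label = "8 0.5 0.5 0.2 0.4"
print(class_name(label, SHORT_NAMES), "vs", class_name(label, VOC_NAMES))
```

If the labels and the names file were produced from different class lists, every box keeps its position but gets the wrong name, which would fit correct bounding boxes with swapped labels.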
obj.data
@AlexeyAB
darknet.exe detector train data/cfg1.data cfg/cfg1.cfg darknet19_448.conv.23
darknet.exe detector map data/cfg1.data cfg/cfg1.cfg cfg1_final.weights
By the way, why do I get completely different IOU and recall with the commands ... detector map ... and ... detector recall ...?
@AlexeyAB I tried to train with the changed learning rate and the anchors set back to standard. But the result is again 0 detections, and mAP is 0 too. Have you used the Windows method to train tiny-yolo-voc.cfg? Is your version different from the repo? How many iterations did you train? Which command did you use for training?
@WTeichert I trained any models on both Windows and Linux using this repo. It works fine.
@AlexeyAB I found a mistake in my train.txt. Now I get 56.21% mAP for yolov2-tiny-voc.
I tried to train with 11 classes of VOC, so I shortened the class list in voc-label.py and voc-names. I also set the number of classes to 11 in voc.data and the cfg, and the last filters value to 80. Again I get 0 mAP.
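The filter counts in this thread (125 for 20 classes, 80 for 11 classes) are consistent with the usual YOLOv2 relation filters = num * (classes + coords + 1). A one-function sketch, with a hypothetical helper name:

```python
def region_filters(classes, num=5, coords=4):
    # Each of the `num` anchors predicts `coords` box values,
    # one objectness score and one score per class.
    return num * (classes + coords + 1)

print(region_filters(20))
print(region_filters(11))
```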
What was your average loss in training? I am always around 0.5, which seems pretty high.
@WTeichert About ~0.5
Ok, I found the problems. Was some mess with the voc.data and label.txt files.
But I am still wondering why the cfg of tiny-yolo-voc starts steps with -1. If you could explain that to me, I won't ask anything anymore :D
A step of -1 means that the 1st scale, 0.1, will be applied immediately. It was left in just for some experiments.
This: https://github.com/AlexeyAB/darknet/blob/5e3dcb6f34868e341466b57b13ad63f86b337250/cfg/yolov2-tiny-voc.cfg#L18-L22 is the same as a reduced learning_rate with the 1st steps/scales entry removed:
learning_rate=0.0001
max_batches = 40200
policy=steps
steps=100,20000,30000
scales=10,.1,.1
Because the check net.steps[i] > batch_num, i.e. -1 > 0, is never true for the first step, the 1st scale is applied immediately: https://github.com/AlexeyAB/darknet/blob/5e3dcb6f34868e341466b57b13ad63f86b337250/src/network.c#L94-L101
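The equivalence can be checked with a small Python sketch of the steps policy as described above. It is a simplified reading of the linked code that ignores any burn-in, and it assumes the original cfg uses learning_rate=0.001 (the value in the cfg posted earlier in this thread). `current_rate` is a hypothetical name.

```python
import math

def current_rate(batch_num, learning_rate, steps, scales):
    # Walk the steps in order; stop at the first one not yet reached.
    rate = learning_rate
    for step, scale in zip(steps, scales):
        if step > batch_num:
            return rate
        rate *= scale
    return rate

original = dict(learning_rate=0.001,
                steps=[-1, 100, 20000, 30000],
                scales=[0.1, 10, 0.1, 0.1])
reduced = dict(learning_rate=0.0001,
               steps=[100, 20000, 30000],
               scales=[10, 0.1, 0.1])

for batch in (0, 99, 100, 20000, 30000):
    a = current_rate(batch, **original)
    b = current_rate(batch, **reduced)
    print(batch, a, b, math.isclose(a, b))
```

Since -1 is never greater than the batch number, the first scale always applies, so both settings follow the same schedule.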
Thank You so much for your help! My research is done and went all well.
Manipulating the number of filters per layer and reducing the resolution brings the best performance on a Pi! Models, classes and greyscale are not that easy to manipulate, depending on the dataset. Random should be selected; the performance decrease is minimal.
@WTeichert Can you attach your result cfg-file?
Sorry, I lost the originals in a system reset and have only the converted h5 files...
Those were the results I found. Performance on the Pi improved from 4s to 1s per picture with a pretty good mAP. The main focus of my work was the decrease of fps. I hope this information can help.
here the h5 file with changed layernumber based on COCO COCOh5.zip
here the h5 file with changed number of filter per layer based on COCO COCOh5_2.zip
Greetings everyone,
I am in the middle of my student research project. For it, I am creating an object detection and classification system that fits on a Pi. I am using YAD2K running on the Pi, because it has lower computational demands. I plan to train my network on VOC with different training cfgs.
I am asking you for some advice, tips or tricks I can use.
I will change so far:
I have also few questions: What does activation: leaky or linear do? saturation/exposure are always the same, what do they do?
Thank you for all inspiration! :)