Tiny Yolo training - Githubissues

yuliiasergeeva commented 6 years ago

Hello, what batch and subdivisions should I use for Tiny Yolo training? I used the values that were advised for Yolo and my avg loss was going up. To be honest, I am not even sure the two are related. Could you advise why this is happening? Also, what are batch and subdivisions, and what do they do? Thank you!

yuliiasergeeva commented 6 years ago

If I leave it as it is in the Tiny Yolo cfg, it tells me to change to batch 64 subdivisions 64. When I do that, training goes super fast and avg loss quickly drops to 1.5-3.0. I tried testing the results without waiting for it to go down to 0.xxxx, and it doesn't detect anything.

AlexeyAB commented 6 years ago

Hi, You should use batch 64 subdivisions 64. You should train more than 2000-4000 iterations.
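
A note on the two settings asked about above: in Darknet, batch is the number of images that go into one iteration (one weight update), and subdivisions splits that batch into smaller chunks that are processed one after another, which mainly lowers GPU memory use. A minimal sketch of the arithmetic, assuming that reading of the cfg (the function name is made up for illustration and is not part of Darknet):

```python
def mini_batch_size(batch: int, subdivisions: int) -> int:
    """Images processed at once for the given [net] batch/subdivisions values.

    Assumption: this sketch insists on an even split to keep the example clear.
    """
    if subdivisions <= 0 or batch % subdivisions != 0:
        raise ValueError("batch should be a positive multiple of subdivisions")
    return batch // subdivisions


# batch=64 subdivisions=64: one image at a time, 64 images per iteration
print(mini_batch_size(64, 64))
# batch=64 subdivisions=16: four images at a time, still 64 images per iteration
print(mini_batch_size(64, 16))
```

So changing subdivisions alone should not change how many images each iteration sees, only how many are held in memory at once.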

yuliiasergeeva commented 6 years ago

I trained for more than 4000 iterations with batch 64 subdivisions 64. I am now running it until avg loss drops below 1; it is at 25000+ iterations and avg loss is still around 1.2-2.

yuliiasergeeva commented 6 years ago

Thinking that I might have accidentally changed something else in the *.cfg file, I downloaded the one you recommend again and started from scratch - avg loss went from 234 to 355 in a couple of minutes....

AlexeyAB commented 6 years ago

Check that you did everything as written here: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects And check your dataset: https://github.com/AlexeyAB/Yolo_mark

yuliiasergeeva commented 6 years ago

I did, I checked it several times before asking here... And the dataset was created with your tool.

AlexeyAB commented 6 years ago

Compress your cfg-file and attach it to your message.
What command do you use for training?

yuliiasergeeva commented 6 years ago

1) yolov3-tiny-obj.zip

2) darknet.exe detector train data/obj.data yolov3-tiny-obj.cfg yolov3-tiny.conv.15

Could it be acceptable that it goes up in the beginning and I just have to wait longer??

AlexeyAB commented 6 years ago

Do you have only two classes in your labels, 0 and 1?
What mAP do you get for the weights-file from 25000+ iterations?
Can you show a screenshot of your training process with avg-loss and iteration number?
Are you using specifically my repository?

yuliiasergeeva commented 6 years ago

I do train it for 2 objects only, yes

I got rid of those and ran it overnight again; avg loss dropped to 0.4. At iteration 12600, IoU is 75% and mAP is 84.29%. I checked IoU and mAP at every 1000th iteration from 3000 to 12600, and IoU kept going back and forth between 57% and 74%:
1) 3000 IoU=60.21 mAP=78.69
2) 4000 IoU=57.12 mAP=79.23
3) 5000 IoU=71.97 mAP=82.27
4) 6000 IoU=63.64 mAP=79.77
5) 7000 IoU=63.58 mAP=79.32
6) 8000 IoU=58.61 mAP=80.11
7) 9000 IoU=73.98 mAP=82.53
8) 10000 IoU=74.67 mAP=83.90
9) 11000 IoU=69.66 mAP=83.63
10) 12000 IoU=78.53 mAP=81.08
11) 12600 IoU=75.08 mAP=84.29
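
Since IoU and mAP move up and down between checkpoints, one simple way to choose which weights-file to keep is to pick the checkpoint with the highest mAP instead of just taking the last one. A small sketch using the numbers reported above (assumption: the 12600 IoU value is read as 75.08):

```python
# (iteration, IoU %, mAP %) as reported in this thread.
# Assumption: the IoU for iteration 12600 is taken to be 75.08.
results = [
    (3000, 60.21, 78.69),
    (4000, 57.12, 79.23),
    (5000, 71.97, 82.27),
    (6000, 63.64, 79.77),
    (7000, 63.58, 79.32),
    (8000, 58.61, 80.11),
    (9000, 73.98, 82.53),
    (10000, 74.67, 83.90),
    (11000, 69.66, 83.63),
    (12000, 78.53, 81.08),
    (12600, 75.08, 84.29),
]

# Pick the checkpoint with the highest mAP, and separately the highest IoU.
best_by_map = max(results, key=lambda r: r[2])
best_by_iou = max(results, key=lambda r: r[1])

print("highest mAP at iteration", best_by_map[0], "->", best_by_map[2])
print("highest IoU at iteration", best_by_iou[0], "->", best_by_iou[1])
```

Here the two criteria disagree (12600 by mAP, 12000 by IoU), so which checkpoint counts as best depends on which metric matters more for the application.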

sorry, I've closed the training process before I saw your message

yuliiasergeeva commented 6 years ago

Interestingly enough, I've just checked IoU and mAP for the previously trained YoloV3 that kind of worked (it detected custom objects, not perfectly, but it did), and both values are 0.00% for every iteration I checked (5000, 11000). How could that be possible???

AlexeyAB commented 6 years ago

So, did you just train the same model with the same dataset, and get a bad result in one case and a good result in the other?

I do not know what happened.

yuliiasergeeva commented 6 years ago

1) I trained Yolo v3 with my own dataset. It worked, not perfectly, but it did. I did not check mAP and IoU though.
2) After that, I added more pictures to the dataset and tried to train Yolo v3 Tiny - it works sort of better than the Yolo v3 trained before but still has some flaws. For this one I checked IoU and mAP and you can see them above.
3) After that, I went back to Yolo v3 trained before and checked IoU and mAP - they all are 0.0

Sorry if what I wrote before was confusing. I hope now it makes more sense.

AlexeyAB commented 6 years ago

> After that, I went back to Yolo v3 trained before and checked IoU and mAP - they all are 0.0

So can you show your Yolo v3 cfg-file and the command you used for training?

yuliiasergeeva commented 6 years ago

yolo-obj.zip

darknet.exe detector train data/obj.data yolo-obj.cfg darknet53.conv.74 - exactly the way you posted

yuliiasergeeva commented 6 years ago

And by the way, is there any guide on how to do validation?

Thank you so much for your help and your time! I appreciate it a lot!

AlexeyAB commented 6 years ago

Try using ignore_thresh = .7 in each of the 3 [yolo]-layers for training. And remove the old weights from the backup folder.
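
The cfg change can be made by hand in a text editor. For anyone who prefers to script it, here is a rough sketch that rewrites ignore_thresh inside every [yolo] section of a cfg text. The helper name and the sample cfg are made up for illustration, and the sketch only replaces ignore_thresh lines that already exist; it does not add missing ones.

```python
import re


def set_ignore_thresh(cfg_text: str, value: str = ".7") -> str:
    """Return cfg_text with ignore_thresh set to `value` in every [yolo] section.

    Hypothetical helper, not part of Darknet. Lines outside [yolo] sections
    are left untouched.
    """
    out = []
    section = None
    for line in cfg_text.splitlines():
        stripped = line.strip()
        # Track which [section] the current line belongs to.
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped
        if section == "[yolo]" and re.match(r"ignore_thresh\s*=", stripped):
            line = f"ignore_thresh = {value}"
        out.append(line)
    return "\n".join(out)


# Made-up, heavily shortened cfg with two [yolo] sections.
sample = """[convolutional]
filters=21

[yolo]
classes=2
ignore_thresh = .5

[yolo]
classes=2
ignore_thresh = .5
"""

updated = set_ignore_thresh(sample)
print(updated)
```

To apply it to a real file, read the cfg into a string, pass it through the function, and write the result to a new file so the original stays intact.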

> And by the way, is there any guide on how to do validation?

What do you mean? https://github.com/AlexeyAB/darknet#when-should-i-stop-training

yuliiasergeeva commented 6 years ago

I saw people using detector valid and detector recall, but I am not sure what they do or how to use them.

AlexeyAB commented 6 years ago

detector valid and detector recall are similar to detector map.

But detector recall gives you a non-standard IoU indicator that can be used only by Darknet C developers.
detector valid gives you txt files in the results folder. After that you should run a matlab or python script to get mAP.
detector map gives you standard indicators (mAP, AP, IoU, F1, ...) without any additional matlab or python scripts.
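
To compare several saved checkpoints, detector map can be run once per weights-file, using the same argument order as the training commands earlier in this thread. A minimal sketch that only builds the command lines; the backup/ file names are hypothetical and should be replaced with the files actually present in the backup folder:

```python
def map_commands(data_file, cfg_file, weights_files, exe="darknet.exe"):
    """Build one `detector map` command (as an argument list) per weights-file."""
    return [[exe, "detector", "map", data_file, cfg_file, w] for w in weights_files]


# Hypothetical checkpoint names, for illustration only.
weights = [f"backup/yolov3-tiny-obj_{i}.weights" for i in (3000, 4000, 5000)]

commands = map_commands("data/obj.data", "yolov3-tiny-obj.cfg", weights)
for cmd in commands:
    print(" ".join(cmd))
    # To actually run a command, pass the list to subprocess.run(cmd, check=True).
```

Each printed line can also be pasted into the console as-is.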

yuliiasergeeva commented 6 years ago

Got it, so it is basically the same as what you posted. It seems like I overcomplicated it in my head. Thank you so much for your help, and sorry if there were dumb questions :)

AlexeyAB / darknet

Tiny Yolo training #937