AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

first detection/region layer YOLOv3 #990

Open anandkoirala opened 6 years ago

anandkoirala commented 6 years ago

Hi @AlexeyAB,

While training YOLOv3 on my custom dataset, the first region layer (82) reports 'nan' most of the time. I have also modified the YOLOv3 architecture, and with the same dataset the new architecture again reports 'nan' mostly on its first region layer (14). This is more pronounced at lower network resolutions. The other region layers look totally fine.

I have used the default YOLOv3 anchors and also custom anchors from K-means clustering on my dataset, without improvement. The default anchors were all good.

What could be the reason: the object size, or the dimensions of the anchor subset (mask 6,7,8)? The input images contain many objects in clusters, all of one class.

Min obj size = 15x16 pixels
Max obj size = 44x51 pixels
Average obj size = 41x46 pixels
Input image size = 612x512 pixels

Any suggestions would be appreciated.

Regards,
Anand

AlexeyAB commented 6 years ago

@anandkoirala Hi,

Perhaps there are very few objects with sizes matching the anchors with indices 6,7,8. Or maybe too much subsampling (many layers with stride=2) is used before the first [yolo]-layer.
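The first hypothesis can be checked offline. The following is a minimal sketch, not Darknet's own code: it assumes each ground-truth box is assigned to the anchor with the highest width/height IoU (boxes aligned at the origin), and that a [yolo] layer only sees boxes whose best anchor is in its mask. The helper names and the 512x512 network size are assumptions made for illustration; the anchors are the default YOLOv3 ones and the object sizes are those reported in the question.

```python
# Default YOLOv3 anchors as (width, height) in network-input pixels.
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]
FIRST_YOLO_MASK = {6, 7, 8}  # anchor indices used by the first [yolo] layer

def wh_iou(a, b):
    """IoU of two boxes given only (width, height), both anchored at the origin."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def best_anchor(box_wh, anchors=ANCHORS):
    """Index of the anchor with the highest width/height IoU for this box."""
    return max(range(len(anchors)), key=lambda i: wh_iou(box_wh, anchors[i]))

def to_network_pixels(obj_wh, image_wh, net_wh):
    """Rescale an object size from image pixels to network-input pixels."""
    return (obj_wh[0] * net_wh[0] / image_wh[0],
            obj_wh[1] * net_wh[1] / image_wh[1])

IMAGE_WH = (612, 512)  # input image size from the question
NET_WH = (512, 512)    # assumed network resolution

# Object sizes reported in the question (image pixels).
sizes = {"min": (15, 16), "max": (44, 51), "average": (41, 46)}

matches = {}
for name, wh in sizes.items():
    idx = best_anchor(to_network_pixels(wh, IMAGE_WH, NET_WH))
    matches[name] = idx
    in_first = idx in FIRST_YOLO_MASK
    print(f"{name}: best anchor index {idx}, handled by first [yolo] layer: {in_first}")
```

Under these assumptions none of the three reported sizes is matched to anchors 6, 7 or 8, so the first [yolo] layer would often see no objects in a batch, which is consistent with averages over zero objects being printed as 'nan'.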

anandkoirala commented 6 years ago

@AlexeyAB This is for the 416x416 anchors; my network is 512x512, as below.

[image: anchors1]

There are only 13 layers before the first YOLO layer.

AlexeyAB commented 6 years ago

@anandkoirala Do you get nan for the Region-14 even after 2000 or 10 000 iterations?

anandkoirala commented 6 years ago

Yes, if I use the default COCO anchors of YOLOv3. I switched to the anchors generated on my dataset and now it looks good, though it still sometimes throws 'nan' for region 14.

Changing the topic: I trained a model for 120k iterations and wanted to do transfer learning from the model saved at 120k. I had to change max batches to a new number by adding 120k to the value. That was all right, since YOLO remembers the iteration of the saved weights? However, the loss vs. iteration plot started from 120k. I also think that, because of the policy, the learning rate started at 0.0001, so I had to change the learning rate in the cfg file to 0.1 so that it would start at 0.001.

But when we use the pre-trained weights provided by you on the website, everything starts from iteration 1. How can I make the weights saved at 120k behave like pre-trained weights, so that transfer learning starts at iteration 1?
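The learning-rate effect described above can be reproduced with a small sketch of a `steps` policy with burn-in, assuming the rate is the base rate multiplied by the scale of every step already passed, and that burn-in ramps the rate as a power of the iteration fraction. The function name and the cfg values below (one step at 100000 with scale 0.1) are hypothetical, since the actual cfg is not shown in the thread.

```python
def current_learning_rate(iteration, base_lr, steps, scales, burn_in=0, power=4.0):
    """Learning rate at a given iteration under a 'steps' policy with burn-in."""
    # During burn-in the rate ramps up from 0 towards base_lr.
    if iteration < burn_in:
        return base_lr * (iteration / burn_in) ** power
    # Afterwards, apply the scale of every step boundary already passed.
    lr = base_lr
    for step, scale in zip(steps, scales):
        if step > iteration:
            return lr
        lr *= scale
    return lr

# Hypothetical cfg values, for illustration only.
BASE_LR = 0.001
STEPS = [100000]
SCALES = [0.1]
BURN_IN = 1000

# Resuming with the stored iteration counter at 120k: the step at 100k has
# already been passed, so the scaled-down rate is used immediately.
resumed = current_learning_rate(120000, BASE_LR, STEPS, SCALES, BURN_IN)

# Starting from a reset counter: after burn-in the full base rate applies.
fresh = current_learning_rate(2000, BASE_LR, STEPS, SCALES, BURN_IN)

print(f"lr when resuming at 120k: {resumed:.6f}")
print(f"lr at iteration 2000 from a fresh start: {fresh:.6f}")
```

With these assumed values, resuming at 120k yields a rate ten times smaller than a fresh start, which is why the iteration counter stored with the weights matters for transfer learning.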

AlexeyAB commented 6 years ago