Jumabek / darknet_scripts

Auxiliary scripts for working with the (YOLO) darknet deep learning framework. AKA -> How to generate YOLO anchors?

Why image size is not used? Why classes other than 0 are ignored? (gen_anchors) #3

Open CemalAker opened 7 years ago

CemalAker commented 7 years ago

Thanks for the great script. But I could not figure out why the image size is not used. The image is read and its size is stored in [im_h, im_w], but these values are never used afterwards. I expected that before

annotation_dims.append(map(float,(w,h)))

w and h would be multiplied by them to get unnormalized width and height. Or am I wrong that anchors are in unnormalized sizes? Should they be in relative size?

The second question is about ignoring all classes other than the one with id=0:

    if cls_id!=0: continue

Why is this so? Thanks.

Jumabek commented 7 years ago

Hi @ZetilenZoe ,

  1. The image size is leftover from old code; I have cleaned it up a bit.

Since I use YOLO-format annotations (relative-size bounding boxes) for computing anchors, we do not need the image size.
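To illustrate: a YOLO-format label line is `class x_center y_center width height`, with every value already normalized to [0, 1], so the annotation alone is enough. A minimal sketch (hypothetical label contents, not the script's exact code):

```python
# Parse YOLO-format label lines: "class x_center y_center w h",
# where every coordinate is already relative to the image size.
def collect_relative_dims(lines):
    dims = []
    for line in lines:
        cls_id, x_c, y_c, w, h = line.split()
        dims.append((float(w), float(h)))  # image size is never needed
    return dims

labels = ["0 0.5 0.5 0.25 0.40", "1 0.3 0.6 0.10 0.20"]
print(collect_relative_dims(labels))  # [(0.25, 0.4), (0.1, 0.2)]
```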

I strongly believe anchors should be in relative size because that way we can upscale the anchors to 13x13 grid size once we compute them using relative size.

It seems to me the original author used unnormalized anchors, but the results do not differ much (comparison here: https://github.com/Jumabek/darknet_scripts ). I think anchors generated by gen_anchors.py will perform a little better than the original author's, because @pjreddie used unnormalized sizes.

  2. I owe you big credit for the second question. In my case I wanted to compute anchors for only one class, 'person', and to ignore the 'people' class, so I did it with the following code:

    if cls_id != 0:
        continue

    It was a leftover from that trial; I have removed it now.

Thanks for the catch.

CemalAker commented 7 years ago

Thanks for the quick reply and update. Since my dataset is huge, I want to clarify something before starting. What do you mean by

we can upscale the anchors to 13x13 grid size once we compute them using relative size

  1. Should we rescale the computed anchors ourselves after the computation ends, or do you mean the region layer will upscale them automatically? I ask this because I want to use a different resolution.

  2. How many images have you tried it with, and how long does it take? I ask because my dataset is very large.

Thanks in advance

EDIT: I used 10K of the 1M images and it converged very quickly (in 54 iterations), so the second question can be ignored.

Jumabek commented 7 years ago

The code first computes anchors in the range [0, 1]; then, in this line https://github.com/Jumabek/darknet_scripts/blob/master/gen_anchors.py#L52 , it multiplies the anchors by (input_width in the cfg file) / downsampling_factor. In other words, if your input_width is 416, the anchors get upscaled to the 416/32 = 13 grid.
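That final step amounts to the following (illustrative anchor values; 32 is darknet's downsampling factor):

```python
# Upscale relative [0, 1] anchors to grid units:
# grid = input_width / downsampling_factor.
input_width = 416          # "width" value from the .cfg file
downsampling_factor = 32   # darknet reduces the input resolution by 32x
grid = input_width / downsampling_factor   # 416 / 32 = 13

relative_anchors = [(0.125, 0.25), (0.5, 0.75)]  # hypothetical k-means output
scaled = [(w * grid, h * grid) for w, h in relative_anchors]
print(scaled)  # [(1.625, 3.25), (6.5, 9.75)]
```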

Feel free to ask if you need more clarification

Jumabek commented 7 years ago

BTW, if your net cfg accepts input images bigger than 416x416, then you need to change these lines accordingly https://github.com/Jumabek/darknet_scripts/blob/master/gen_anchors.py#L17 before computing anchors.
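For example, for a 608x608 network those values would become something like the following (assuming the variables are named width_in_cfg_file / height_in_cfg_file, as mentioned later in this thread):

```python
# Must match the "width"/"height" entries of your darknet .cfg file
# before running gen_anchors.py, e.g. for a 608x608 network:
width_in_cfg_file = 608.
height_in_cfg_file = 608.
```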

CemalAker commented 7 years ago

Thank you very much again @Jumabek . I have understood the solution :)

groot-1313 commented 6 years ago

@Jumabek I have images of varying sizes (320x240 to 704x576, which is a huge difference) and intend to set the parameter random to 1 so that the images are not resized. In that case, what should width_in_cfg_file and height_in_cfg_file be set to?