amdegroot / ssd.pytorch

A PyTorch Implementation of Single Shot MultiBox Detector
MIT License

question regarding aspect_ratios and variance #192

Closed isalirezag closed 6 years ago

isalirezag commented 6 years ago

Can someone please tell me what it means to define the aspect ratios this way: 'aspect_ratios': [[2], [2, 3], [2, 3], [2, 3], [2], [2]]? Isn't the aspect ratio the ratio between w and h? What does [2] mean, and what does [2, 3] mean?

Also, what is variance used for?

I would really appreciate it if someone could tell me what the following settings are used for:

'lr_steps': (80000, 100000, 120000), ???
'max_iter': 120000, ???
'feature_maps': [38, 19, 10, 5, 3, 1], # i guess this is the size of the features map, right?
'steps': [8, 16, 32, 64, 100, 300], ???
'min_sizes': [30, 60, 111, 162, 213, 264],  ???
'max_sizes': [60, 111, 162, 213, 264, 315],  ???
'aspect_ratios': [[2], [2, 3], [2, 3], [2, 3], [2], [2]], ???
'variance': [0.1, 0.2],  ???
'clip': True, ???

Thanks

RyuJunHwan commented 6 years ago

Looking at the source code in prior_box.py, the forward function contains lines of the form pbox += [cx, cy, ~, ~]. In that loop, pbox += [cx, cy, scale, scale] and pbox += [cx, cy, scale_prime, scale_prime] are the parts that create the two prior boxes of size scale and scale_prime with aspect ratio 1.

    for ar in self.aspect_ratios[k]:
        pbox += [cx, cy, scale * sqrt(ar), scale / sqrt(ar)]
        pbox += [cx, cy, scale / sqrt(ar), scale * sqrt(ar)]

This is where the prior boxes for the aspect ratios you specified are created. Put simply, the two lines are pbox += [cx, cy, w, h] and pbox += [cx, cy, h, w]: the second box is the first one with width and height swapped (a flipped aspect ratio).

The difference between [2] and [2, 3] is the number of prior boxes predicted at each location of that feature map layer.
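A minimal sketch of that loop for a single feature map location, reusing the names pbox, scale and scale_prime from the explanation above. It assumes a 300 px input and the first layer's min/max sizes (30 and 60) from the config in the question. boxes_at_location is a hypothetical helper for illustration, not a function in the repo, and the way scale and scale_prime are derived from min_sizes/max_sizes reflects my reading of prior_box.py.

```python
from math import sqrt


def boxes_at_location(cx, cy, min_size, max_size, aspect_ratios, image_size=300):
    """Return the [cx, cy, w, h] prior boxes for one cell (hypothetical helper)."""
    # Sizes are normalized by the input size (assumed 300 px here).
    scale = min_size / image_size
    scale_prime = sqrt(scale * (max_size / image_size))

    # Two boxes with aspect ratio 1 are always added.
    pbox = [
        [cx, cy, scale, scale],
        [cx, cy, scale_prime, scale_prime],
    ]
    # Each listed aspect ratio adds a box and its flipped (w/h swapped) version.
    for ar in aspect_ratios:
        pbox.append([cx, cy, scale * sqrt(ar), scale / sqrt(ar)])
        pbox.append([cx, cy, scale / sqrt(ar), scale * sqrt(ar)])
    return pbox


boxes_2 = boxes_at_location(0.5, 0.5, 30, 60, [2])
boxes_23 = boxes_at_location(0.5, 0.5, 30, 60, [2, 3])
print(len(boxes_2), len(boxes_23))  # 4 6
```

So [2] gives 4 boxes per location and [2, 3] gives 6. The value in the list is the w/h ratio of the extra boxes, and the flipped copy covers h/w.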

isalirezag commented 6 years ago

I'm still confused. What are the steps for?

RyuJunHwan commented 6 years ago

Sorry for the late answer. First, I will explain what I understand and use. To summarize, 'feature_maps' is the spatial size of each of the multi-scale layers in the SSD architecture. 'min_sizes' and 'max_sizes' are Python lists that give the min and max object size to be predicted at each layer. 'aspect_ratios' is the list of predefined aspect ratios. pbox += [cx, cy, scale, scale] and pbox += [cx, cy, scale_prime, scale_prime] are always added to pbox, so by default there are already two prior boxes with aspect_ratio = 1. In addition, with 'aspect_ratios': [[2]], pbox += [cx, cy, scale * sqrt(ar), scale / sqrt(ar)] and pbox += [cx, cy, scale / sqrt(ar), scale * sqrt(ar)] add the box for that aspect ratio and its flipped version. So the layer makes a total of 4 predictions at the current location.

As another example, with 'aspect_ratios': [[2, 3]]: aspect ratio 1 => 2 predictions, plus aspect ratio 2 => 2 predictions, plus aspect ratio 3 => 2 predictions.

This makes a total of 6 predictions per location in the current layer.
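The same counting can be applied to the whole config from the question. This is only a sketch of the arithmetic (2 boxes for aspect ratio 1, plus 2 per listed ratio). The variable names boxes_per_location and boxes_per_layer are mine, not the repo's.

```python
# Values copied from the config quoted in the question.
feature_maps = [38, 19, 10, 5, 3, 1]
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]

# 2 boxes with aspect ratio 1, plus 2 (box + flipped box) per listed ratio.
boxes_per_location = [2 + 2 * len(ars) for ars in aspect_ratios]

# Each layer has f x f locations.
boxes_per_layer = [f * f * n for f, n in zip(feature_maps, boxes_per_location)]

print(boxes_per_location)    # [4, 6, 6, 6, 4, 4]
print(sum(boxes_per_layer))  # 8732
```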

Also, in the two-dimensional list, each inner list corresponds to one layer. For example, with [[2], [2, 3], [2, 3, 4]], [2] is for the first layer, [2, 3] for the second layer, and [2, 3, 4] for the third layer.
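The 'steps' list is indexed per layer in the same way. As far as I can tell from prior_box.py, steps[k] is the stride in input pixels between neighbouring cells of layer k, and it is used only to place the box centers. A minimal sketch, assuming a 300 px input (cell_center is a hypothetical helper name):

```python
image_size = 300  # assumed input size (SSD300)
steps = [8, 16, 32, 64, 100, 300]  # from the config in the question


def cell_center(k, i, j):
    """Normalized (cx, cy) of the cell at row i, column j on layer k."""
    f_k = image_size / steps[k]  # effective number of cells along one side
    cx = (j + 0.5) / f_k
    cy = (i + 0.5) / f_k
    return cx, cy


# First cell of the first layer: its center is half a step (about 4 px) from the border.
cx, cy = cell_center(0, 0, 0)
print(cx * image_size, cy * image_size)

# The last layer has a single cell whose center is the middle of the image.
print(cell_center(5, 0, 0))
```

In pixel terms the center of cell (i, j) on layer k is roughly ((j + 0.5) * steps[k], (i + 0.5) * steps[k]).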

chi0tzp commented 6 years ago

@RyuJunHwan Could you please explain how you set up min_sizes and max_sizes for the various datasets? For instance, for the VOC dataset, these lists are as follows:

'min_sizes': [30, 60, 111, 162, 213, 264], 'max_sizes': [60, 111, 162, 213, 264, 315],

while for the COCO dataset, they are given as

"min_sizes": [21, 45, 99, 153, 207, 261], "max_sizes": [45, 99, 153, 207, 261, 315],

How should I set these lists if I want to use my own dataset?

Thanks a lot for your time and help.

jamiechoi1995 commented 5 years ago

see https://github.com/weiliu89/caffe/blob/ssd/examples/ssd/ssd_pascal.py#L299-L317
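A sketch of the ratio-based recipe used in that script, adapted to Python 3 with integer division. ssd_sizes and its parameter names are hypothetical. The values 20/90/10 reproduce the VOC lists quoted above. The values 15/90/7 are an assumption, chosen because they reproduce the COCO lists quoted above.

```python
import math


def ssd_sizes(min_dim, min_ratio, max_ratio, first_ratio, num_layers=6):
    """Compute min_sizes/max_sizes from size ratios given in percent (hypothetical helper)."""
    # Spread the ratios evenly over all layers except the first one.
    step = int(math.floor((max_ratio - min_ratio) / (num_layers - 2)))
    min_sizes, max_sizes = [], []
    for ratio in range(min_ratio, max_ratio + 1, step):
        min_sizes.append(min_dim * ratio // 100)
        max_sizes.append(min_dim * (ratio + step) // 100)
    # The first layer gets its own smaller ratio.
    min_sizes = [min_dim * first_ratio // 100] + min_sizes
    max_sizes = [min_dim * min_ratio // 100] + max_sizes
    return min_sizes, max_sizes


voc = ssd_sizes(300, min_ratio=20, max_ratio=90, first_ratio=10)
coco = ssd_sizes(300, min_ratio=15, max_ratio=90, first_ratio=7)
print(voc)   # ([30, 60, 111, 162, 213, 264], [60, 111, 162, 213, 264, 315])
print(coco)  # ([21, 45, 99, 153, 207, 261], [45, 99, 153, 207, 261, 315])
```

For a custom dataset, one option is to keep this recipe and lower min_ratio and first_ratio if your objects are small relative to the image, as the COCO values do.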