YonghaoHe / LFFD-A-Light-and-Fast-Face-Detector-for-Edge-Devices

A light and fast one class detection framework for edge devices. We provide face detector, head detector, pedestrian detector, vehicle detector......

MIT License

1.32k stars, 329 forks

About receptive fields #28

Closed: dongfangduoshou123 closed this issue 5 years ago

dongfangduoshou123 commented 5 years ago

```python
import math

param_num_output_scales = 8  # set earlier in the config; every list below has 8 entries

# feature map size for each scale
param_feature_map_size_list = [159, 159, 79, 79, 39, 19, 19, 19]

# bbox lower bound for each scale
param_bbox_small_list = [10, 15, 20, 40, 70, 110, 250, 400]
assert len(param_bbox_small_list) == param_num_output_scales

# bbox upper bound for each scale
param_bbox_large_list = [15, 20, 40, 70, 110, 250, 400, 560]
assert len(param_bbox_large_list) == param_num_output_scales

# bbox gray lower bound for each scale
param_bbox_small_gray_list = [math.floor(v * 0.9) for v in param_bbox_small_list]

# bbox gray upper bound for each scale
param_bbox_large_gray_list = [math.ceil(v * 1.1) for v in param_bbox_large_list]

# the RF size of each scale used for normalization,
# here we use param_bbox_large_list for better regression
param_receptive_field_list = param_bbox_large_list

# RF stride for each scale
param_receptive_field_stride = [4, 4, 8, 8, 16, 32, 32, 32]

# the start location of the first RF of each scale
param_receptive_field_center_start = [3, 3, 7, 7, 15, 31, 31, 31]
```
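To make the last three lists concrete, here is a minimal sketch of how they could define the grid of RF centers along one image axis. The helper `rf_centers` is hypothetical and not part of the repository; it only assumes that centers are laid out as `start + stride * i`.

```python
# Values copied from the config quoted above.
param_feature_map_size_list = [159, 159, 79, 79, 39, 19, 19, 19]
param_receptive_field_stride = [4, 4, 8, 8, 16, 32, 32, 32]
param_receptive_field_center_start = [3, 3, 7, 7, 15, 31, 31, 31]


def rf_centers(scale_idx):
    """Return the RF center coordinates (input pixels) along one axis.

    Hypothetical helper for illustration; assumes center_i = start + stride * i.
    """
    start = param_receptive_field_center_start[scale_idx]
    stride = param_receptive_field_stride[scale_idx]
    size = param_feature_map_size_list[scale_idx]
    return [start + stride * i for i in range(size)]


for idx in range(len(param_feature_map_size_list)):
    centers = rf_centers(idx)
    # Print the scale index with its first and last RF center.
    print(idx, centers[0], centers[-1])
```

Under this layout the first scale runs from 3 to 635 in steps of 4, which would sit just inside a 640-pixel-wide input, if that is the crop size used.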

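Hello, and thank you for open-sourcing this. After reading config_farm and data_iterator_farm under face_detection, I have two questions about receptive fields:

1. In actual training, the RF size of each of the 8 branches is not obtained by applying the receptive field formula layer by layer. Instead, the upper bound of each scale is used directly as that branch's RF size. Is this out of ERF (effective receptive field) considerations?
2. How are the starting RF centers [3, 3, 7, 7, 15, 31, 31, 31] obtained? If I use the formula for the first RF center found online, center_out = center_in + ((kernel - 1) / 2 + p) * (cumulative stride of the layers before the current one), I get values much larger than these.

I would appreciate your advice. Thanks!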

YonghaoHe commented 5 years ago

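@dongfangduoshou123 Answers to your two questions:

1. This is partly an ERF consideration. In addition, with this normalization the regression targets fall within [-2, 2], which is a good range for gradient computation.
2. Explanations of how to compute the starting RF position can be found online. Someone asked about this in an earlier issue and I answered with a diagram; see #20.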
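The [-2, 2] range can be illustrated with a small sketch. It assumes that the regression target is the offset from an RF center to each bbox corner, divided by half the RF size; this is an assumption about the data iterator, and `normalized_offsets` is a hypothetical name. Under that assumption, if the RF center lies inside the bbox and the bbox side does not exceed the RF size, every offset has magnitude at most 2.

```python
def normalized_offsets(rf_center, bbox, rf_size):
    """Offsets from an RF center to the bbox corners, divided by rf_size / 2.

    rf_center: (cx, cy); bbox: (xmin, ymin, xmax, ymax);
    rf_size: the matching entry of param_receptive_field_list.
    The normalization constant rf_size / 2 is an assumption.
    """
    cx, cy = rf_center
    xmin, ymin, xmax, ymax = bbox
    half = rf_size / 2.0
    return ((cx - xmin) / half, (cy - ymin) / half,
            (cx - xmax) / half, (cy - ymax) / half)


# A face handled by the second scale (bbox range 15 to 20, RF size 20).
offsets = normalized_offsets((100, 100), (92, 90, 110, 110), rf_size=20)
print(offsets)

# Worst case: the RF center sits on the bbox's left edge and the side equals rf_size.
worst = normalized_offsets((100, 100), (100, 90, 120, 110), rf_size=20)
print(worst)
```

In the worst case the offset to the far edge is exactly -2, which matches the range given above.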

dongfangduoshou123 commented 5 years ago

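Thanks! ![Uploading ganshouyejisuan.jpeg…]()

I would like to confirm whether the formula in the image above is correct, specifically the one for start_out. Applying that formula (source: https://juejin.im/entry/5a2f4f796fb9a044fa19d316 ) to the example in your screenshot ( https://github.com/YonghaoHe/A-Light-and-Fast-Face-Detector-for-Edge-Devices/issues/20 ):

- L1 is the image itself, so its first RF center = 0.5
- L2 first RF center = 0.5 + ((3 - 1) / 2 - 0) × 1 = 1.5
- L3 first RF center = 1.5 + ((3 - 1) / 2 - 1) × 2 = 1.5

Is the value 1 for L3's first RF center obtained by rounding down? And is j_in in the formula the cumulative stride of the convolutions, excluding the stride of the current layer?

If so, `param_receptive_field_center_start = [3.5, 3.5, 7.5, 7.5, 15.5, 31.5, 31.5, 31.5]`, which rounds down to `param_receptive_field_center_start = [3, 3, 7, 7, 15, 31, 31, 31]`.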
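The recurrence used in this calculation can be written as a short sketch. The function name `first_rf_centers` and the layer tuples are illustrative; the stride of the second layer is not given in the thread and is assumed to be 1 here (it does not affect the values shown).

```python
def first_rf_centers(layers, start0=0.5):
    """Track the center of the first receptive field through a stack of convs.

    layers: list of (kernel, stride, padding) tuples, input side first.
    Uses start_out = start_in + ((kernel - 1) / 2 - padding) * j_in, where
    j_in is the product of the strides of all layers before the current one.
    """
    start, jump = start0, 1
    starts = [start]
    for kernel, stride, padding in layers:
        start = start + ((kernel - 1) / 2 - padding) * jump
        jump *= stride  # the current stride only affects later layers
        starts.append(start)
    return starts


# Example from the thread: k=3, p=0 followed by k=3, p=1 with j_in = 2.
print(first_rf_centers([(3, 2, 0), (3, 1, 1)]))
```

This reproduces the sequence 0.5, 1.5, 1.5 and confirms that j_in excludes the current layer's stride.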

YonghaoHe commented 5 years ago

@dongfangduoshou123 Here is a CSDN blog post for you; please read it carefully. link

dongfangduoshou123 commented 5 years ago

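Yes, the formula in that blog post matches the first-RF-center formula on the page I cited above, and the blog also uses start0 = 0.5. However, the example in the blog is too extreme: conv1 and conv2 both use k=3, p=1, s=2, so the stride never comes into play and the second term of the formula is always 0. As a result start1 never changes and stays at 0.5 throughout.

So I think that, to be more precise, it should be: `param_receptive_field_center_start = [3.5, 3.5, 7.5, 7.5, 15.5, 31.5, 31.5, 31.5]`?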

YonghaoHe commented 5 years ago

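@dongfangduoshou123 The accurate values are the ones in my code; the extra 0.5 is unnecessary. Sketch it on paper and you will see it right away. I am quite sure about this.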
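The "sketch it on paper" check can also be done in code. The sketch below assumes that each downsampling step is an unpadded 3x3 convolution with stride 2, and that any padded 3x3 stride-1 layers in between grow the RF symmetrically and so leave its center unchanged; both are assumptions about the network, though they agree with the feature map sizes in the config. It traces which input pixels the first output element sees and reports the center both as a 0-based pixel index and as a coordinate measured from the image edge.

```python
def first_output_pixels(num_downsamples, kernel=3, stride=2):
    """Input pixel index range [lo, hi] seen by output element 0.

    Walks backward through num_downsamples unpadded convs: an output range
    [lo, hi] covers inputs lo * stride .. hi * stride + kernel - 1.
    """
    lo, hi = 0, 0
    for _ in range(num_downsamples):
        lo = lo * stride
        hi = hi * stride + (kernel - 1)
    return lo, hi


index_centers, edge_centers = [], []
for n in range(1, 6):
    lo, hi = first_output_pixels(n)
    index_centers.append((lo + hi) // 2)     # center as a 0-based pixel index
    edge_centers.append((lo + hi + 1) / 2)   # center measured from the image edge

print(index_centers)
print(edge_centers)
```

With these assumptions the pixel-index centers after 2, 3, 4 and 5 downsamplings are 3, 7, 15 and 31, while the edge-based coordinates are 3.5, 7.5, 15.5 and 31.5. The two lists describe the same locations; the 0.5 only reflects whether pixel i is addressed as i or as i + 0.5.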

ckqsars commented 5 years ago

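One more question on top of this. With your setting `param_receptive_field_list = param_bbox_large_list`, can I take it that anchors are in fact being used, where the anchor size is simply the largest face that layer needs to detect?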

YonghaoHe commented 5 years ago

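@ckqsars That understanding is not quite right. Each branch of the network is responsible for all faces within a range, which covers a continuous span of scales. Anchor-based methods can only cover discrete scales (determined through IoU). Viewed from an anchor-based perspective, our anchor-free strategy simply covers a very large number of anchors, so the anchor-based strategy is really just a subset of ours. We will show this later: anchors are unnecessary.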
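As a rough illustration of "each branch covers a continuous range", the sketch below classifies a face size against every branch using the bounds from the config quoted at the top of the thread. The function `branch_roles`, the use of a single scalar face size (for example the longer bbox side), the inclusive boundaries, and the reading of the gray bands as ignored samples are all assumptions made for illustration, not the repository's exact logic.

```python
import math

# Bounds copied from the config quoted at the top of the thread.
param_bbox_small_list = [10, 15, 20, 40, 70, 110, 250, 400]
param_bbox_large_list = [15, 20, 40, 70, 110, 250, 400, 560]
param_bbox_small_gray_list = [math.floor(v * 0.9) for v in param_bbox_small_list]
param_bbox_large_gray_list = [math.ceil(v * 1.1) for v in param_bbox_large_list]


def branch_roles(face_size):
    """Label each branch as 'positive', 'gray' or 'negative' for one face size.

    Hypothetical helper: boundaries are treated as inclusive, and 'gray'
    is assumed to mean the face is ignored by that branch.
    """
    roles = []
    for i in range(len(param_bbox_small_list)):
        if param_bbox_small_list[i] <= face_size <= param_bbox_large_list[i]:
            roles.append("positive")
        elif param_bbox_small_gray_list[i] <= face_size <= param_bbox_large_gray_list[i]:
            roles.append("gray")
        else:
            roles.append("negative")
    return roles


print(branch_roles(18))
print(branch_roles(100))
```

Any size between a branch's lower and upper bound is a positive for that branch, with no IoU matching against a fixed set of box shapes.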

ckqsars commented 5 years ago

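Got it. One more question. The idea in your paper that faces of different sizes need the effective receptive field to cover different extents at detection time is very interesting. But how do you determine the range of face sizes assigned to each receptive field?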

YonghaoHe commented 5 years ago

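@ckqsars For now, the face size range assigned to each receptive field is set mainly by experience. The effective receptive field is a qualitative notion, and so far nobody has studied how to compute it quantitatively (it depends on the specific network structure, and is actually an interesting research direction). In the next version we will try to define, from one angle, how large a receptive field is needed for faces in a given scale range, so that the network structure can be generated automatically on demand. I cannot share more details yet.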

ckqsars commented 5 years ago

Thanks, looking forward to your next version~