some questions about HKS protocol

FishAndWasabi / YOLO-MS

YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-Time Object Detection

some questions about HKS protocol #2

yang-0201 closed this issue 11 months ago

yang-0201 commented 1 year ago

Have you run any comparative experiments using the HKS protocol in YOLOMSPAFPN, and does it improve performance there as well?

FishAndWasabi commented 1 year ago

We have not yet done experiments using the HKS protocol in the neck, but we have included them in our future plans. We will run the experiments and share the results as soon as possible.

Thanks for your interest in our work!

Best Wishes! 😊

yang-0201 commented 1 year ago

Thanks very much. I have recently been working on something related and am trying to decide whether to use and cite the HKS mechanism in my article, so I have another question. In your previous experiments, did you find that a backbone with a larger receptive field (i.e., larger convolutional kernels) improves the network's performance on small targets?

FishAndWasabi commented 1 year ago

Our experiments have shown that larger convolutional kernels degrade small-target detection performance to some degree. We think the reason is that the large receptive field introduced by a large kernel may bring in contaminating information from outside the small targets. This is one of our motivations for proposing HKS. We hope this helps.
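A back-of-envelope sketch of the receptive-field argument: the theoretical receptive field of stacked conv layers grows as $r = 1 + \sum_i (k_i - 1)\prod_{j<i} s_j$ for kernel sizes $k_i$ and strides $s_j$, so a fixed-size small object fills a shrinking fraction of the window as kernels grow. The layer configurations and the 8-pixel object size are hypothetical, and the theoretical receptive field is only an upper bound; the ERF is typically smaller and concentrated near the centre.

```python
# Back-of-envelope receptive-field arithmetic. The layer configurations and the
# object size below are hypothetical, not taken from YOLO-MS.


def theoretical_rf(layers):
    """Theoretical receptive field (input pixels) of stacked conv layers,
    each given as a (kernel_size, stride) pair."""
    rf, jump = 1, 1  # jump = input-pixel spacing between units at this depth
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def object_fraction(object_size, rf):
    """Fraction of a square rf x rf window covered by a centred square object."""
    side = min(object_size, rf)
    return side * side / (rf * rf)


if __name__ == "__main__":
    small_object = 8  # hypothetical 8x8-pixel target at the input scale
    for k in (3, 5, 7, 9):
        rf = theoretical_rf([(k, 1), (k, 1)])  # two stride-1 layers of size k
        frac = object_fraction(small_object, rf)
        print(f"kernel {k}: theoretical RF {rf}, object covers {frac:.0%}")
```

With two stride-1 layers, moving from 5x5 to 9x9 kernels nearly doubles the window side, so most of what each output unit can see lies outside an 8x8 object.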

Best Wishes! 😊

yang-0201 commented 1 year ago

Regarding Fig. 4, how do you calculate the Effective Receptive Field (ERF) for the whole network?

FishAndWasabi commented 1 year ago

We follow RepLKNet [1] and calculate the Effective Receptive Field (ERF) from the aggregated contribution score matrix, denoted as $\mathcal{A} \in \mathbb{R}^{H \times W}$. Please refer to the official code for details on how to calculate $\mathcal{A}$.
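The contribution score can be illustrated on a toy case where it has a closed form. Assuming a linear, single-channel stack of convolutions (a hypothetical setup; real backbones are nonlinear and multi-channel), the gradient of the central output pixel with respect to the input equals the composed kernel, so $\mathcal{A}$ is just its magnitude. The helper names `compose` and `contribution_map` are illustrative, not from the official code.

```python
# Toy illustration (hypothetical setup, not RepLKNet's script): for a *linear*,
# single-channel stack of conv layers (cross-correlation, as in PyTorch), the
# gradient of the central output pixel w.r.t. the input is the composition of
# the kernels, so the contribution map can be computed exactly.


def compose(a, b):
    """Full 2D convolution of two kernels (lists of lists of floats)."""
    ha, wa, hb, wb = len(a), len(a[0]), len(b), len(b[0])
    out = [[0.0] * (wa + wb - 1) for _ in range(ha + hb - 1)]
    for i in range(ha):
        for j in range(wa):
            for m in range(hb):
                for n in range(wb):
                    out[i + m][j + n] += a[i][j] * b[m][n]
    return out


def contribution_map(kernels):
    """|d(central output)/d(input)| around the centre, for a linear conv stack.

    For a real network you would back-propagate from the central output pixel
    with an autograd library and aggregate the gradient over channels and a
    batch of images (see the official code for the exact aggregation).
    """
    effective = [[1.0]]  # identity kernel
    for k in kernels:
        effective = compose(effective, k)
    return [[abs(v) for v in row] for row in effective]


if __name__ == "__main__":
    box = [[1 / 9] * 3 for _ in range(3)]  # 3x3 averaging kernel
    A = contribution_map([box, box, box])  # three stacked 3x3 layers
    for row in A:
        print(" ".join(f"{v:.3f}" for v in row))
```

Three stacked 3x3 averaging layers give a 7x7 map that peaks at the centre and falls off towards the border, which is why a threshold (or similar rule) is needed to turn $\mathcal{A}$ into a single size.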

First, we calculate $\mathcal{A}$ for stages 2, 3, and 4 of the encoder. Next, the $\mathcal{A}$ of each stage is normalized to [0, 1]. Then, we apply a score threshold of 0.5 and treat the pixels whose scores exceed it as the high-contribution area. To show the differences between models intuitively, we use the side length of the minimum rectangle covering the high-contribution area as our ERF metric and visualize it.
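The normalize, threshold, and bounding-rectangle steps can be sketched as follows for one stage's map. Two details are assumptions not stated in the reply: "normalized to [0, 1]" is taken as min-max normalization, and "exceeding" as strictly greater. Since the reply does not say which side of the rectangle is reported, the hypothetical helper `high_contribution_box` returns both.

```python
# Sketch of the ERF metric described above. Assumptions (not stated in the
# reply): "normalized to [0, 1]" means min-max normalisation, and "exceeding"
# means strictly greater than the threshold.


def high_contribution_box(A, threshold=0.5):
    """Return (height, width) of the minimum axis-aligned rectangle that covers
    every pixel whose normalised contribution score exceeds `threshold`."""
    flat = [v for row in A for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        raise ValueError("constant score map cannot be normalised")
    rows, cols = [], []
    for i, row in enumerate(A):
        for j, v in enumerate(row):
            if (v - lo) / (hi - lo) > threshold:  # normalised score in [0, 1]
                rows.append(i)
                cols.append(j)
    if not rows:  # only possible when threshold >= 1
        return 0, 0
    return max(rows) - min(rows) + 1, max(cols) - min(cols) + 1


if __name__ == "__main__":
    # Hypothetical 5x5 score map with a bright centre and a dimmer cross.
    A = [
        [0.1, 0.1, 0.1, 0.1, 0.1],
        [0.1, 0.1, 0.6, 0.1, 0.1],
        [0.1, 0.6, 1.0, 0.6, 0.1],
        [0.1, 0.1, 0.6, 0.1, 0.1],
        [0.1, 0.1, 0.1, 0.1, 0.1],
    ]
    print(high_contribution_box(A))       # the cross survives at 0.5
    print(high_contribution_box(A, 0.6))  # only the centre survives
```

In practice this would be applied separately to the maps of stages 2, 3, and 4.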

Best Wishes! 😊

[1] Ding et al. Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs. CVPR 2022.

wsy-yjys commented 9 months ago

Hi, so as the threshold increases, fewer pixels have scores exceeding it, and the ERF therefore decreases?
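The intuition in this question can be checked on a synthetic map: every pixel above a higher threshold is also above any lower one, so the covering rectangle can only shrink or stay the same as the threshold rises. The Gaussian-shaped map, `sigma=4.0`, and the helper names below are hypothetical, and min-max normalization is assumed.

```python
import math

# Hypothetical Gaussian-shaped score map, used only to illustrate the trend.


def gaussian_map(size, sigma):
    c = size // 2
    return [[math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * sigma ** 2))
             for j in range(size)] for i in range(size)]


def covering_side(A, threshold):
    """Longer side of the minimum rectangle covering pixels whose min-max
    normalised score exceeds `threshold` (0 if no pixel does)."""
    flat = [v for row in A for v in row]
    lo, hi = min(flat), max(flat)
    hits = [(i, j) for i, row in enumerate(A) for j, v in enumerate(row)
            if (v - lo) / (hi - lo) > threshold]
    if not hits:
        return 0
    rows = [i for i, _ in hits]
    cols = [j for _, j in hits]
    return max(max(rows) - min(rows), max(cols) - min(cols)) + 1


if __name__ == "__main__":
    A = gaussian_map(31, sigma=4.0)
    for t in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"threshold {t:.1f}: side {covering_side(A, t)}")
```

Because the measured size depends directly on the threshold, comparisons between models are only meaningful at a fixed threshold.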

wsy-yjys commented 9 months ago

Why use a score threshold of 0.5? I'm struggling to decide which threshold best represents the model's actual receptive field, and I'd really appreciate it if you could give me a hand.
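One way to avoid committing to a single value threshold is to report the ERF at several cumulative-contribution fractions t, similar in spirit to the high-contribution area ratio reported in RepLKNet [1] (its exact definition may differ; check the official code). The sketch below finds the smallest centred, odd-sized square whose summed score exceeds a fraction t of the total; the centring, odd side lengths, non-negative scores, and the helper names are all assumptions for illustration.

```python
import math


def gaussian_map(size, sigma):  # hypothetical score map for the demo
    c = size // 2
    return [[math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * sigma ** 2))
             for j in range(size)] for i in range(size)]


def centred_square(A, fraction):
    """Return (side, area_ratio) of the smallest centred square holding more
    than `fraction` of the total (non-negative) score, or None if none does."""
    h, w = len(A), len(A[0])
    total = sum(sum(row) for row in A)
    ci, cj = h // 2, w // 2
    r_max = min(ci, h - 1 - ci, cj, w - 1 - cj)  # keep the square inside A
    for r in range(r_max + 1):
        inside = sum(A[i][j]
                     for i in range(ci - r, ci + r + 1)
                     for j in range(cj - r, cj + r + 1))
        if inside / total > fraction:
            side = 2 * r + 1
            return side, side * side / (h * w)
    return None


if __name__ == "__main__":
    A = gaussian_map(31, sigma=4.0)
    for t in (0.2, 0.3, 0.5, 0.99):
        print(f"t = {t:.0%}: {centred_square(A, t)}")
```

Reporting a small curve over t, rather than one number, makes the comparison between models less sensitive to the particular cut-off chosen.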