CalayZhou / MBNet

Improving Multispectral Pedestrian Detection by Addressing Modality Imbalance Problems (ECCV 2020)

some questions #59

Open yimoo4J opened 1 year ago

yimoo4J commented 1 year ago

Thank you very much for your research work. I think this paper is well worth a close reading. Because of my limited background, I ran into some confusion while reading it. [image]

  1. Could you explain the underlying assumption behind using global average pooling for this purpose?
  2. "For the thermal modality, FD = FR − FT. For the RGB modality, FD = FT − FR". Why was it designed this way?

Thanks again!

CalayZhou commented 1 year ago

Thanks for your attention! Global average pooling converts the feature map into a global feature representation, so that attention can be applied at the channel level. As for "For the thermal modality, FD = FR − FT. For the RGB modality, FD = FT − FR": this is designed for the symmetry of the DMAF module. Since the DMAF module is the "bridge" that connects the RGB feature extractor and the thermal feature extractor, it should have a balanced effect on both modalities.
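
For readers following along, here is a minimal NumPy sketch of the two points in this reply, not the repository's actual code. It assumes feature maps of shape (channels, height, width), random inputs, and tanh as the activation (as stated later in this thread); the helper names `global_average_pool` and `channel_weights` are hypothetical. How the resulting weights are applied to the features is not covered here.

```python
import numpy as np

def global_average_pool(feature_map):
    # Collapse the spatial dimensions: (C, H, W) -> (C,).
    # Each channel is summarised by a single global value.
    return feature_map.mean(axis=(1, 2))

def channel_weights(f_diff):
    # One weight per channel, squashed into (-1, 1) by tanh.
    return np.tanh(global_average_pool(f_diff))

# Illustrative random features; shapes and values are assumptions.
rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((4, 8, 8))
f_thermal = rng.standard_normal((4, 8, 8))

# Thermal branch: F_D = F_R - F_T.  RGB branch: F_D = F_T - F_R.
w_for_thermal = channel_weights(f_rgb - f_thermal)
w_for_rgb = channel_weights(f_thermal - f_rgb)

print(w_for_thermal)
print(w_for_rgb)
```

Because averaging is linear and tanh is an odd function, the two weight vectors in this sketch are exact negatives of each other, which is one way to read the "symmetry" mentioned above.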

cooooolg commented 1 year ago

I am also studying your paper. If the difference is very large, the negative FD value may be large in magnitude, and the value after the sigmoid activation function will be close to zero. This is especially true when the feature map has high spikes. Will this result in the additional information for that modality always being zero?

CalayZhou commented 1 year ago

Hello, the activation function used in the DMAF module is tanh. When the negative FD value is large, the output of tanh approaches -1 rather than 0.
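
The difference between the two activations can be checked numerically. The snippet below is a small illustrative comparison, not code from the repository; the sample inputs and the `sigmoid` helper are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(x):
    # Logistic function, output in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

# Sample pooled difference values, including large negative and positive ones.
x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

sig = sigmoid(x)   # saturates towards 0 for large negative inputs
tan = np.tanh(x)   # saturates towards -1 for large negative inputs

for xi, si, ti in zip(x, sig, tan):
    print(f"x={xi:6.1f}  sigmoid={si:.6f}  tanh={ti:.6f}")
```

For a large negative input, sigmoid gives a weight near 0, which would suppress the signal, while tanh gives a weight near -1, which keeps the magnitude and flips the sign.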