Our assumption is that the intensity of the learned attention map corresponds to the importance of each pixel in the consistency learning. Ideally, with a suitable attention threshold $\tau_{att}$, we can filter out pixels with small domain gaps and focus more on the noteworthy regions. Figure 4 in the paper provides an intuitive visualization of the learned attention maps. It can be observed that the road category usually attracts more attention than the sky regions. This phenomenon is consistent with the detailed per-class accuracy in Table 2, where the IoU of road and sky is 31.9 and 58.9, respectively, with the NoAdapt method, and increases to 85.2 and 74.9 after adaptation.
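In code, the masking step is conceptually as follows. This is only a minimal sketch, assuming an attention map `attn` in $[0, 1]$ and an MSE-style per-pixel consistency term; the function name and the choice of consistency term are illustrative, not our exact implementation:

```python
import torch

def masked_consistency_loss(pred_a, pred_b, attn, tau_att=0.3):
    """Sketch of an attention-masked consistency loss (illustrative only).

    pred_a, pred_b: two predictions to be made consistent, shape (N, C, H, W)
    attn:           learned attention map in [0, 1], shape (N, 1, H, W)
    tau_att:        attention threshold; pixels below it are ignored
    """
    # Binary mask M: 1 where the attention value exceeds the threshold
    M = (attn > tau_att).float()
    # Per-pixel consistency term (here: mean squared error over channels)
    per_pixel = ((pred_a - pred_b) ** 2).mean(dim=1, keepdim=True)
    # Only pixels selected by M contribute to the loss
    return (M * per_pixel).sum() / M.sum().clamp(min=1.0)
```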
Since the framework has no category-specific consideration, the current attention learning may also mistakenly filter out regions with large domain gaps for some categories.
Thank you for your quick reply and such a detailed explanation! It really cleared up some of my confusion!
You say that "different regions in the images usually correspond to different levels of domain gap", and I definitely agree! Then you say that "we introduce the attention mechanism into the proposed framework to generate attention-aware features".
After reading your paper and the code, I see that you designed an attention module in the segmentation network, built from avgpool, UpsamplingBilinear2d, interpolation, aconv, and sigmoid, which produces an attention mask. If a mask value is larger than a threshold of 0.3 (for example), it is set to 1, otherwise to 0, giving a binary mask M. You then multiply M with the consistency loss to compute it selectively, roughly like the sketch below.
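If I understand the code correctly, the attention branch is roughly the following (a simplified reading on my part; the class name, layer sizes, and pooling factor are my guesses, not your exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionModule(nn.Module):
    """Simplified reading of the attention branch (layer sizes are guesses)."""
    def __init__(self, in_channels):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=2)               # avgpool
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=1)  # "aconv"

    def forward(self, feat, out_size):
        x = self.pool(feat)
        x = self.conv(x)
        # UpsamplingBilinear2d / interpolation back to the prediction size
        x = F.interpolate(x, size=out_size, mode='bilinear', align_corners=True)
        return torch.sigmoid(x)  # attention map in [0, 1]

# Thresholding the map at 0.3 then gives the binary mask M:
#   attn = module(features, (H, W)); M = (attn > 0.3).float()
```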
But now I have a small question about the attention module. Why does a mask value larger than 0.3 mean we should focus on that pixel, while a smaller value means we should ignore it? Why can your attention module focus on regions with larger domain gaps and ignore those with smaller gaps? How do you make sure that M filters out the insignificant pixels?