Open guyuchao opened 5 years ago
@guyuchao Hi. The details are shown in Figure 2 of this paper. It applies the subsample operation to reduce computation. You can also add maxpool to self.theta, but then you need to upsample the feature map at the end (because of the identity connection).
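To make the shape argument concrete, here is a minimal sketch in plain Python (shape bookkeeping only, no tensors are created). The helper `nonlocal_shapes` and its parameters are hypothetical names for illustration, not code from this repo, and it assumes a 2x2 max pool with stride 2 on the sub-sampled branches, which is how I read the sub-sampling trick.

```python
def nonlocal_shapes(h, w, pool=2, subsample_theta=False):
    """Shapes for the attention step of a 2D non-local block.

    Hypothetical helper: assumes phi and g are always max-pooled by `pool`,
    and theta is pooled only when `subsample_theta` is True.
    """
    n_full = h * w                          # positions in the input x
    n_pooled = (h // pool) * (w // pool)    # positions after max pooling
    n_query = n_pooled if subsample_theta else n_full
    return {
        # one row per theta (query) position, one column per phi (key) position
        "attention": (n_query, n_pooled),
        # attention output has one feature vector per query position
        "output_hw": (h // pool, w // pool) if subsample_theta else (h, w),
        # the residual add with x needs matching spatial size
        "needs_upsample": subsample_theta,
    }


default = nonlocal_shapes(32, 32)
pooled_theta = nonlocal_shapes(32, 32, subsample_theta=True)
print(default)
print(pooled_theta)
```

With only phi and g pooled, the attention matrix shrinks along the key axis but still has one row per position of x, so the output can be added to x directly. Pooling theta as well shrinks the query axis, so the result has to be upsampled before the identity connection.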
In my opinion, if you add maxpool to self.theta, 50% of the information will be lost. In the end, 50% of the positions of the input x cannot get an accurate non-local relation feature, because they did not take part in the non-local relation calculation.
When subsample is set to True, why doesn't self.theta get a maxpool layer as well?