XiaLiPKU / EMANet

The code for Expectation-Maximization Attention Networks for Semantic Segmentation (ICCV'2019 Oral)
https://xialipku.github.io/publication/expectation-maximization-attention-networks-for-semantic-segmentation/
GNU General Public License v3.0

Why does EMANet suffer from vanishing/exploding gradients even though T_train (=3) is small? #36

Closed ghost closed 4 years ago

ghost commented 4 years ago

Hello,

I would like to ask the authors why EMANet suffers from the vanishing/exploding gradient problem inherent in RNNs, even though the EM iterations are unrolled for only a small number of steps (here, 3). Vanilla RNNs with tanh non-linearities can typically handle sequences on the order of 100 time steps, and LSTMs can handle sequences on the order of 1000 time steps.
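
For reference, here is a minimal sketch of the unrolled E/M alternation the question refers to. It is not the repository's actual code: the `em_iterations` helper, the tensor shapes, and the normalization details are illustrative assumptions. It only shows why the T unrolled steps form an RNN-like recurrence over the bases `mu`:

```python
import torch
import torch.nn.functional as F

def em_iterations(x, mu, T=3):
    """Toy unrolled EM attention (sketch, not the repo's implementation).

    x:  (B, N, C) pixel features; mu: (B, K, C) bases.
    """
    for _ in range(T):
        # E-step: responsibilities of each pixel w.r.t. each basis
        z = F.softmax(torch.bmm(x, mu.transpose(1, 2)), dim=2)   # (B, N, K)
        # M-step: update bases as responsibility-weighted means of the features
        z_norm = z / (z.sum(dim=1, keepdim=True) + 1e-6)         # normalize over pixels
        mu = torch.bmm(z_norm.transpose(1, 2), x)                # (B, K, C)
    return mu, z
```

Gradients reaching the initial bases and the features have to flow back through all T matrix products and softmaxes, which is the RNN-like recurrence the question is comparing against.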

Since the mIoU peaks at a very small value of T_train, are vanishing/exploding gradients really the reason the mIoU deteriorates for larger values of T_train (>3)? Have the authors by any chance printed the gradient norms of every layer to check for vanishing or exploding gradients (e.g. with a check like the sketch below)?
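
For concreteness, a per-layer gradient-norm check of the kind asked about could look like this in PyTorch. This is a generic sketch, not code from this repository, and `log_grad_norms` is a hypothetical helper name:

```python
import torch

def log_grad_norms(model: torch.nn.Module) -> None:
    """Print the L2 norm of each parameter's gradient after loss.backward()."""
    for name, param in model.named_parameters():
        if param.grad is not None:
            print(f"{name}: {param.grad.norm().item():.3e}")

# Typical usage inside the training loop:
#   loss.backward()
#   log_grad_norms(model)   # inspect norms before optimizer.step()
```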

Thank you in advance.