Jianlong-Fu / Recurrent-Attention-CNN


Vanishing gradient issue in APN #13

Open pjj4288 opened 6 years ago

pjj4288 commented 6 years ago

I am trying to re-implement this experiment in PyTorch. However, the weights of the APN (Attention Proposal Network) are not updated because the gradients are extremely small. I think the issue comes from the logistic function in Eq. (5): its flat regions drive the gradients to nearly zero.
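To illustrate what I mean, here is a minimal PyTorch probe of the logistic term in Eq. (5) (the mask is a product of shifted sigmoids with slope k; k = 10 is assumed here just for illustration). Only pixels within roughly 1/k of the box boundary receive a usable gradient; everywhere else the function is flat and the gradient underflows toward zero:

```python
import torch

# Slope of the logistic function in Eq. (5); the paper uses a large k so
# the mask approximates a hard box (k = 10 is an assumed value).
k = 10.0

def sigma(z):
    # sigma(z) = 1 / (1 + exp(-k * z)), as in Eq. (5)
    return torch.sigmoid(k * z)

# Probe d(sigma)/dz at increasing pixel distances from the box boundary
# (z = 0 is exactly on the boundary).
for d in (0.0, 0.5, 1.0, 2.0, 5.0):
    z = torch.tensor(d, requires_grad=True)
    sigma(z).backward()
    print(f"distance {d:3.1f} px -> d(sigma)/dz = {z.grad.item():.2e}")
```

Already one pixel away from the boundary the gradient drops by several orders of magnitude, and a few pixels out it is zero in float32, which matches the stalled APN updates I am seeing.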

In the paper, the authors pretrain the APN using the last CNN feature maps. Did you record the performance without this initialization?
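For reference, here is a minimal sketch of how I read that pre-training step: the initial attention square is regressed to the peak of the channel-summed last conv feature map. The helper name, the fixed initial half-length `tl_init`, and the smooth L1 loss are my assumptions, not from the paper:

```python
import torch
import torch.nn.functional as F

def apn_pretrain_target(conv_feat, tl_init=0.25):
    # conv_feat: (B, C, H, W) last conv feature maps.
    # Build a (tx, ty, tl) regression target at the most responsive
    # location; tl_init (initial normalized half-length) is assumed.
    B, C, H, W = conv_feat.shape
    response = conv_feat.sum(dim=1)               # (B, H, W) channel-summed response
    flat_idx = response.flatten(1).argmax(dim=1)  # per-sample peak index
    ty = torch.div(flat_idx, W, rounding_mode="floor").float() / H
    tx = (flat_idx % W).float() / W
    tl = torch.full_like(tx, tl_init)
    return torch.stack([tx, ty, tl], dim=1)

# Pre-training step (sketch): fit the APN to these targets before joint training.
# loss = F.smooth_l1_loss(apn(pooled_feat), apn_pretrain_target(conv_feat))
```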

Thank you.

Ostnie commented 6 years ago

@pjj4288 Could you please tell me how you built the APN? I don't know what the loss and the gradient clipping should be.
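Not the author, but for what it's worth: the paper describes the APN as two stacked fully-connected layers that map the conv features to a square region (tx, ty, tl), trained with the inter-scale pairwise ranking loss L_rank = max(0, p_t^(s) - p_t^(s+1) + margin). A minimal PyTorch sketch under that reading (the hidden width, activations, and sigmoid output normalization are guesses, not from the paper):

```python
import torch
import torch.nn as nn

class APN(nn.Module):
    # Two stacked fc layers mapping pooled conv features to (tx, ty, tl),
    # per the paper's description; the layer sizes and activations here
    # are assumptions.
    def __init__(self, in_features, hidden=1024):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, 3)

    def forward(self, feat):
        # feat: flattened conv features, shape (B, in_features)
        h = torch.tanh(self.fc1(feat))
        # Sigmoid keeps (tx, ty, tl) in (0, 1); scale to pixel
        # coordinates downstream when building the Eq. (5) mask.
        return torch.sigmoid(self.fc2(h))
```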