I am trying to re-implement this experiment in PyTorch.
However, the weights of the APN (Attention Proposal Network) are not updated because their gradients are extremely small.
I think the issue comes from the logistic function in Eq. (5): its flat (saturated) regions seem to make the gradients almost zero.
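
To illustrate the saturation I mean, here is a minimal sketch in plain Python. It assumes the logistic function has the form h(x) = 1 / (1 + exp(-k x)); the steepness `K = 10.0` and the offsets in `gradient_table` are illustrative assumptions, not values from an actual training run.

```python
import math


def logistic(x, k):
    """h(x) = 1 / (1 + exp(-k * x)), written to avoid overflow for large |k * x|."""
    z = k * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_grad(x, k):
    """Analytic derivative: dh/dx = k * h(x) * (1 - h(x))."""
    h = logistic(x, k)
    return k * h * (1.0 - h)


K = 10.0  # assumed steepness, chosen only for illustration


def gradient_table(offsets=(0.0, 0.5, 1.0, 2.0, 5.0), k=K):
    """Return (offset, gradient) pairs for points at increasing distance from the step."""
    return [(d, logistic_grad(d, k)) for d in offsets]


if __name__ == "__main__":
    for d, g in gradient_table():
        print(f"offset={d:4.1f}  dh/dx={g:.3e}")
```

The derivative peaks at the step itself and decays roughly exponentially with distance, so any position more than a few units away from the step contributes almost nothing to the gradient.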
In the paper, the authors pretrained the APN using features from the last CNN layer. Did you record the performance without this initialization?
Thank you.