mojesty opened 6 years ago
Yeah, I encountered the same problem. I found that `self.attentions` is only defined in train mode, so I also added `self.attentions` in test mode. However, the difference between attention maps is still negligible. Could this happen because of some mistake that makes the attention shared across all timesteps? That would explain a difference on the order of 1e-9. If so, how can it be fixed?
@JiayunLi I think `self.attentions` is only used for visualization.
@mojesty I haven't figured it out.
@RoronoaZA Yeah. Are you able to get reasonable attention maps for evaluation images?
Same problem
Hi @mojesty , how long does it take to train one epoch? What machines did you use? Thank you!
Has anyone solved the problem? Can anyone help me? Thank you all.
The attention layer defined in model.py is:

- image context (49, 2048) -> fc1 -> hidden vector a (49,)
- word vector (num of LSTM units) -> fc2 -> hidden vector b (49,)
- attention = softmax(hidden vector a + hidden vector b)
I don't think this attention can work: in the end, fc2 generates a very small hidden vector b, so the attention depends mainly on hidden vector a, which comes from the image context alone and is therefore fixed for a given image. That's why you observe attention maps that are nearly identical across tokens.
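To illustrate the argument above, here is a minimal NumPy sketch of the additive attention as described in that comment (the layer shapes, weight scales, and variable names are assumptions for illustration, not the repo's exact code; the tiny fc2 scale stands in for the claim that hidden vector b ends up very small):

```python
import numpy as np

rng = np.random.default_rng(0)
num_regions, ctx_dim, lstm_dim = 49, 2048, 512

# Hypothetical weights for the two fully connected layers described above.
# The small fc2 scale mimics the claim that hidden vector b ends up tiny.
W1 = rng.normal(0.0, 0.01, (ctx_dim, 1))             # fc1: context -> score per region
W2 = rng.normal(0.0, 1e-4, (lstm_dim, num_regions))  # fc2: LSTM state -> 49 scores

context = rng.normal(0.0, 1.0, (num_regions, ctx_dim))  # image features, fixed per image

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention(h):
    a = (context @ W1).ravel()  # hidden vector a, (49,): depends only on the image
    b = h @ W2                  # hidden vector b, (49,): depends on the current word
    return softmax(a + b)

# Two different LSTM states, i.e. two different decoding steps.
h1 = rng.normal(0.0, 1.0, lstm_dim)
h2 = rng.normal(0.0, 1.0, lstm_dim)
att1, att2 = attention(h1), attention(h2)

# Because |b| << |a|, the two maps are nearly identical across tokens.
print(np.abs(att1 - att2).max())  # a very small number
```

If fc2's contribution really is that small in the trained model, every decoding step sees essentially the same softmax over regions, which matches the ~1e-9 differences reported in this thread.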
Hello @DeepRNN! I took a look at the attentions the model generates in test mode. I did the following: in `base_model.py:200` I changed the code so that every `attentions` array has the shape `(batch_size, 196, beam_size)`, and for simplicity I set `beam_size=1` when testing. Next, I simply stacked all `attentions` into one numpy array and visualized its contents. I found two concerns [attachment: 1.jpg]: the attention maps are almost identical across tokens (the maximum difference between the 1st and 2nd tokens is ~1e-9). However, the caption of the image is both grammatically and semantically correct. I would like to discuss these results.
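For reference, the stacking-and-comparison check described above can be sketched as follows with synthetic data standing in for the fetched attentions (the array names, token count, and the 1e-10 noise level are assumptions chosen to mirror the reported symptom):

```python
import numpy as np

# Synthetic stand-in for the attentions fetched at test time with beam_size=1:
# one (196,) map per generated token, here faked as a nearly constant map
# plus ~1e-10 noise to reproduce the observed near-zero differences.
rng = np.random.default_rng(0)
base = rng.random(196)
base /= base.sum()
num_tokens = 12
attn = np.stack([base + rng.normal(0.0, 1e-10, 196) for _ in range(num_tokens)])

# Maximum absolute difference between the maps of consecutive tokens.
diffs = np.abs(np.diff(attn, axis=0)).max(axis=1)
print(diffs[0])  # difference between 1st and 2nd token maps: tiny, as in the thread

# For visualization, reshape each map onto the 14x14 spatial grid (196 = 14*14).
grids = attn.reshape(num_tokens, 14, 14)
```

If the real stacked `attentions` array shows per-token differences this small, the model is effectively reusing one attention map for the whole caption, which is consistent with the fc2-is-too-small explanation given earlier in the thread.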