ajamjoom / Image-Captions

BERT + Image Captioning

Issue with BERT implementation #3

Open ajfisch opened 4 years ago

ajfisch commented 4 years ago

Hi,

It seems that you're trying to decode auto-regressively using BERT representations as a drop-in replacement for word embeddings. But BERT is bi-directional; the representation at token i has information about all tokens j > i. So, your model already knows what it needs to predict, before it predicts it.

In order for this to be correct you need to mask attention to all tokens j > i, which I don't think you do currently.
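For reference, a causal (look-ahead) mask of the kind described above might be sketched in PyTorch as follows. The function names here are illustrative, not from this repo's code:

```python
import torch

def causal_mask(seq_len):
    # True entries mark future positions (j > i) that must be blocked.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

def masked_attention(q, k, v):
    # Scaled dot-product attention with a causal mask, so that the
    # representation at position i cannot attend to positions j > i.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    scores = scores.masked_fill(causal_mask(q.size(-2)), float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

x = torch.randn(1, 5, 8)   # (batch, seq_len, dim)
out = masked_attention(x, x, x)
```

Note that standard BERT applies no such mask during pre-training, so simply masking at caption-decoding time does not match the representations BERT was trained to produce.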

leyuan commented 4 years ago

@ajfisch I think you are right. Have you fixed the issue, by any chance?

enes3774 commented 1 year ago

@ajfisch I think you are right. Have you fixed the issue, by any chance? I think you have to use an autoregressive model such as GPT-2. In my opinion, https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning is a good enough baseline for image captioning.