alasdairtran / transform-and-tell

[CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning
https://transform-and-tell.ml/

How does LSTM+GloVe+IA encode articles? #2

Open HAWLYQ opened 4 years ago

HAWLYQ commented 4 years ago

@alasdairtran Hi, I have read your newly published paper. I'm curious how LSTM+GloVe+IA encodes articles. Does it encode each article at the sentence level or the word level?

alasdairtran commented 4 years ago

I encoded each article at the word level. It's simply the average of the GloVe embeddings of the words in the article.
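For concreteness, that word-level averaging amounts to something like the sketch below. This is my own illustration, not the repo's code; the GloVe file path, dimensionality, and whitespace tokenisation are all assumptions.

```python
# Sketch of the LSTM+GloVe+IA article encoding described above:
# the article representation is just the mean of the GloVe vectors
# of the article's words.
import numpy as np

def load_glove(path):
    """Parse a GloVe text file into a {word: vector} dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

def encode_article(tokens, glove, dim=300):
    """Average the GloVe embeddings of all in-vocabulary words."""
    vecs = [glove[t] for t in tokens if t in glove]
    if not vecs:
        return np.zeros(dim, dtype=np.float32)  # no known words at all
    return np.mean(vecs, axis=0)

# glove = load_glove("glove.840B.300d.txt")  # path is an assumption
# article_vec = encode_article(article_text.lower().split(), glove)
```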

HAWLYQ commented 4 years ago

> I encoded each article at the word level. It's simply the average of the GloVe embeddings of the words in the article.

So there is only one article vector for the whole article, and no sentence-level attention (as proposed in the GoodNews paper), right?

alasdairtran commented 4 years ago

Yep! Just one article vector and no attention. And we were able to (slightly) beat the results reported in GoodNews.

HAWLYQ commented 4 years ago

> Yep! Just one article vector and no attention. And we were able to (slightly) beat the results reported in GoodNews.

Thanks! Wonderful work! And I have one more question: Are the embeddings for byte-pair tokens randomly initialized during decoding?

alasdairtran commented 4 years ago

Yes, in the decoder the embeddings of the byte-pair tokens are initialised from a Gaussian with mean 0 and variance 1/embed_size.
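In PyTorch terms, that corresponds to something like the following sketch. Note that variance 1/embed_size means a standard deviation of embed_size ** -0.5; the vocabulary and embedding sizes here are illustrative assumptions, not values from the paper.

```python
# Sketch (assumed, not the repo's exact code) of the byte-pair
# embedding initialisation described above: N(0, 1/embed_size).
import torch.nn as nn

vocab_size, embed_size = 50000, 1024  # illustrative sizes, an assumption

embedding = nn.Embedding(vocab_size, embed_size)
# Variance 1/embed_size  =>  std = embed_size ** -0.5
nn.init.normal_(embedding.weight, mean=0.0, std=embed_size ** -0.5)
```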

HAWLYQ commented 4 years ago

Thanks!

yuan93427 commented 4 years ago

@alasdairtran Thanks for your work! I want to generate captions on the NYTimes800k test set, but training takes me a long time. Could you provide the weights best.th?

alasdairtran commented 4 years ago

Send me an email (see README file) to request the full data and model weights.