Question about the "rare proper nouns" in the paper.

alasdairtran / transform-and-tell

[CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning

How are the "rare proper nouns" extracted?

We first use spaCy to extract proper nouns; the get_proper_nouns function in the repository is what we actually use. We then define a "rare proper noun" as a proper noun that appears in a test caption but not in any training caption (note that we only look at captions, not the article content).
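The definition above can be sketched as a set difference. This is a minimal illustration, not the repository's code: it assumes each caption has already been tagged (in practice by spaCy, where proper nouns carry the coarse tag `PROPN`), and the helper names `proper_nouns` and `rare_proper_nouns` as well as the example captions are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# A caption is a list of (token text, coarse POS tag) pairs.
# Assumption: tags were produced upstream, e.g. by spaCy's tagger.
Token = Tuple[str, str]


def proper_nouns(tagged_caption: Iterable[Token]) -> Set[str]:
    """Collect the tokens tagged as proper nouns (PROPN) in one caption."""
    return {text for text, pos in tagged_caption if pos == "PROPN"}


def rare_proper_nouns(
    train_captions: Iterable[List[Token]], test_caption: List[Token]
) -> Set[str]:
    """Proper nouns in the test caption that occur in no training caption."""
    seen_in_training: Set[str] = set()
    for caption in train_captions:
        seen_in_training |= proper_nouns(caption)
    return proper_nouns(test_caption) - seen_in_training


# Hypothetical, hand-tagged captions for illustration only.
train = [
    [("Angela", "PROPN"), ("Merkel", "PROPN"), ("speaks", "VERB"),
     ("in", "ADP"), ("Berlin", "PROPN")],
]
test = [("Merkel", "PROPN"), ("meets", "VERB"), ("Macron", "PROPN")]

rare = rare_proper_nouns(train, test)
```

Here only "Macron" counts as rare, because "Merkel" already occurs in a training caption. Matching is exact and case-sensitive in this sketch; whether the original code normalizes tokens is not stated in the thread.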

Is there any difference between "rare proper nouns" and "named entities", other than the former being rare?

You can check out our get_entities function in the repository. We use the NER from spaCy. I believe that proper nouns and named entities are similar but not completely overlapping concepts. For example, $1 billion is a named entity (MONEY) but not a proper noun.

The "rare proper nouns" do not appear in any training caption, but can they still appear in the training or test news articles?

Yes. Since we only use the captions to decide what counts as rare, a rare proper noun is absent from every training caption but may still have appeared in the text of a training article.
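To see how often this happens, one could check each rare proper noun against the training article text. The sketch below is an assumption about how such a check might look, not something from the repository: the function names, the whole-word case-sensitive matching rule, and the example data are all hypothetical.

```python
import re
from typing import Iterable, List, Set


def appears_in(text: str, noun: str) -> bool:
    """True if `noun` occurs in `text` as a whole word (case-sensitive)."""
    return re.search(r"\b" + re.escape(noun) + r"\b", text) is not None


def rare_nouns_seen_in_articles(
    rare_nouns: Iterable[str], train_articles: List[str]
) -> Set[str]:
    """Rare proper nouns that nonetheless occur in some training article."""
    return {
        noun
        for noun in rare_nouns
        if any(appears_in(article, noun) for article in train_articles)
    }


# Hypothetical data: "Macron" is rare with respect to training captions,
# yet it occurs in the body of a training article.
rare_nouns = {"Macron", "Kigali"}
train_articles = ["President Macron arrived in Paris on Monday."]

overlap = rare_nouns_seen_in_articles(rare_nouns, train_articles)
```

In this example `overlap` contains "Macron" but not "Kigali", which matches the point above: rarity is defined over captions only, so the model may still have seen the name in article context.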

Question about the "rare proper nouns" in the paper. #44