❓ How does Cased/Uncased processing matter for the summarization task?

Closed fseasy closed 4 years ago

Hi, I noticed that fairseq uses cased data for summarization. UniLM also uses cased data, while many previous SOTA works (like Pointer-Generator, BertSumAbs, and so on) use uncased data instead.
So I'm curious: why use cased instead of uncased? Does it enhance performance notably, or is it just for better display?

It depends on the original dataset used. For example, the pointer-generator paper you are referring to uses the uncased version of CNN-DailyMail. Cased is more realistic, but uncased can reduce the vocabulary size.

Sorry, but maybe you missed my key point: PGN uses the uncased CNN-DailyMail, while BART uses the cased version. So I'm wondering why cased is used and how it matters. Since BART is merged into fairseq, I have to ask here.
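To make the vocabulary-size point above concrete, here is a toy sketch (the corpus and the `vocab` helper are made up for illustration; this is not the actual fairseq or PGN preprocessing): lowercasing merges case variants of the same surface form, so the uncased vocabulary is smaller than the cased one.

```python
from collections import Counter

# Tiny made-up corpus with case variants of the same words.
corpus = [
    "The CNN anchor said The Times reported it .",
    "the times also quoted the anchor .",
]

def vocab(lines, lowercase):
    """Count whitespace tokens, optionally lowercasing first."""
    counts = Counter()
    for line in lines:
        text = line.lower() if lowercase else line
        counts.update(text.split())
    return counts

cased = vocab(corpus, lowercase=False)
uncased = vocab(corpus, lowercase=True)

# "The"/"the" and "Times"/"times" collapse to one entry each when lowercased,
# so the uncased vocabulary is strictly smaller here.
print(len(cased), len(uncased))
```

The trade-off the reply describes is exactly this: cased data preserves distinctions like proper nouns versus sentence-initial capitalization (more realistic output), at the cost of extra vocabulary entries for each case variant.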