facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

For Summarization, How Does Cased/Uncased Processing Matter? #2007

Closed · fseasy closed 4 years ago

fseasy commented 4 years ago

❓ How does cased/uncased processing matter for the summarization task?

Hi, I noticed that fairseq uses cased data for summarization. UniLM also uses cased data, while many previous SOTA works (like Pointer-Generator, BertSumAbs, and so on) use uncased data instead.

So I'm curious: why use cased instead of uncased? Does it notably improve performance, or is it just for better display?

huihuifan commented 4 years ago

It depends on the original dataset used. For example, the pointer-generator paper you are referring to uses the uncased version of CNN-DailyMail. Cased is more realistic, but uncased can reduce the vocabulary size.
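
To make the vocabulary point concrete, here is a minimal sketch (pure Python with whitespace tokenization; the sample sentences are made up for illustration and not from any dataset discussed here) showing how lowercasing merges case variants of a word into a single vocabulary entry:

```python
# Minimal sketch: how lowercasing ("uncasing") shrinks the vocabulary.
# On a real corpus such as CNN-DailyMail the effect is much larger.

from collections import Counter

corpus = [
    "Apple released a new iPhone. apple shares rose.",
    "The CEO said the deal was done. THE market reacted.",
]

def vocab(sentences, lowercase=False):
    """Count token types, optionally lowercasing first."""
    tokens = []
    for s in sentences:
        if lowercase:
            s = s.lower()
        tokens.extend(s.split())
    return Counter(tokens)

cased = vocab(corpus)
uncased = vocab(corpus, lowercase=True)

# "Apple"/"apple" and "The"/"THE"/"the" collapse into single types,
# so the uncased vocabulary is strictly smaller here (15 vs. 18 types).
print(f"cased vocab size:   {len(cased)}")
print(f"uncased vocab size: {len(uncased)}")
```

The trade-off is the one stated above: the smaller uncased vocabulary can help a model, but it throws away case information ("Apple" the company vs. "apple" the fruit) that a cased model keeps.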

fseasy commented 4 years ago

Sorry, but maybe you missed my key point: PGN uses the uncased CNN-DailyMail, while BART uses the cased CNN-DailyMail. So I'm wondering: why use cased, and how does it matter? Since BART is merged into fairseq, I have to ask here.