allenai / PRIMER

The official code for PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
Apache License 2.0

Mismatch between pre-training and fine-tuning phase #14

Open thangld201 opened 2 years ago

thangld201 commented 2 years ago

As far as I'm aware, PRIMERA replaces sentences with \<SENT-MASK> tokens during pre-training. However, these \<SENT-MASK> tokens do not appear during fine-tuning or inference. Still, the model was shown to achieve impressive results in zero-shot and few-shot evaluation. I was wondering whether PRIMERA has any strategy to reduce the mismatch between the pre-training and fine-tuning phases? (I did not see related information mentioned in the paper.)
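For concreteness, here is what a typical zero-shot inference call looks like (a minimal sketch, assuming the HuggingFace checkpoint `allenai/PRIMERA` and its `<doc-sep>` separator token): the source documents are simply concatenated with `<doc-sep>`, and no sentence-mask tokens appear anywhere in the input.

```python
# Minimal zero-shot inference sketch. Assumptions: the HuggingFace checkpoint
# "allenai/PRIMERA" and the "<doc-sep>" separator token; note that the input
# contains no <SENT-MASK> tokens, which is the mismatch asked about above.
import torch
from transformers import AutoTokenizer, LEDForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("allenai/PRIMERA")
model = LEDForConditionalGeneration.from_pretrained("allenai/PRIMERA")

docs = ["First source document ...", "Second source document ..."]
# Documents are simply concatenated with <doc-sep>; no sentences are masked.
text = " <doc-sep> ".join(docs)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

# Global attention on the first token and on every <doc-sep> token.
doc_sep_id = tokenizer.convert_tokens_to_ids("<doc-sep>")
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1
global_attention_mask[inputs["input_ids"] == doc_sep_id] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=256,  # a hard length cap is the only length control in zero-shot
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```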

thangld201 commented 2 years ago

@Wendy-Xiao Could you help me clarify this?

Wendy-Xiao commented 2 years ago

Hi Tang,

It's a good point. We do not have any particular strategy to address the gap between pre-training and the downstream task. As has been noticed, there are still some problems with our model in the zero-shot setting, e.g. the length of the generated summaries cannot really be controlled except by setting a hard stop at a certain length.

In our initial explorations, we tried some methods to address the gap, e.g. adding multiple tokens as a prefix to the input document. The interesting thing we found was that the more tokens we added, the longer the summary the model tended to generate.
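For illustration, a rough sketch of that kind of prefix exploration, reusing the tokenizer, model and `docs` from the snippet above (the mask-token spelling and the prefix count here are assumptions for illustration, not the exact settings we used):

```python
# Rough sketch of the prefix-token exploration described above. The token
# string for the sentence mask is an assumption (check the tokenizer's special
# tokens); the count is purely illustrative.
mask_token = "<sent-mask>"  # assumed spelling; may be tokenizer.mask_token instead
n_prefix = 4                # observation: more prefix tokens -> longer summaries
text = " ".join([mask_token] * n_prefix) + " " + " <doc-sep> ".join(docs)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)
# ...then build global_attention_mask and call model.generate() as in the
# zero-shot snippet above.
```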

Few-shot fine-tuning can address the problem to some extent, so we would suggest that the best way to use PRIMERA is in a few-shot manner, i.e. fine-tuning the model on a few examples so that it can learn the real task without requiring a large labeled dataset.
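For reference, a minimal few-shot fine-tuning sketch with the HuggingFace `Seq2SeqTrainer` (the data format and hyperparameters here are illustrative assumptions, not the training script from this repo):

```python
# Minimal few-shot fine-tuning sketch. Assumed data format: a handful of
# (source documents, reference summary) pairs.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    LEDForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("allenai/PRIMERA")
model = LEDForConditionalGeneration.from_pretrained("allenai/PRIMERA")

# A few labeled examples; each "documents" entry is a list of source docs.
few_shot = [
    {"documents": ["doc A1 ...", "doc A2 ..."], "summary": "summary A ..."},
    {"documents": ["doc B1 ...", "doc B2 ..."], "summary": "summary B ..."},
]

def preprocess(example):
    # Same input format as zero-shot inference: docs joined with <doc-sep>.
    text = " <doc-sep> ".join(example["documents"])
    model_inputs = tokenizer(text, truncation=True, max_length=4096)
    labels = tokenizer(example["summary"], truncation=True, max_length=256)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

dataset = Dataset.from_list(few_shot).map(
    preprocess, remove_columns=["documents", "summary"]
)

args = Seq2SeqTrainingArguments(
    output_dir="primera-fewshot",
    num_train_epochs=10,            # small data, a few epochs
    per_device_train_batch_size=1,
    learning_rate=5e-5,             # illustrative value
    logging_steps=1,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()
```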