Open thangld201 opened 2 years ago
As far as I'm aware, PRIMERA replaces sentences with `<SENT-MASK>` tokens during pre-training. However, these `<SENT-MASK>` tokens never appear during fine-tuning or inference. Still, the model was shown to achieve impressive results in zero-shot and few-shot evaluation. I was wondering if PRIMERA has any strategy to reduce the mismatch between the pre-training and fine-tuning phases. (I did not see related information in the paper.)

@Wendy-Xiao Could you help me clarify this?
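To make the mismatch concrete, here is a toy sketch of how I picture the two phases. The `<doc-sep>` / `<SENT-MASK>` strings and the way sentences are chosen for masking are placeholders for illustration only, not the actual implementation:

```python
# Toy illustration of the pre-training vs. finetuning input formats.
# The separator/mask strings and the sentence selection below are placeholders;
# the real model uses its tokenizer's special tokens and an entity-based
# heuristic to decide which sentences to mask.

DOC_SEP = "<doc-sep>"
SENT_MASK = "<SENT-MASK>"

docs = [
    ["Storm hits the coast.", "Thousands lose power.", "Repairs begin Monday."],
    ["A coastal storm caused outages.", "Officials promise quick repairs."],
]

def pretraining_example(docs, masked=((0, 1), (1, 0))):
    """Mask a few (doc, sent) positions; the target is the masked sentences."""
    masked = set(masked)
    inp, tgt = [], []
    for d, sents in enumerate(docs):
        kept = []
        for s, sent in enumerate(sents):
            if (d, s) in masked:
                kept.append(SENT_MASK)   # mask token shows up in the input
                tgt.append(sent)         # masked sentence must be regenerated
            else:
                kept.append(sent)
        inp.append(" ".join(kept))
    return f" {DOC_SEP} ".join(inp), " ".join(tgt)

def finetuning_example(docs, summary):
    """No mask tokens at all: plain concatenation -> human-written summary."""
    return f" {DOC_SEP} ".join(" ".join(s) for s in docs), summary

print(pretraining_example(docs))
print(finetuning_example(docs, "A storm knocked out power; repairs start Monday."))
```

In the first case the model both sees `<SENT-MASK>` in the input and is trained to emit the masked sentences, while at finetuning/inference time neither happens; that difference is the gap I am asking about.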
Hi Tang,
It's a good point: we do not have any particular strategy to address the gap between pre-training and the downstream task. As has been noted, there are still some problems with our model in the zero-shot setting, e.g. the length of the generated summaries cannot really be controlled except by setting a hard stop at a certain length.
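Just to illustrate what I mean by a hard stop, here is a rough zero-shot decoding sketch (untested; the `allenai/PRIMERA` checkpoint name, the `<doc-sep>` separator and the global-attention handling reflect my understanding of the public HuggingFace release). The only length control is the `max_length`/`min_length` budget passed to `generate()`:

```python
# Zero-shot summarization sketch; the only length control is the hard cap below.
import torch
from transformers import AutoTokenizer, LEDForConditionalGeneration

name = "allenai/PRIMERA"  # public checkpoint name, as far as I remember
tok = AutoTokenizer.from_pretrained(name)
model = LEDForConditionalGeneration.from_pretrained(name)

docs = ["First article text ...", "Second article text ..."]
text = " <doc-sep> ".join(docs)  # concatenate the cluster with the separator token

inputs = tok(text, return_tensors="pt", truncation=True, max_length=4096)

# LED-style models use global attention on a few positions; here just the first token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    num_beams=4,
    max_length=256,  # the "hard stop": decoding is simply cut off here
    min_length=32,
)
print(tok.batch_decode(summary_ids, skip_special_tokens=True)[0])
```

Changing `max_length` only moves the cutoff; it does not make the model aim for a particular summary length.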
In our initial explorations, we tried some methods to address the gap, e.g. adding multiple
Few-shot finetuning could address the problem to some extent, so we would suggest that the best way to use PRIMERA is in a few-shot manner, i.e. finetuning the model on a few examples so that it can learn the real task.
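For reference, a minimal few-shot finetuning loop could look like the sketch below (untested, with placeholder data and hyperparameters); the few-shot setting in the paper uses on the order of 10-100 labeled clusters:

```python
# Few-shot finetuning sketch: a handful of (source docs, reference summary) pairs.
import torch
from transformers import AutoTokenizer, LEDForConditionalGeneration

name = "allenai/PRIMERA"
tok = AutoTokenizer.from_pretrained(name)
model = LEDForConditionalGeneration.from_pretrained(name)
model.train()

few_shot = [
    (" <doc-sep> ".join(["doc 1 text ...", "doc 2 text ..."]), "reference summary ..."),
    # ... add your 10-100 labeled examples here
]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

for _ in range(3):  # a few epochs over the small labeled set
    for source, target in few_shot:
        enc = tok(source, return_tensors="pt", truncation=True, max_length=4096)
        labels = tok(target, return_tensors="pt", truncation=True, max_length=256)["input_ids"]
        loss = model(input_ids=enc["input_ids"],
                     attention_mask=enc["attention_mask"],
                     labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("primera-few-shot")
tok.save_pretrained("primera-few-shot")
```

Even this small amount of supervision exposes the model to the real input/output format, so the length and formatting issues from the zero-shot setting tend to shrink.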