Question about preprocessing the Multi-News dataset

allenai / PRIMER

The official code for PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

Apache License 2.0


Open xjw-star opened 1 year ago

xjw-star commented 1 year ago

In the provided Dataset class, the source documents are separated by splitting on '|||||', but I noticed that the last part of the split is discarded. What is the reason for this?
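Whether dropping the last part is safe depends on what Python's `str.split` returns when the input does or does not end with the separator. The sketch below assumes the Dataset class drops the last part with a `[:-1]` slice. The class itself is not quoted here, so that is an assumption. The sketch compares that slice with a hypothetical helper, `split_documents`, which removes only empty or whitespace-only pieces. If every raw entry ends with a trailing '|||||', the last piece is an empty string and slicing it off loses nothing. If an entry does not end with the separator, the slice silently drops a real document.

```python
from typing import List

SEPARATOR = "|||||"

def split_documents(source: str) -> List[str]:
    """Hypothetical helper (not from the PRIMER repo): split a source
    string on the separator and drop only empty/whitespace-only pieces,
    instead of unconditionally discarding the last element."""
    parts = source.split(SEPARATOR)
    return [p.strip() for p in parts if p.strip()]

# Case 1: the string ends with the separator, so split() yields a
# trailing empty string and [:-1] removes only that empty piece.
with_trailing = "doc one ||||| doc two |||||"
print(with_trailing.split(SEPARATOR))       # ['doc one ', ' doc two ', '']
print(with_trailing.split(SEPARATOR)[:-1])  # ['doc one ', ' doc two ']
print(split_documents(with_trailing))       # ['doc one', 'doc two']

# Case 2: no trailing separator, so [:-1] drops a real document.
without_trailing = "doc one ||||| doc two"
print(without_trailing.split(SEPARATOR)[:-1])  # ['doc one ']
print(split_documents(without_trailing))       # ['doc one', 'doc two']
```

To tell which case applies, check whether the raw source strings in the data files end with '|||||'.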