allenai / PRIMER

The official code for PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
Apache License 2.0
150 stars 31 forks

Question about preprocessing the Multi-News dataset #20

Open xjw-star opened 1 year ago

xjw-star commented 1 year ago

In the provided Dataset class, each document is split on '|||||', but I noticed that the last part of the split is ignored. What is the reason for this?
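To make the question concrete, here is a minimal sketch of the behaviour being asked about. It is not the repository's code: the function names and sample strings are hypothetical, and it is only an assumption that the raw text may end with a trailing '|||||' separator. Under that assumption, dropping the last piece removes an empty string; without a trailing separator, it would drop a real document.

```python
SEP = "|||||"

def split_keep_all(text: str) -> list[str]:
    """Split on the separator and keep every piece (hypothetical helper)."""
    return text.split(SEP)

def split_drop_last(text: str) -> list[str]:
    """Split on the separator and discard the final piece (hypothetical helper)."""
    return text.split(SEP)[:-1]

# Made-up inputs: one ends with the separator, one does not.
with_trailing = "doc one ||||| doc two |||||"
without_trailing = "doc one ||||| doc two"

# With a trailing separator, the last piece is an empty string,
# so dropping it loses nothing.
print(split_keep_all(with_trailing))
print(split_drop_last(with_trailing))

# Without a trailing separator, dropping the last piece loses "doc two".
print(split_keep_all(without_trailing))
print(split_drop_last(without_trailing))
```

Whether the actual dataset files end with a trailing separator would need to be checked against the data itself.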