ChenxinAn-fdu / CoLo

[COLING'22] Code for our paper: "COLO: A Contrastive Learning based Re-ranking Framework for One-Stage Summarization"

question about data process #3

Open hustcxx opened 2 years ago

hustcxx commented 2 years ago

The paper describes using the [doc] token to generate a representation of the document, but I don't see where the [doc] token is added in the data processing file ext_lable_and_tokenize.py. Where does that happen?

ChenxinAn-fdu commented 2 years ago

We used the \<s> token (for BART) to initialize the [doc] token in our code, so no separate [doc] token is added during data processing.
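
As a rough illustration of the idea (a minimal sketch, not the repository's actual code): BART-style tokenization places `<s>` at the start of every input, so the encoder output at position 0 can play the role of the [doc] representation. The helper names `wrap_with_special_tokens` and `doc_representation`, and the toy hidden states, are hypothetical and exist only for this example.

```python
from typing import List

BOS, EOS = "<s>", "</s>"  # BART's start and end special tokens


def wrap_with_special_tokens(tokens: List[str]) -> List[str]:
    """Mimic BART-style tokenization: <s> first, </s> last (hypothetical helper)."""
    return [BOS] + tokens + [EOS]


def doc_representation(hidden_states: List[List[float]]) -> List[float]:
    """Return the encoder vector at position 0, i.e. the <s> slot."""
    return hidden_states[0]


tokens = wrap_with_special_tokens(["The", "cat", "sat", "."])

# Toy stand-in for encoder output: one 2-dimensional vector per token.
# The numbers are made up; a real model would produce these vectors.
hidden = [[float(i), float(i) * 0.5] for i in range(len(tokens))]

doc_vec = doc_representation(hidden)
```

Because `<s>` is already present in every tokenized input, nothing extra has to be written into the preprocessed data under this scheme.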

hustcxx commented 1 year ago

Thanks for your response. Another question: how did you process the PubMed dataset? Like Match?

ChenxinAn-fdu commented 1 year ago

Yes! We tokenize the dataset and cap max_input_length at 512.
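
A minimal sketch of what such a cap could look like, assuming the 512 limit counts the `<s>` and `</s>` special tokens (that detail is an assumption, not something confirmed in this thread). The function `truncate_input` is hypothetical; the ids 0 and 2 are the `<s>` and `</s>` ids in the standard BART vocabulary, and the document ids are made up.

```python
from typing import List

BOS_ID, EOS_ID = 0, 2  # <s> and </s> ids in the standard BART vocabulary


def truncate_input(token_ids: List[int], max_input_length: int = 512) -> List[int]:
    """Truncate so the final sequence, including <s> and </s>, fits the limit.

    Assumption: the limit includes the two special tokens.
    """
    body = token_ids[: max_input_length - 2]  # reserve two slots for specials
    return [BOS_ID] + body + [EOS_ID]


long_doc = list(range(100, 1100))  # 1000 made-up token ids
short_doc = [100, 101, 102]        # shorter than the limit, left intact

truncated = truncate_input(long_doc)
```

Inputs shorter than the limit pass through unchanged apart from the added special tokens; longer ones keep only their leading tokens.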