ChenxinAn-fdu / CoLo

[COLING'22] Code for our paper: "COLO: A Contrastive Learning based Re-ranking Framework for One-Stage Summarization"

Preprocess script preprocess/ext_label_and_tokenize.py returns different results from processed CNN/DailyMail #5

Open HungVS opened 1 year ago

HungVS commented 1 year ago

Thanks for your great work!

I ran preprocess/ext_label_and_tokenize.py on the raw CNN/DailyMail dataset with the following command:

python ext_lable_and_tokenize.py --raw_path [SOME PATH]/CoLo/extractive/datasets/raw_CNNDM --save_path [SOME PATH]/CoLo/extractive/datasets/preprocesssed_CNNDM --max_src_ntokens 512

The results differ from the provided processed CNN/DailyMail data. For example, in article_id 34:

Both the raw and the processed CNN/DailyMail data come from the links provided in this repo.
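
To narrow down where the two outputs diverge, a record-by-record comparison may help. The following is only a minimal sketch: it assumes both the regenerated and the provided data are stored as JSON Lines files (one JSON object per line), which may not match the actual format, and the helper names load_jsonl and diff_records are hypothetical, not part of the CoLo code.

```python
import json
from pathlib import Path

# Sentinel so that a missing key is not confused with an explicit None value.
_MISSING = object()


def load_jsonl(path):
    """Read one JSON object per non-empty line (assumed file format)."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def diff_records(mine, reference):
    """Return (index, differing_keys) for each position where records differ.

    Records are compared pairwise by position; if the two lists have
    different lengths, the extra records at the end are ignored.
    """
    mismatches = []
    for i, (a, b) in enumerate(zip(mine, reference)):
        keys = sorted(
            k for k in a.keys() | b.keys()
            if a.get(k, _MISSING) != b.get(k, _MISSING)
        )
        if keys:
            mismatches.append((i, keys))
    return mismatches
```

Calling diff_records(load_jsonl(my_file), load_jsonl(provided_file)), where my_file and provided_file are placeholders for the two paths, lists the positions and field names that differ, which would show whether the mismatch is in the tokenized text, the extractive labels, or both.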

Can you suggest what I've done wrong?

Thanks in advance!