Yuanhy1997 / SeqDiffuSeq

Text Diffusion Model with Encoder-Decoder Transformers for Sequence-to-Sequence Generation [NAACL 2024]
https://arxiv.org/abs/2212.10325
86 stars 14 forks

Text simplification dataset #27

Open MeshchaninovViacheslav opened 4 months ago

MeshchaninovViacheslav commented 4 months ago

Hi, thanks for your great work. I would like to train your model on wiki-auto for the text simplification task. I found the data you used via the Google Drive link provided in this repo, but I noticed that the original wiki-auto dataset has fewer than 677k sequences. I couldn't find detailed instructions in your article. Could you share how you obtained this data?

Yuanhy1997 commented 4 months ago

I think I followed the preprocessing from DiffuSeq for a fair comparison. Please refer to their repo for the processed dataset.
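
For anyone who wants to compare sequence counts between the processed data and the original wiki-auto release, here is a minimal sketch of how that could be checked. It assumes the processed split is a JSON Lines file with one pair per line under the keys `src` and `trg`; those key names and the function name `count_pairs` are assumptions for illustration, not something confirmed in this thread, so adjust them to match the actual files.

```python
import json
from pathlib import Path


def count_pairs(path):
    """Count JSON lines that contain both a source and a target field."""
    total = 0
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                # Skip blank lines so they are not counted as pairs.
                continue
            record = json.loads(line)
            # "src" and "trg" are assumed field names; change them if the
            # processed files use different keys.
            if "src" in record and "trg" in record:
                total += 1
    return total
```

Calling `count_pairs` on each split (for example a hypothetical `train.jsonl`) and summing the results gives a number that can be set against the size of the original dataset.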