Shark-NLP / DiffuSeq

[ICLR'23] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
MIT License
737 stars 90 forks

Text simplification dataset #85

Open MeshchaninovViacheslav opened 5 months ago

MeshchaninovViacheslav commented 5 months ago

Hi, thanks for your great work. I would like to train your model on wiki-auto for the text simplification task. I've noticed that the original wiki-auto dataset has fewer than 677k sequences, and I haven't found detailed instructions in your article. Could you share how you obtained and preprocessed wiki-auto?

summmeer commented 3 months ago

Hi, I downloaded the data from their repo. The link you provided may point to their enhanced and filtered version. If you prefer the original, you can access it via our Google Drive link.