A dataset of atomic Wikipedia edits: insertions and deletions of a contiguous chunk of text within a single sentence. The dataset contains ~43 million edits across 8 languages.
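To make the notion of an "atomic edit" concrete, here is a minimal sketch of an insertion edit: the edited sentence differs from the base sentence by exactly one contiguous inserted phrase. The function and field layout below are illustrative assumptions, not the dataset's actual schema.

```python
# Sketch of an atomic insertion edit: the edited sentence equals the
# base sentence plus one contiguous phrase inserted at a position.
# apply_insertion is a hypothetical helper, not part of the dataset.

def apply_insertion(base_tokens, phrase_tokens, position):
    """Insert a contiguous phrase into a tokenized base sentence."""
    return base_tokens[:position] + phrase_tokens + base_tokens[position:]

base = "the cat sat on the mat".split()
edited = apply_insertion(base, ["big", "fluffy"], 1)
print(" ".join(edited))  # the big fluffy cat sat on the mat
```

A deletion edit is the inverse operation: removing one contiguous span of tokens recovers the base sentence from the edited one.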
English portion of the dataset seems to be corrupted #1

Hi,

I tried to unzip the English portion of the dataset:

And it gave an error:

The data files for other languages (e.g., German) seem to be fine.

Could you please re-download and confirm that the files look fine? I have fixed this for English and am proceeding to the other languages. Thanks for reporting the problem!
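One quick way to tell a corrupted download from an extraction problem, assuming the archives are gzip-compressed (an assumption; the actual filenames and format may differ), is `gzip -t`, which reads the whole archive and verifies its checksum without writing any output. The snippet below demonstrates it on a small sample file rather than the real dataset:

```shell
# Demo: create a small gzip archive, then verify its integrity.
echo "sample edit data" | gzip > sample.gz

# gzip -t checks the archive's CRC without extracting it; a
# truncated or corrupted download fails this check.
gzip -t sample.gz && echo "archive OK"

# Simulate a truncated download and re-check.
head -c 10 sample.gz > truncated.gz
gzip -t truncated.gz 2>/dev/null || echo "archive corrupted"
```

If `gzip -t` fails on the downloaded file, the problem is the file itself rather than the unzip tool, and re-downloading is the right fix.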