fastai / course-nlp

A Code-First Introduction to NLP course
https://www.fast.ai/2019/07/08/fastai-nlp/

Lesson 10 notebooks: `bunzip` throws an error when unzipping `.bz2` files #39

Open jcatanza opened 4 years ago

jcatanza commented 4 years ago

On a Windows 10 64-bit machine:

`bunzip` throws `EOFError: Compressed file ended before the end-of-stream marker was reached` when processing these files: `viwiki-latest-pages-articles.xml.bz2` and `trwiki-latest-pages-articles.xml.bz2`.

Attaching a screenshot:

[Screenshot: bunzip_error]
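The error text matches the `EOFError` that Python's `bz2` module raises when the compressed input runs out before the bzip2 end-of-stream marker. One plausible cause, not confirmed in this thread, is an incomplete download. The minimal sketch below uses a made-up in-memory payload (not the actual dump files) and reproduces the same message by truncating a valid `.bz2` stream; `try_read` is a hypothetical helper name:

```python
import bz2
import io

# Build a small, valid bzip2 payload in memory (a stand-in for a Wikipedia dump).
payload = b"<page><title>example</title></page>\n" * 10_000
compressed = bz2.compress(payload)

# Simulate an interrupted download by dropping the second half of the bytes.
truncated = compressed[: len(compressed) // 2]


def try_read(data: bytes) -> str:
    """Decompress `data` through bz2.open and report what happened."""
    try:
        with bz2.open(io.BytesIO(data), "rb") as f:
            f.read()
        return "ok"
    except EOFError as exc:
        return f"EOFError: {exc}"


print(try_read(compressed))  # complete stream decompresses normally
print(try_read(truncated))   # truncated stream raises EOFError
```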

The Windows version of 7-Zip throws a similar error.

Note 1: A valid `.xml` file is still saved.

Note 2: The problem was resolved when I downloaded the files directly from https://archive.org/details/wikipediadumps
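Note 1 is consistent with streaming decompression: output from each fully received bzip2 block is written as soon as it is decoded, so an interrupted input can still leave an `.xml` file on disk that may be missing its tail. Assuming the cause is an incomplete download, one way to decide whether to re-download (as was done via archive.org above) is to decompress with `bz2.BZ2Decompressor` and check whether the end-of-stream marker was reached. This is a minimal sketch; `decompress_and_check` and its `chunk_size` parameter are hypothetical names, not part of the course notebooks or fastai:

```python
import bz2
import io
from typing import BinaryIO


def decompress_and_check(src: BinaryIO, dst: BinaryIO, chunk_size: int = 64 * 1024) -> bool:
    """Stream-decompress bzip2 data from src into dst (hypothetical helper).

    Returns True if the last bzip2 stream reached its end-of-stream marker,
    False if the input ran out early (the condition behind the EOFError).
    Empty input also returns False, since no stream was completed.
    Multi-stream files are handled by starting a new decompressor per stream.
    """
    decomp = bz2.BZ2Decompressor()
    while chunk := src.read(chunk_size):
        while chunk:
            if decomp.eof:
                # Previous stream finished; the remaining bytes start a new one.
                decomp = bz2.BZ2Decompressor()
            # Output for completed blocks is returned right away, which is why
            # dst can be partly written even when the input is truncated.
            dst.write(decomp.decompress(chunk))
            # unused_data holds bytes after the end-of-stream marker, if any.
            chunk = decomp.unused_data if decomp.eof else b""
    return decomp.eof


# In-memory demo with a made-up payload.
payload = b"<page>example</page>\n" * 50_000
good = bz2.compress(payload)
bad = good[: len(good) // 2]

print(decompress_and_check(io.BytesIO(good), io.BytesIO()))
print(decompress_and_check(io.BytesIO(bad), io.BytesIO()))
```

For a real dump, `src` and `dst` would be files opened in binary mode (for example, `open("trwiki-latest-pages-articles.xml.bz2", "rb")`); a `False` result suggests the `.bz2` file is incomplete and should be downloaded again.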

alirezadigi commented 2 years ago

Somehow I get the same error :(