attardi / wikiextractor

A tool for extracting plain text from Wikipedia dumps
GNU Affero General Public License v3.0
3.69k stars 959 forks

Issues on newer (2023) and older (2019) dumps #304

Closed: JohnTailor closed this issue 1 year ago

JohnTailor commented 1 year ago

WikiExtractor failed on the newer dumps (bz2 issues) and on the older dumps (it would only extract 4 GB of text). Is this a known issue? Also, I cannot find the exact dump from 2020 that is asked for. Is there a link where I can find it?