attardi / wikiextractor

A tool for extracting plain text from Wikipedia dumps
GNU Affero General Public License v3.0

Fix bz2 file handling in OutputSplitter #333

Open DurgaiVS opened 1 month ago

DurgaiVS commented 1 month ago

Previously, bz2 file handling was done in an unsafe manner, and I ran into exceptions when trying to write output as a bz2 file. This commit only addresses issues in the OutputSplitter class in WikiExtractor.py.
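
For context, here is a minimal sketch of one way to handle bz2 output safely when splitting across files. It is not the code from this commit and not the project's actual OutputSplitter: the class name `RotatingBz2Writer`, its parameters, and the `part_NN` file naming are all hypothetical, chosen only for illustration. The idea is to open each part in text mode through `bz2.open`, close the current part before opening the next one, and make `close()` safe to call more than once.

```python
import bz2
import os


class RotatingBz2Writer:
    """Hypothetical illustration only; not wikiextractor's OutputSplitter."""

    def __init__(self, directory, max_bytes, compress=True):
        self.directory = directory
        self.max_bytes = max_bytes
        self.compress = compress
        self.index = 0      # number of the part currently being written
        self.written = 0    # uncompressed bytes written to the current part
        self.file = self._open()

    def _path(self):
        suffix = ".bz2" if self.compress else ""
        return os.path.join(self.directory, "part_%02d%s" % (self.index, suffix))

    def _open(self):
        os.makedirs(self.directory, exist_ok=True)
        if self.compress:
            # Text mode: callers pass str, which is encoded as UTF-8
            # before compression, so both branches share one interface.
            return bz2.open(self._path(), "wt", encoding="utf-8")
        return open(self._path(), "w", encoding="utf-8")

    def write(self, text):
        size = len(text.encode("utf-8"))
        # Rotate only if the current part already holds data and would overflow.
        if self.written and self.written + size > self.max_bytes:
            self.file.close()  # finish the bz2 stream before starting a new part
            self.index += 1
            self.written = 0
            self.file = self._open()
        self.file.write(text)
        self.written += size

    def close(self):
        # Idempotent: a second call is a no-op.
        if self.file is not None:
            self.file.close()
            self.file = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
```

Used as a context manager (`with RotatingBz2Writer("out", 1_000_000) as w: w.write(page)`), the last part is closed even if an exception is raised partway through, so the final bz2 stream is not left truncated.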