attardi / wikiextractor

A tool for extracting plain text from Wikipedia dumps
GNU Affero General Public License v3.0

Patch support for Windows #315

Open rgryta opened 1 year ago

rgryta commented 1 year ago

Also fixes a regex issue with newer Python versions.
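
The comment does not say which regex issue is meant. One known source of breakage is that Python 3.11 and later reject inline global flags such as `(?i)` anywhere but the start of a pattern, which was only a deprecation warning before. Assuming that is the kind of problem involved, a minimal sketch of the usual fix looks like this. The pattern and the `find_links` helper are hypothetical and not taken from the PR.

```python
import re

# Hypothetical URL pattern, for illustration only (not the project's regex).
# A form like r"(?:https?|ftp)(?i)://\S+" puts the global flag mid-pattern,
# which Python 3.11+ rejects with "global flags not at the start of the expression".

# Fix 1: move the inline flag to the very start of the pattern.
LEADING_FLAG = re.compile(r"(?i)(?:https?|ftp)://\S+")

# Fix 2: drop the inline flag and pass the flag explicitly.
EXPLICIT_FLAG = re.compile(r"(?:https?|ftp)://\S+", re.IGNORECASE)


def find_links(text, pattern=EXPLICIT_FLAG):
    """Return every case-insensitive URL-like match in text."""
    return pattern.findall(text)


if __name__ == "__main__":
    print(find_links("see HTTP://Example.org and ftp://x.y"))
```

Both compiled patterns behave the same on old and new interpreters, so either form avoids a version check.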

rgryta commented 1 year ago

Temporary patch that just enables use on Windows; otherwise it doesn't work at all. Switching to multithreading heavily impacts the outcome, but proper support will be more complicated.
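
As a rough illustration of the trade-off described above, the sketch below picks a thread pool on Windows and a process pool elsewhere. It is not the PR's actual code: `pick_executor_class`, `clean_page`, and `extract_all` are hypothetical names, and `clean_page` is only a stand-in for the real per-page extraction work. A plausible reason for the Windows breakage is that Windows has no `fork` start method for `multiprocessing`, though the thread does not confirm this.

```python
import sys
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def pick_executor_class(platform=sys.platform):
    """Use threads on Windows, processes everywhere else (illustrative policy)."""
    return ThreadPoolExecutor if platform == "win32" else ProcessPoolExecutor


def clean_page(text):
    """Hypothetical stand-in for per-page extraction: collapse whitespace."""
    return " ".join(text.split())


def extract_all(pages, workers=2, platform=sys.platform):
    """Run clean_page over all pages using the executor chosen for the platform."""
    executor_cls = pick_executor_class(platform)
    with executor_cls(max_workers=workers) as pool:
        # map() preserves input order in its results.
        return list(pool.map(clean_page, pages))


if __name__ == "__main__":
    print(extract_all(["first   page\n", "  second page  "]))
```

Because CPython threads share one global interpreter lock, CPU-bound extraction is unlikely to scale across cores in the threaded branch, which would be consistent with the heavy impact mentioned in the comment.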