bitextor / warc2text

Extracts plain text, language identification, and other metadata from WARC records
MIT License

Multiple improvements and bug fixes #6

Closed: zuny26 closed this 3 years ago

zuny26 commented 3 years ago