attardi / wikiextractor

A tool for extracting plain text from Wikipedia dumps
GNU Affero General Public License v3.0
3.69k stars, 959 forks

Add feature to extractPage to also dump the extracted page to json/csv/txt #317

Open BwandoWando opened 10 months ago

BwandoWando commented 10 months ago

Thank you for your tool. I've tried using it, as seen below:

image

Could you please add a feature to extractPage.py so that it can also dump the extracted text to a JSON, CSV, or TXT file, with an option to remove all markup and HTML and keep only the text of the page?

Also, is it possible to submit multiple IDs at once rather than one by one? Thank you and more power!
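To make the request concrete, here is a rough sketch of the kind of output step I have in mind. It does not use any wikiextractor API: `dump_pages`, `strip_markup`, and the `{page_id: text}` dictionary are hypothetical names for illustration, and the tag stripping is a crude regex rather than a real wikitext cleaner.

```python
import csv
import html
import json
import re
from pathlib import Path

# Matches anything that looks like an HTML/XML tag (crude; assumption for this sketch).
TAG_RE = re.compile(r"<[^>]+>")


def strip_markup(text):
    """Remove HTML-like tags and unescape entities. Not a full wikitext cleaner."""
    return html.unescape(TAG_RE.sub("", text)).strip()


def dump_pages(pages, out_path, fmt="json", plain=False):
    """Write a hypothetical {page_id: text} mapping to out_path as json, csv, or txt."""
    if plain:
        pages = {pid: strip_markup(text) for pid, text in pages.items()}
    out_path = Path(out_path)
    if fmt == "json":
        out_path.write_text(
            json.dumps(pages, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    elif fmt == "csv":
        # newline="" lets the csv module control line endings itself.
        with out_path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "text"])
            writer.writerows(pages.items())
    elif fmt == "txt":
        # One page per block, separated by a blank line.
        out_path.write_text("\n\n".join(pages.values()), encoding="utf-8")
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return out_path
```

Something like `dump_pages({"12": text_a, "34": text_b}, "pages.csv", fmt="csv", plain=True)` would then cover both requests: several IDs in one call, and plain text written to a file.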