attardi / wikiextractor

A tool for extracting plain text from Wikipedia dumps
GNU Affero General Public License v3.0

Add feature to extractPage to also dump the extracted page to json/csv/txt #317

Open: BwandoWando opened this issue 1 year ago

BwandoWando commented 1 year ago

Thank you for your tool; I've tried using it.

Could you please add a feature to `extractPage.py` to dump the extracted text to a JSON, CSV, or TXT file, ideally with an option to strip all wiki markup and HTML so that only the plain text of the page remains?
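As a rough illustration of the output side of this request, here is a minimal Python sketch using only the standard library. It assumes the page has already been extracted and cleaned into an ID, a title, and plain text by some earlier step. `Page`, `dump_pages`, and the `fmt` values are hypothetical names for illustration, not part of wikiextractor.

```python
import csv
import io
import json
from typing import Iterable, NamedTuple, TextIO


class Page(NamedTuple):
    """One extracted page (hypothetical record shape, not a wikiextractor type)."""
    id: str
    title: str
    text: str


def dump_pages(pages: Iterable[Page], out: TextIO, fmt: str) -> None:
    """Write pages to `out` as JSON Lines, CSV, or plain text.

    `dump_pages` and its `fmt` values ("json", "csv", "txt") are hypothetical.
    """
    if fmt == "json":
        # One JSON object per line (JSON Lines), so large outputs can be streamed.
        for page in pages:
            out.write(json.dumps(page._asdict(), ensure_ascii=False) + "\n")
    elif fmt == "csv":
        # csv.writer quotes fields that contain commas, quotes, or newlines.
        writer = csv.writer(out)
        writer.writerow(Page._fields)
        for page in pages:
            writer.writerow(page)
    elif fmt == "txt":
        # Plain text: title, blank line, body, then a blank line between pages.
        for page in pages:
            out.write(f"{page.title}\n\n{page.text}\n\n")
    else:
        raise ValueError(f"unknown format: {fmt!r}")


if __name__ == "__main__":
    # Small demo writing one hypothetical page to an in-memory buffer.
    buf = io.StringIO()
    dump_pages([Page("12", "Anarchism", "Plain text of the page.")], buf, "json")
    print(buf.getvalue(), end="")
```

When writing CSV to a real file, open it with `newline=""` as the `csv` module documentation recommends, e.g. `with open("pages.csv", "w", newline="", encoding="utf-8") as f: dump_pages(pages, f, "csv")`.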

Also, would it be possible to pass multiple page IDs at once instead of one at a time? Thank you, and more power!
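For the multiple-IDs request, one possible shape is sketched below in Python. It assumes a hypothetical `--ids` option taking a comma-separated list, and a hypothetical iterator of `(page_id, text)` pairs produced by a single pass over the dump; `parse_ids`, `select_pages`, `build_parser`, and `--ids` do not exist in wikiextractor and are for illustration only.

```python
import argparse
from typing import Dict, Iterable, List, Tuple


def parse_ids(raw: str) -> List[str]:
    """Split a comma-separated ID list such as "12,25, 39" (hypothetical format)."""
    # Strip whitespace and drop empty entries left by stray commas.
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    for page_id in ids:
        if not page_id.isdigit():
            raise ValueError(f"not a page ID: {page_id!r}")
    return ids


def select_pages(pages: Iterable[Tuple[str, str]], wanted: Iterable[str]) -> Dict[str, str]:
    """Scan (page_id, text) pairs once, keeping only the requested IDs."""
    remaining = set(wanted)
    found: Dict[str, str] = {}
    for page_id, text in pages:
        if page_id in remaining:
            found[page_id] = text
            remaining.discard(page_id)
            if not remaining:
                break  # every requested page found; stop reading early
    return found


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical command line with an --ids option."""
    parser = argparse.ArgumentParser(description="extract several pages by ID")
    # argparse calls parse_ids on the raw string; a ValueError becomes a usage error.
    parser.add_argument("--ids", type=parse_ids, required=True,
                        help="comma-separated page IDs, e.g. 12,25,39")
    return parser
```

Collecting the requested IDs into a set and scanning the dump once avoids re-reading a large dump file for every ID, which running the script once per ID would presumably require.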