attardi / wikiextractor

A tool for extracting plain text from Wikipedia dumps
GNU Affero General Public License v3.0
3.74k stars, 965 forks

How to extract list pages? #302

Open katzurik opened 1 year ago

katzurik commented 1 year ago

There are many pages that just list other pages, e.g. e1, e2.

Running a "vanilla" extraction omits their content entirely, keeping only the title. What do I need to configure to extract those pages? Bonus: how can I extract only those list pages?
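
To make the "only those list pages" part concrete, here is a minimal sketch of the kind of selection meant. It is not a wikiextractor feature and does not use its API: it reads a MediaWiki XML export directly with the Python standard library and keeps the raw wikitext. The function name `iter_list_pages`, the `SAMPLE` data, and the assumption that list pages can be recognised by a title starting with "List of " are all hypothetical and only for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_list_pages(source, prefix="List of "):
    """Yield (title, wikitext) for pages whose title starts with `prefix`.

    `source` is a filename or a binary file object holding a MediaWiki
    XML export. The title prefix is an assumed heuristic for list pages.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        if title and title.startswith(prefix):
            yield title, text
        elem.clear()  # drop the page's children to limit memory use


# Tiny made-up export, used only to show the expected input shape.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>List of lakes</title>
    <revision><text>* [[Lake A]]
* [[Lake B]]</text></revision>
  </page>
  <page>
    <title>Lake A</title>
    <revision><text>Lake A is a lake.</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for title, text in iter_list_pages(io.BytesIO(SAMPLE)):
        print(title)
        print(text)
```

A title prefix is a crude filter. It misses list pages with other naming patterns and says nothing about how the list markup itself should be rendered as plain text, which is the part the question above is really about.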