Schwittleymani / ECO

Electronic Chaos Oracle
https://schwittlick.net/eco
Apache License 2.0

Test Sample Pattern.en Wikipedia crawler #99

Closed by transfluxus 7 years ago

transfluxus commented 7 years ago

a good test grab would be Nick Bostrom, e.g. https://en.wikipedia.org/wiki/Nick_Bostrom

schwittlick commented 7 years ago

this one here does it all: http://www.clips.ua.ac.be/pages/pattern-web#wikipedia

Wikipedia.search() returns a single WikipediaArticle for the given (case-sensitive) query, which is the title of an article. Wikipedia.index() returns an iterator over all article titles on Wikipedia. The language parameter of Wikipedia() defines the language of the returned articles (by default it is "en", which corresponds to en.wikipedia.org).

article = WikipediaArticle(title='', source='', links=[])
article.source              # Article HTML source.
article.string              # Article plaintext unicode string.
article.title               # Article title.
article.sections            # Article sections.
article.links               # List of titles of linked articles.
article.external            # List of external links.
article.categories          # List of categories.
article.media               # List of linked media (images, sounds, ...)
article.languages           # Dictionary of (language, article)-items.
article.language            # Article language (i.e., 'en').
article.disambiguation      # True if it is a disambiguation page
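
A rough sketch of how the test sample could look, assuming Pattern is installed and `from pattern.web import Wikipedia` works in the local environment (Pattern's Python 3 support varies by version). `fetch_article` and `summarize_article` are hypothetical helper names, not part of Pattern or ECO, and `demo` is an offline stand-in with placeholder values, not real article data.

```python
from types import SimpleNamespace


def fetch_article(title, language="en"):
    """Fetch one article by its exact, case-sensitive title."""
    # Imported lazily so summarize_article() can be tried without Pattern installed.
    from pattern.web import Wikipedia
    return Wikipedia(language=language).search(title)


def summarize_article(article, max_links=5):
    """Collect a few of the WikipediaArticle fields listed above into a dict."""
    if article is None:  # nothing was found for the title
        return None
    return {
        "title": article.title,
        "language": article.language,
        "is_disambiguation": bool(article.disambiguation),
        "n_chars": len(article.string),
        "links": list(article.links)[:max_links],
        "categories": list(article.categories),
    }


# Offline stand-in using the same attribute names; all values are placeholders.
demo = SimpleNamespace(
    title="Nick Bostrom",
    language="en",
    disambiguation=False,
    string="placeholder plaintext",
    links=["A", "B", "C"],
    categories=["placeholder category"],
)
print(summarize_article(demo, max_links=2))
```

With network access, `summarize_article(fetch_article("Nick Bostrom"))` would run the same summary against the live article; checking `is_disambiguation` first is a cheap way to skip pages that are only lists of other titles.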

schwittlick commented 7 years ago

continued here: #175