rhema closed this issue 8 years ago
Fixed. There were a couple of issues. One was that ParsedURL wasn't stripping 'en.' from the domain, which prevented matching. The other was that wikipedia_page_type had other_tags = "wikipedia_page", which was overriding the actual wikipedia_page selector. I just deleted the other_tags field because, to my and Matthew's knowledge, it wasn't used for anything.
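
For anyone hitting something similar, here is a rough sketch of the domain-matching part of the fix. It is not the actual ParsedURL code. The names `SELECTORS`, `STRIPPED_PREFIXES`, `normalized_domain`, and `select_wrapper` are made up for illustration, and the assumption is that selectors are keyed on the bare domain (e.g. "wikipedia.org"):

```python
from urllib.parse import urlparse

# Hypothetical selector table: bare domain -> wrapper name (assumed, not the real repository).
SELECTORS = {"wikipedia.org": "wikipedia_page"}
DEFAULT_WRAPPER = "rich_document"

# Leading host labels to drop before matching (assumed list; the fix above concerned "en.").
STRIPPED_PREFIXES = {"en", "www"}


def normalized_domain(url: str) -> str:
    """Return the host with a leading 'en.' or 'www.' label removed."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Only strip when something domain-like remains afterwards.
    if len(labels) > 2 and labels[0] in STRIPPED_PREFIXES:
        labels = labels[1:]
    return ".".join(labels)


def select_wrapper(url: str) -> str:
    """Pick the specific wrapper for the domain, falling back to the generic one."""
    return SELECTORS.get(normalized_domain(url), DEFAULT_WRAPPER)
```

Without the prefix stripping, "en.wikipedia.org" never matches the "wikipedia.org" key, so the lookup falls through to the generic wrapper, which is the behavior reported in this issue.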
For Wikipedia URLs, e.g. https://en.wikipedia.org/wiki/Abraham_Lincoln, the wrapper resolves to "rich_document" instead of the more specific Wikipedia wrapper. This may be due to a problem in the selector.