ecologylab / BigSemanticsWrapperRepository

Repository of wrappers used by the BigSemantics project.
Apache License 2.0

Wikipedia Wrapper Not Resolving on Wikipedia URLs #49

rhema closed this issue 8 years ago

rhema commented 8 years ago

For Wikipedia URLs, e.g. https://en.wikipedia.org/wiki/Abraham_Lincoln, the page resolves to the generic "rich_document" wrapper instead of the more specific Wikipedia wrapper. This may be due to a problem in the selector.

keithkade commented 8 years ago

Fixed. There were a couple of issues. One was that ParsedURL wasn't stripping 'en.' from the domain, which prevented matching. The other was that wikipedia_page_type had other_tags = "wikipedia_page", which was overriding the actual wikipedia_page selector. I just deleted the other_tags field because, to my and Matthew's knowledge, it wasn't used for anything.
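
To illustrate the first issue only, here is a minimal Python sketch of how a language subdomain can block a domain-based selector from matching. It is not the actual ParsedURL code or the repository's selector format. The names `DOMAIN_SELECTORS`, `domain_of`, and `resolve_wrapper` are hypothetical, and the rule for spotting a language prefix (a leading two-letter label) is a crude assumption made for the example.

```python
from urllib.parse import urlparse

# Hypothetical selector table mapping a domain to a wrapper name.
# This is illustrative data, not the repository's real selectors.
DOMAIN_SELECTORS = {"wikipedia.org": "wikipedia_page"}
DEFAULT_WRAPPER = "rich_document"


def domain_of(url: str, strip_language_prefix: bool = True) -> str:
    """Return the host of `url`, optionally without a leading language label."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Assumption: a two-letter first label in front of at least two more
    # labels (e.g. "en" in "en.wikipedia.org") is a language subdomain.
    if strip_language_prefix and len(labels) > 2 and len(labels[0]) == 2:
        labels = labels[1:]
    return ".".join(labels)


def resolve_wrapper(url: str, strip_language_prefix: bool = True) -> str:
    """Look up the wrapper for `url` by domain, falling back to the default."""
    domain = domain_of(url, strip_language_prefix)
    return DOMAIN_SELECTORS.get(domain, DEFAULT_WRAPPER)


url = "https://en.wikipedia.org/wiki/Abraham_Lincoln"
print(resolve_wrapper(url, strip_language_prefix=False))  # falls back to the default
print(resolve_wrapper(url))                               # matches the domain selector
```

With the prefix left in place, the lookup key is "en.wikipedia.org", which has no entry, so the generic wrapper is returned. Once the prefix is stripped, the key becomes "wikipedia.org" and the more specific wrapper is selected.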