commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

Import wikidata dumps #10

Closed by sylvinus 8 years ago

sylvinus commented 8 years ago

https://www.wikidata.org/wiki/Wikidata:Database_download

Importing Wikidata would be, for starters, a good way to associate a lot of official URLs with their named entity and Wikipedia page. This would help commonsearch/cosr-results#4, for instance.

The way we import Alexa should be a good starting point.

For a first version, I think it should be OK to store (key, value) pairs in RocksDB as (normalized_url, (name, English description, English Wikipedia slug)).
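
To make the proposed (key, value) layout concrete, here is a minimal sketch of the extraction step. It assumes the Wikidata JSON dump format (one entity per line inside a JSON array) and uses property P856 ("official website") as the source of URLs. The helper names `normalize_url`, `entity_to_records` and `parse_dump_lines` are hypothetical, the normalization is a simplified stand-in rather than the one cosr-back actually uses, and a plain dict stands in for RocksDB.

```python
import json
from urllib.parse import urlsplit


def normalize_url(url):
    """Hypothetical normalization: drop scheme, 'www.' and trailing slash.

    This is NOT cosr-back's real normalization, only an illustration.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def entity_to_records(entity):
    """Yield (normalized_url, (name, description, slug)) per official website."""
    name = entity.get("labels", {}).get("en", {}).get("value", "")
    description = entity.get("descriptions", {}).get("en", {}).get("value", "")
    title = entity.get("sitelinks", {}).get("enwiki", {}).get("title", "")
    slug = title.replace(" ", "_")  # Wikipedia slugs use underscores for spaces

    # P856 is the Wikidata property for "official website".
    for claim in entity.get("claims", {}).get("P856", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "novalue" / "somevalue" statements
        key = normalize_url(snak["datavalue"]["value"])
        if key:
            yield key, (name, description, slug)


def parse_dump_lines(lines):
    """Iterate over dump lines, skipping the enclosing '[' and ']' lines."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield from entity_to_records(json.loads(line))
```

With a real dump, `lines` would be the decompressed file object, and each yielded pair would be written to the key-value store instead of collected in memory.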

Wikidata should then be added to https://about.commonsearch.org/data-sources

sylvinus commented 8 years ago

Done in #26. I will open further issues for each of the places where we could use this data!