Open vadimkantorov opened 1 year ago
Here is my attempt to do this directly with Wikidata/SPARQL:
SELECT DISTINCT ?city ?cityLabel ?countryLabel ?iso ?population ?gps
WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 .   # instance of (a subclass of) city
  ?city wdt:P17 ?country .            # country
  ?city wdt:P1082 ?population .       # population
  ?city wdt:P625 ?gps .               # coordinate location
  ?country wdt:P297 ?iso .            # ISO 3166-1 alpha-2 code
  FILTER (?population > 100000) .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
LIMIT 5000
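For reference, here is a minimal Python sketch of running this query and flattening its output. It assumes the public endpoint at https://query.wikidata.org/sparql, the standard SPARQL JSON results format, and that the coordinate comes back as a WKT literal such as `Point(2.3522 48.8566)` with longitude first. The helper names (`fetch_results`, `parse_point`, `rows_from_results`) are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"  # assumed public query endpoint


def fetch_results(query, user_agent):
    """Run a SPARQL query and return the decoded JSON payload."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    req = urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def parse_point(wkt):
    """Convert a WKT literal 'Point(lon lat)' into a (lat, lon) tuple."""
    text = wkt.strip()
    if not (text.startswith("Point(") and text.endswith(")")):
        raise ValueError(f"unexpected coordinate literal: {wkt!r}")
    lon, lat = (float(part) for part in text[len("Point("):-1].split())
    return lat, lon


def rows_from_results(payload):
    """Flatten SPARQL JSON bindings into plain dicts, one per result row."""
    rows = []
    for b in payload["results"]["bindings"]:
        lat, lon = parse_point(b["gps"]["value"])
        rows.append({
            # The entity URI ends in the Wikidata ID, e.g. ".../entity/Q90".
            "qid": b["city"]["value"].rsplit("/", 1)[-1],
            "name": b["cityLabel"]["value"],
            "country": b["countryLabel"]["value"],
            "iso": b["iso"]["value"],
            "population": int(float(b["population"]["value"])),
            "lat": lat,
            "lon": lon,
        })
    return rows
```

Calling `rows_from_results(fetch_results(query, "my-script/0.1 (contact address)"))` would give a list that is easy to dump to TSV or JSON. The user agent string here is only a placeholder.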
Thanks for your input @vadimkantorov
I like the idea. Wikidata's data could also be useful for other use cases (e.g. better linking).
Personally, I have no experience with Wikidata. Contributions would be very welcome 🙂
When it comes to ranking, I would recommend QRank (https://qrank.wmcloud.org/), which ranks Wikidata entities by aggregating page views on Wikipedia, and OSMViews (https://osmviews.toolforge.org/), which ranks geographic locations based on OpenStreetMap views.
Population, even if out of date or only accurate to an order of magnitude, would be useful for filtering the data file that you provide; it could probably be joined to the TSV from Wikidata. place_rank or importance could probably serve as substitutes, but population would be easier to interpret.
Another useful field for filtering would be some sort of macro-region, e.g. Western Europe or Asia.
My use case: filter cities by population and build a compact trie, so that a basic geocoder can be put directly into client-side JavaScript for use with Leaflet/OSM (e.g. all cities in the world with more than 100k people, which should not be too many to compress well).
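A rough sketch of what the build step for such a trie could look like, in Python, with the result serialized to JSON for the browser. This is a plain character trie, not a compressed one: merging single-child chains would be a further step toward compactness. The names `build_trie`, `complete`, and the empty-string `END` key are hypothetical choices for illustration.

```python
import json

# Key under which a node stores the cities ending there. The empty string
# cannot collide with a single-character child key.
END = ""


def build_trie(cities):
    """Build a nested-dict trie from (name, lat, lon) tuples, keyed by casefolded name."""
    root = {}
    for name, lat, lon in cities:
        node = root
        for ch in name.casefold():
            node = node.setdefault(ch, {})
        node.setdefault(END, []).append([name, lat, lon])
    return root


def complete(trie, prefix, limit=10):
    """Return up to `limit` entries whose name starts with `prefix`, in alphabetical order."""
    node = trie
    for ch in prefix.casefold():
        node = node.get(ch)
        if node is None:
            return []
    out = []
    stack = [node]
    while stack and len(out) < limit:
        cur = stack.pop()
        out.extend(cur.get(END, []))
        # Push children in reverse order so they are popped alphabetically.
        children = sorted((k for k in cur if k != END), reverse=True)
        stack.extend(cur[k] for k in children)
    return out[:limit]


def to_json(trie):
    """Serialize the trie compactly for shipping to the browser."""
    return json.dumps(trie, ensure_ascii=False, separators=(",", ":"))
```

On the client side, the same prefix walk is only a few lines of JavaScript over the parsed JSON object.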
In the meantime, I'm going to try the reverse: find the cities via Wikidata and then filter the OSMNames dataset by Wikidata ID.
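That reverse approach could be sketched roughly as follows, in Python. It assumes the OSMNames export is a tab-separated file with a header row and a column holding the Wikidata ID. The column name `wikidata` is an assumption to verify against the actual file, and `filter_by_wikidata` is a hypothetical helper name.

```python
import csv
import io


def filter_by_wikidata(tsv_lines, populations, id_column="wikidata"):
    """Yield TSV rows whose Wikidata ID is a key of `populations`.

    `tsv_lines` is any iterable of lines (e.g. an open file) and
    `populations` maps Wikidata IDs such as "Q90" to a population figure,
    which is attached to each matching row.
    """
    # QUOTE_NONE: treat quote characters in names as ordinary text.
    reader = csv.DictReader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        qid = (row.get(id_column) or "").strip()
        if qid in populations:
            row["population"] = populations[qid]
            yield row
```

Because it streams row by row, this should work on a large file without loading it into memory. Opening the real file with `open(path, newline="", encoding="utf-8")` and passing the file object as `tsv_lines` is the intended usage.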