Open p-mohan opened 2 years ago
It took 34 days to build a usable local copy of the Wikidata database (on a 4-thread, 32 GB instance) from the compressed public Wikidata dump (a download of around 100 GB). The uncompressed database is 790 GB in size. The next task is to run the above SPARQL query to extract the city data.
The objective of this task is to collect Country, State, and City/Town data from a local Wikidata dump. We are using a local instance of Wikidata because the query times out on the public Wikidata service.
The following SPARQL query gives the expected results.
This sample query limits the results to country Q408 (Australia). Because the Wikidata hierarchy is somewhat non-trivial, I filter on whether a city has a population property in order to remove entities such as buildings. The query also filters out locations that have "ended".
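As a rough illustration of the filtering described above, here is a minimal Python sketch that builds a query of that shape for any country. It is not the query from this issue: the helper name `build_city_query` and the `limit` parameter are hypothetical, and the property choices are assumptions (P17 for country, P1082 for population, and P576 "dissolved, abolished or demolished date" as one possible reading of "ended").

```python
import re


def build_city_query(country_qid: str, limit: int = 100) -> str:
    """Return a SPARQL query string for populated, non-dissolved places in a country.

    Hypothetical helper for illustration only; the property IDs below are
    assumptions about how the filters described in the issue could be expressed.
    """
    # Only accept well-formed Wikidata item IDs such as "Q408".
    if not re.fullmatch(r"Q[1-9]\d*", country_qid):
        raise ValueError(f"not a Wikidata item ID: {country_qid!r}")

    return f"""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT DISTINCT ?place ?population WHERE {{
  ?place wdt:P17 wd:{country_qid} ;   # P17: country
         wdt:P1082 ?population .      # P1082: population (drops buildings etc.)
  # Assumption: "ended" is modelled as having a dissolved/abolished date (P576).
  FILTER NOT EXISTS {{ ?place wdt:P576 [] }}
}}
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    # Q408 is Australia, as in the sample query.
    print(build_city_query("Q408", limit=10))
```

The resulting string can be sent to whatever SPARQL endpoint the local instance exposes. Keeping the country as a parameter makes it straightforward to repeat the extraction country by country rather than in one very large query.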