Open andrawaag opened 1 year ago
I started a repo with some plans in it here:
https://github.com/INCATools/environments2wikidata
There are so many terms that manual curation will be hard, but we can use ontology axioms to aid in the disambiguation...
Lots of old code, I will try to update it...
Today I tried to add as many GAZ identifiers as possible to the Wikidata items on Suriname (see: https://w.wiki/6CVW).
This was mainly a manual curation step, where I searched for the names in Wikidata and added the respective GAZ identifiers.
For editing the GAZ: make a pull request that edits the GAZ_countries.owl file. As the full GAZ is quite large, we are no longer editing the full file.
To edit: first, I would check the gaz.owl file to make sure the locations you want are not already in it. I would recommend using a new ID space:
GAZ:$sequence(8,33333333,44444444), as this will not conflict with ID ranges that are already in use.
Cheers, Lynn
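As a rough illustration of the check Lynn describes, here is a minimal Python sketch. It assumes that `GAZ:$sequence(8,33333333,44444444)` means 8-digit local IDs taken from the range 33333333 to 44444444, and that existing IDs appear in the OWL text as `GAZ_########` or `GAZ:########`. The function names are made up for this example and are not part of any GAZ tooling.

```python
import re

# Assumed reading of GAZ:$sequence(8,33333333,44444444):
# 8-digit local IDs drawn from the range 33333333..44444444.
ID_WIDTH = 8
RANGE_START = 33333333
RANGE_END = 44444444

# Matches GAZ_00002525 (as in OBO IRIs) or GAZ:00002525 (CURIE form).
GAZ_ID_PATTERN = re.compile(r"GAZ[:_](\d{8})(?!\d)")


def used_local_ids(owl_text):
    """Collect the numeric part of every GAZ ID mentioned in the text."""
    return {int(digits) for digits in GAZ_ID_PATTERN.findall(owl_text)}


def next_free_ids(used, count):
    """Return `count` unused IDs from the assumed range, lowest first."""
    free = []
    candidate = RANGE_START
    while len(free) < count and candidate <= RANGE_END:
        if candidate not in used:
            free.append(f"GAZ:{candidate:0{ID_WIDTH}d}")
        candidate += 1
    if len(free) < count:
        raise ValueError("assumed ID range is exhausted")
    return free
```

In practice one would read both gaz.owl and GAZ_countries.owl as text, pass their combined contents to `used_local_ids`, and then ask `next_free_ids` for as many IDs as there are new locations.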
GAZ does not seem to have many mappings to external identifiers (if any at all). This makes aligning it with Wikidata particularly challenging.
To get all terms in GAZ covered in Wikidata, we would probably need to apply different strategies to see whether a term is already covered or not.
In the case where the label used in Wikidata exactly matches the term in GAZ, OpenRefine can be our friend. I used this tool (offered in PAWS, for example) to align GAZ countries with Wikidata.
However, I continued with the GAZ terms on Suriname. So far, all of them exist in Wikidata, but most with a different spelling. I will try to add all GAZ terms for that country manually.
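To make the difference between the two cases concrete, here is a small Python sketch. It is not what OpenRefine does internally; it only separates labels that match exactly from labels that need a human look, and uses the standard library's `difflib` to suggest near matches for the spelling variants. The function name, the input dictionaries, and the 0.8 cutoff are assumptions chosen for illustration.

```python
import difflib


def match_labels(gaz_labels, wikidata_labels, cutoff=0.8):
    """Split GAZ terms into exact matches and near-match suggestions.

    gaz_labels:      dict mapping GAZ ID -> label
    wikidata_labels: dict mapping Wikidata label -> item ID
    Returns (exact, suggestions). Suggestions are only candidates
    for manual curation, not confirmed mappings.
    """
    exact = {}
    suggestions = {}
    candidates = list(wikidata_labels)
    for gaz_id, label in gaz_labels.items():
        if label in wikidata_labels:
            # Exact label match: safe to propose automatically.
            exact[gaz_id] = wikidata_labels[label]
        else:
            # Spelling variant or missing item: list close labels, if any.
            close = difflib.get_close_matches(label, candidates, n=3, cutoff=cutoff)
            suggestions[gaz_id] = [(c, wikidata_labels[c]) for c in close]
    return exact, suggestions
```

Terms that end up with an empty suggestion list are the ones that either do not exist in Wikidata yet or are spelled too differently for a string comparison to catch, so they still need a manual search.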
So far, two strategies have been applied: