Open cgendreau opened 11 years ago
Have you had a look at GeoNames? Lots of Semantic Web goodness if that's your thing; see http://www.geonames.org/ontology/documentation.html
As sources for names and synonyms, there are also The Getty Thesaurus of Geographic Names (http://www.getty.edu/vow/TGNSearchPage.jsp), and GADM (http://www.gadm.org/).
For misspellings, I have accumulated nearly 5000 variants of values mapped to the Darwin Core term country and have provided the corresponding ISO 3166-1 alpha-2 country code for every variant where that is possible. This list keeps growing as we pass additional data through validation for VertNet.
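A variant list like this could be applied as a plain lookup table. The sketch below is only an illustration and is not the VertNet list: the four entries, the two-letter codes attached to them, and the `normalize` and `country_code` helpers are all made up for the example.

```python
import unicodedata
from typing import Optional

# Illustrative entries only; a real list would hold thousands of variants.
VARIANTS = {
    "canada": "CA",
    "cananda": "CA",  # made-up misspelling
    "u.s.a.": "US",
    "mexico": "MX",
}

def normalize(value: str) -> str:
    """Lowercase, strip accents and collapse whitespace before lookup."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())

def country_code(value: str) -> Optional[str]:
    """Return the mapped code, or None when the variant is unknown."""
    return VARIANTS.get(normalize(value))
```

Normalizing before the lookup keeps the table smaller, since case, accent and spacing differences no longer need their own entries.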
Just stumbled upon this tool: http://okfnlabs.org/blog/2013/05/16/nomenklatura-matching-service-reconciliation-made-easy.html Might be of help here.
I think this is worth mentioning: http://community.gbif.org/pg/file/read/34059/
It would be interesting to expand narwhal so that it can build and maintain an up-to-date knowledge base of country names, their alternative representations (possibly multilingual), and mappings from known misspellings, using linked open data (Semantic Web).
This could be done using Semantic Web URIs. Something like: http://dbpedia.org/page/Category:Member_states_of_the_United_Nations
A country could then be identified with a URI such as http://dbpedia.org/resource/Canada. The name of a country in different languages could be populated using "owl:sameAs". The known misspellings could be handled using SKOS.
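To make the modelling idea more concrete, here is a rough sketch of what one country record might look like once it has been pulled out of such a source. Everything in it is hypothetical: the `CountryConcept` class, its field names and the sample labels are assumptions. For simplicity the per-language names are stored as language-tagged labels (in the spirit of `skos:prefLabel`) rather than as "owl:sameAs" links, and misspellings follow the idea of `skos:hiddenLabel`. This is not an RDF parser.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class CountryConcept:
    uri: str
    pref_labels: Dict[str, str] = field(default_factory=dict)  # language tag -> name
    hidden_labels: List[str] = field(default_factory=list)     # known misspellings

def build_index(concepts: List[CountryConcept]) -> Dict[str, str]:
    """Map every lowercased label or misspelling to the concept URI."""
    index: Dict[str, str] = {}
    for concept in concepts:
        for label in list(concept.pref_labels.values()) + concept.hidden_labels:
            index[label.strip().lower()] = concept.uri
    return index

canada = CountryConcept(
    uri="http://dbpedia.org/resource/Canada",
    pref_labels={"en": "Canada", "es": "Canadá"},
    hidden_labels=["Cananda"],  # made-up misspelling
)
index = build_index([canada])
```

With an index like this, any recognized name or misspelling resolves to the same URI, which can then serve as the stable identifier for the country.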
For performance reasons, we'd like this thesaurus to be embedded in the library, but with the capacity to be periodically refreshed with data pulled from external resources (as is currently the case through the gbif-parser).
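One possible shape for an embedded but refreshable thesaurus is sketched below. The `Thesaurus` class, its `refresh` method and the `fetch` callable are hypothetical names and do not describe how narwhal or the gbif-parser actually work. The point is only that lookups are served from memory and that a failed refresh leaves the current data in place.

```python
from typing import Callable, Dict, Optional

# Stand-in for data shipped inside the library (illustrative entries).
EMBEDDED: Dict[str, str] = {"canada": "CA", "cananda": "CA"}

class Thesaurus:
    def __init__(self, embedded: Dict[str, str]):
        # Copy so the bundled data is never mutated by a refresh.
        self._entries = dict(embedded)

    def lookup(self, value: str) -> Optional[str]:
        """Return the code for a known variant, or None."""
        return self._entries.get(value.strip().lower())

    def refresh(self, fetch: Callable[[], Dict[str, str]]) -> bool:
        """Merge entries from an external source; keep current data on failure."""
        try:
            fresh = fetch()
        except Exception:
            return False
        self._entries.update({k.strip().lower(): v for k, v in fresh.items()})
        return True
```

Passing the fetch step in as a callable keeps network access out of the lookup path, so the library still works offline with whatever data it was shipped with.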
Benefits: