AddressForAll / LIXO-digital-preservation

Digital preservation, main sources and hub project
Apache License 2.0

Generating and validating jurisdiction table, automation #1

Open ppKrauss opened 3 years ago

ppKrauss commented 3 years ago

The jurisdiction.csv must be checked and generated by software.

  1. The columns name, iso2_id, name_orig, wikidata_id and osm_id can be generated (as a first draft) from Wikidata.
  2. The columns preserved and coded must be edited by hand, one row at a time.
  3. The "official source" for name and iso2_id is https://github.com/datasets/country-list or the (ugly) ISO page. There is also a complete country-codes dataset, and UNECE's alternative "official copy" of the country codes.
  4. You can confirm the OSM IDs with a direct Planet SQL query or with OSM's Overpass turbo, filtering by the ISO2 and Wikidata tags.
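
A minimal sketch of the check in item 4, assuming the country relation in OSM carries both an `ISO3166-1` tag and a `wikidata` tag (those tag keys are my assumption, confirm them before relying on the result). The helper name `overpass_country_query` is hypothetical; it only builds the Overpass QL text, which can then be pasted into Overpass turbo.

```python
def overpass_country_query(iso2: str, wikidata_id: str) -> str:
    """Build an Overpass QL query for the relation tagged with both IDs.

    Assumption: country relations use the keys "ISO3166-1" and "wikidata".
    """
    iso2 = iso2.strip().upper()
    wikidata_id = wikidata_id.strip().upper()
    if len(iso2) != 2 or not iso2.isalpha():
        raise ValueError(f"not an ISO 3166-1 alpha-2 code: {iso2!r}")
    if not (wikidata_id.startswith("Q") and wikidata_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {wikidata_id!r}")
    # "out ids" returns only the relation IDs, which is all osm_id needs.
    return (
        "[out:json];\n"
        f'relation["ISO3166-1"="{iso2}"]["wikidata"="{wikidata_id}"];\n'
        "out ids;"
    )


if __name__ == "__main__":
    # Brazil: ISO2 "BR", Wikidata item Q155.
    print(overpass_country_query("BR", "Q155"))
```

If the query returns exactly one relation, its ID can be compared with the osm_id column; zero or several results mean the row needs manual review.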

# Countries and their codes
# osmId_n counts the OSM IDs found per country (0 when none is set)
SELECT ?code ?item ?itemLabel
       (MAX(?osmId) AS ?osmId_max) (COUNT(?osmId) AS ?osmId_n)
WHERE
{
  ?item wdt:P297 ?code.
  OPTIONAL{?item wdt:P402 ?osmId .}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?code  ?item ?itemLabel
ORDER BY ?code

Try it at the Wikidata Query Service. Check the API for automatic download of the query result as a CSV file. All the other sources can be downloaded directly.
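
For the automatic download, here is a minimal sketch assuming the public SPARQL endpoint at `https://query.wikidata.org/sparql`, called with a `query` parameter and an `Accept: text/csv` header. The function names and the User-Agent string are placeholders of mine, not part of this repository.

```python
import csv
import io
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_csv_request(sparql: str) -> urllib.request.Request:
    """Prepare a GET request that asks the endpoint for CSV output."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": sparql})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "text/csv",
            # Placeholder value: replace with a real tool name and contact.
            "User-Agent": "jurisdiction-table-check/0.1 (hypothetical)",
        },
    )


def parse_csv(text: str) -> list:
    """Turn the CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def download_rows(sparql: str) -> list:
    """Run the query over the network and return the parsed rows."""
    with urllib.request.urlopen(build_csv_request(sparql), timeout=60) as resp:
        return parse_csv(resp.read().decode("utf-8"))
```

Calling `download_rows()` with the query text from this issue should give one dict per result row, with the SELECT variable names as keys.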

Once the reference table is generated, assert it with a diff (Wikidata vs. the other sources), starting with that first comparison.
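
A rough sketch of that diff, assuming both sides have already been loaded into dicts mapping the ISO2 code to a name (for example from the Wikidata CSV and from the country-list CSV). The function name and the sample values are placeholders, not real data.

```python
def diff_by_code(wikidata: dict, reference: dict) -> dict:
    """Compare two {iso2: name} mappings and report the differences."""
    shared = set(wikidata) & set(reference)
    return {
        # Codes that appear on one side only.
        "only_wikidata": sorted(set(wikidata) - set(reference)),
        "only_reference": sorted(set(reference) - set(wikidata)),
        # Same code, different name (ignoring case and outer whitespace).
        "name_mismatch": {
            code: (wikidata[code], reference[code])
            for code in sorted(shared)
            if wikidata[code].strip().casefold()
            != reference[code].strip().casefold()
        },
    }


if __name__ == "__main__":
    # Placeholder rows, for illustration only.
    wd = {"AA": "Alpha", "BB": "Beta", "CC": "Gamma"}
    ref = {"AA": "alpha ", "BB": "Beta Republic", "DD": "Delta"}
    report = diff_by_code(wd, ref)
    assert not report["only_wikidata"], report  # fails here: "CC" is extra
```

An empty report means the draft agrees with the reference; anything else goes to manual review before jurisdiction.csv is committed.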

ppKrauss commented 3 years ago

Confusion over official name in the official language

Remember that the "official language", when only one exists, is P37: it must be selected from the set of the country's demonyms, that is, P1549... But Wikidata now also offers P1705 (native label) and P1448 (official name). Jamaica example: the native label is unique, but the official name has 2 values...
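
One possible way to handle this when filling name_orig, sketched under my own assumptions: treat each candidate name as a (text, language code) pair, as Wikidata monolingual text values are, and accept a name only when the country has exactly one official language and exactly one candidate in that language. Everything else is left for manual review. The function name and the sample values are hypothetical.

```python
from typing import Optional


def pick_native_name(names: list, official_langs: list) -> Optional[str]:
    """Return the single name written in the single official language.

    names: list of (text, language_code) pairs, e.g. P1705 or P1448 values.
    official_langs: language codes derived from P37.
    Returns None when the choice is ambiguous and needs a human decision.
    """
    if len(official_langs) != 1:
        return None  # zero or several official languages
    lang = official_langs[0]
    matches = [text for text, text_lang in names if text_lang == lang]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    # Placeholder values, not real Wikidata content.
    candidates = [("Name A", "xx"), ("Name B", "yy")]
    print(pick_native_name(candidates, ["xx"]))        # Name A
    print(pick_native_name(candidates, ["xx", "yy"]))  # None
```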

Results: query with the native name included... remember errors


New query with telephone CCC and ISO-number.