OnroerendErfgoed / skosprovider_getty

Skosprovider implementation of the Getty Vocabularies
http://skosprovider-getty.readthedocs.org/
MIT License

Output should be unicode #6

Closed: koenedaele closed this issue 10 years ago

koenedaele commented 10 years ago

The skosprovider should make sure that all of its output is already unicode. That probably means adding calls like .decode('utf-8') where appropriate. We might need to be careful with Asian characters; I'm not sure how all of this will work. See examples/churches.py for an example containing non-ASCII characters.

cahytinne commented 10 years ago

The Getty vocabulary returns values encoded in UTF-8.

RDF files:

<?xml version="1.0" encoding="UTF-8"?>

SPARQL requests:

res = requests.get(self.base_url + "sparql.json", params={"query": query})
res.encoding = 'utf-8'
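
Why the explicit encoding assignment matters can be shown without a network call. The following is a minimal sketch in Python 3 syntax using only the standard library; the sample bytes and variable names are illustrative assumptions, not taken from the provider's code. If a client falls back to another encoding such as ISO-8859-1, decoding UTF-8 bytes does not fail but produces garbled text:

```python
# Bytes as a server would send them: the label encoded in UTF-8.
raw = "kirchen (Gebäude)".encode("utf-8")

# Decoding with the correct encoding restores the original text.
as_utf8 = raw.decode("utf-8")

# Decoding with a wrong fallback (ISO-8859-1 here) raises no error,
# but turns the two-byte character into two unrelated characters.
as_latin1 = raw.decode("iso-8859-1")

print(as_utf8)
print(as_latin1)
```

Setting the encoding explicitly avoids depending on whatever the client would otherwise guess.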

RDFLiteral.toPython() returns the values as unicode objects, already decoded from UTF-8.

The decode('utf-8') call in examples/churches.py raises errors, because the values it is applied to are already unicode.
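
The likely mechanism, assuming the example ran on Python 2: calling decode('utf-8') on a value that is already unicode makes Python 2 first encode it implicitly with the default ASCII codec, and that step fails on characters such as ä. The sketch below spells the implicit step out in Python 3 syntax; the function name py2_style_decode is hypothetical and only illustrates the mechanism, it is not part of skosprovider_getty.

```python
def py2_style_decode(text):
    """Mimic Python 2's unicode.decode('utf-8'): implicit ASCII encode first."""
    return text.encode("ascii").decode("utf-8")


# Pure ASCII survives the round trip unchanged.
ascii_result = py2_style_decode("kerken")

# A non-ASCII label fails in the implicit encode step.
try:
    py2_style_decode("kirchen (Gebäude)")
    error_name = None
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__

print(ascii_result, error_name)
```

This also explains why the problem only shows up for labels with non-ASCII characters.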

If the decode('utf-8') call is removed from examples/churches.py, the errors disappear. The labels and notes are already unicode, and the output prints correctly:

Labels
------
en: churches (buildings) [prefLabel]
es: iglesias [prefLabel]
de: kirchen (Gebäude) [prefLabel]
nl: kerken [prefLabel]
en: church (building) [altLabel]
es: iglesia [altLabel]
en: cirice [altLabel]
en: kirks (buildings) [altLabel]
en: kirrke [altLabel]
en: church buildings [altLabel]
en: churche (building) [altLabel]
de: kirche (Gebäude) [altLabel]
de: kirchenbau [altLabel]
de: Kirchengebäude [altLabel]
en: kirkes [altLabel]
de: kirchenbauten [altLabel]
en: kurks (buildings) [altLabel]
nl: kerk [altLabel]
en: chirche [altLabel]
en: chiriche [altLabel]
en: church architecture [altLabel]
en: chureche [altLabel]
nl: kerkgebouwen [altLabel]
en: circe [altLabel]
Notes
-----
de: Gebäude für öffentlichen christlichen Gottesdienst, das historisch von Kapellen und Oratorien unterschieden wird, die in mancher Hinsicht private Gebäude oder im weitesten Sinn nicht öffentlich sind. Die Kirchenarchitektur folgt allgemein weitgehend standardisierten Bauformen, die abhängig von Zeit, Ort und Eigenschaften der Kirchengemeinde variieren können. [scopeNote]
en: Buildings for public Christian worship that are distinguished historically from chapels and oratories, which are buildings that are in some respect private, or not public in the widest sense. Church architecture generally somewhat follows standard models, which vary depending upon the date, location, and characteristics of the congregation. [scopeNote]
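
Where it is unclear whether a value is already text, one defensive option is to decode only byte strings and pass text through unchanged. This is a sketch in Python 3 syntax, not the provider's code; to_text is a hypothetical helper name.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it is still bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# The same label once as raw UTF-8 bytes and once as already-decoded text.
label_bytes = "Kirchengebäude".encode("utf-8")
label_text = "Kirchengebäude"

from_bytes = to_text(label_bytes)
from_text = to_text(label_text)

print(from_bytes, from_text)
```

Both calls yield the same text, so a caller does not need to know which form it received.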