Closed by koenedaele 10 years ago
The Getty vocabulary returns values encoded in UTF-8.
RDF files:
<?xml version="1.0" encoding="UTF-8"?>
SPARQL requests:
res = requests.get(self.base_url + "sparql.json", params={"query": query})
res.encoding = 'utf-8'
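To illustrate why the explicit encoding matters, here is a minimal sketch using only the standard library. It assumes a response body that is UTF-8 encoded, as described above. The sample label and the Latin-1 fallback are illustrative assumptions, not something the Getty service is known to return.

```python
# Raw bytes as a UTF-8 encoded service would send them (sample label is illustrative).
raw = "kirchen (Gebäude)".encode("utf-8")

# Decoding with the correct codec restores the original text.
correct = raw.decode("utf-8")

# Decoding with a wrong guess (Latin-1 here, an assumption) does not fail;
# it silently turns the two-byte "ä" into two separate characters.
wrong = raw.decode("latin-1")

print(correct == "kirchen (Gebäude)")  # True
print(len(wrong) - len(correct))       # 1
```

The wrong decode raises no error, which is why fixing the encoding up front is safer than relying on a guess.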
Using the function RDFLiteral.toPython() will return the values as unicode.

Python 2: has a unicode type and a string type.
Python 3: has only a string type. All string values in Python 3 are unicode (a string can be converted to the bytes type).

The decode('utf-8') in examples/churches.py will raise errors:

Python 2: to convert unicode to string, an encode('utf-8') call is necessary.
Python 3: unicode is already string. The decode('utf-8') will raise an error, because decoding expects a bytes input. Using encode('utf-8') will also raise an error, because the returned value is of type bytes, which cannot be combined with a string.

If the decode('utf-8') in examples/churches.py is removed, there will be no errors. The label and note values are unicode and the output will be printed correctly:
Labels
------
en: churches (buildings) [prefLabel]
es: iglesias [prefLabel]
de: kirchen (Gebäude) [prefLabel]
nl: kerken [prefLabel]
en: church (building) [altLabel]
es: iglesia [altLabel]
en: cirice [altLabel]
en: kirks (buildings) [altLabel]
en: kirrke [altLabel]
en: church buildings [altLabel]
en: churche (building) [altLabel]
de: kirche (Gebäude) [altLabel]
de: kirchenbau [altLabel]
de: Kirchengebäude [altLabel]
en: kirkes [altLabel]
de: kirchenbauten [altLabel]
en: kurks (buildings) [altLabel]
nl: kerk [altLabel]
en: chirche [altLabel]
en: chiriche [altLabel]
en: church architecture [altLabel]
en: chureche [altLabel]
nl: kerkgebouwen [altLabel]
en: circe [altLabel]
Notes
-----
de: Gebäude für öffentlichen christlichen Gottesdienst, das historisch von Kapellen und Oratorien unterschieden wird, die in mancher Hinsicht private Gebäude oder im weitesten Sinn nicht öffentlich sind. Die Kirchenarchitektur folgt allgemein weitgehend standardisierten Bauformen, die abhängig von Zeit, Ort und Eigenschaften der Kirchengemeinde variieren können. [scopeNote]
en: Buildings for public Christian worship that are distinguished historically from chapels and oratories, which are buildings that are in some respect private, or not public in the widest sense. Church architecture generally somewhat follows standard models, which vary depending upon the date, location, and characteristics of the congregation. [scopeNote]
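The Python 3 behaviour described in this comment can be checked with a short sketch. It does not use examples/churches.py itself; the label value and the output format are assumptions modelled on the output above.

```python
label = "kirchen (Gebäude)"  # a Python 3 str is already unicode text

# str has no decode() method in Python 3, so this raises AttributeError.
try:
    label.decode("utf-8")
    decode_error = None
except AttributeError as exc:
    decode_error = type(exc).__name__

# encode() succeeds, but returns bytes, which cannot be concatenated with str.
encoded = label.encode("utf-8")
try:
    "de: " + encoded
    concat_error = None
except TypeError as exc:
    concat_error = type(exc).__name__

# Leaving the value untouched is enough to build the output line.
line = "de: " + label + " [prefLabel]"

print(decode_error, concat_error)  # AttributeError TypeError
```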
The skosprovider should make sure that all output is already unicode. That probably means adding calls like .decode('utf-8') where appropriate. We might need to be careful with possible Asian characters. Not sure how all of this will work. See examples/churches.py for an example containing non-ASCII characters.
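One possible way to guarantee unicode output is to normalise values at a single point. The sketch below is for Python 3 only and is not part of skosprovider. The helper name to_text is hypothetical, and it assumes any bytes input is UTF-8 encoded.

```python
def to_text(value, encoding="utf-8"):
    """Return value as unicode text (hypothetical helper, Python 3 only)."""
    if isinstance(value, bytes):
        # Only bytes need decoding; the encoding is assumed to be UTF-8.
        return value.decode(encoding)
    if isinstance(value, str):
        # Already unicode: return unchanged instead of decoding again.
        return value
    # Anything else (numbers, for example) is converted to its text form.
    return str(value)


# UTF-8 covers all of Unicode, so CJK text round-trips the same way as "ä".
samples = ["Gebäude", "Gebäude".encode("utf-8"), "教堂".encode("utf-8")]
normalised = [to_text(s) for s in samples]
print(all(isinstance(s, str) for s in normalised))  # True
```

Decoding only when the value is bytes avoids the double-decode problem discussed in the comment above.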