twagoo closed this issue 7 years ago.
I'm not exactly sure if it's related, but an old version of the CLAVAS vocabulary didn't have unique prefLabels for all concepts (although each one had a unique code). A new version of the ISO 639-3 CLAVAS vocabulary, in which each code has a unique prefLabel, has been created and will be imported into the OpenSKOS 2 test server.
Thanks, @menzowindhouwer, for that info; I will take this into account in my investigation!
I have just updated the CLAVAS instance at http://145.100.58.79/clavas/public/api/ with the corrected import file for languages.
Thanks. The issue still persists: it does not occur locally or on alpha, but it does on the dev-sp host. All three are configured to use the testing instance of CLAVAS. Next step: find out whether the issue occurs in the communication between the back end and CLAVAS or in the communication between the front end and the back end.
The issue seems to be in the communication between the back end and CLAVAS, since the hosts receive different responses:
localhost/alpha (request):
[{"prefLabel@en":"Abkhazian","uri":"http://cdb.iso.org/lg/CDB-00138467-001"},{"prefLabel@en":"Adyghe","uri":"http://cdb.iso.org/lg/CDB-00133873-001"},{"prefLabel@en":"Saint Lucian Creole French","uri":"http://cdb.iso.org/lg/CDB-00133907-001"},{"prefLabel@en":"Adamorobe Sign Language","uri":"http://cdb.iso.org/lg/CDB-00133878-001"},{"prefLabel@en":"Argentine Sign Language","uri":"http://cdb.iso.org/lg/CDB-00133965-001"},{"prefLabel@en":"Algerian Saharan Arabic","uri":"http://cdb.iso.org/lg/CDB-00133758-001"},{"prefLabel@en":"Ta'izzi-Adeni Arabic","uri":"http://cdb.iso.org/lg/CDB-00133893-001"},{"prefLabel@en":"Mesopotamian Arabic","uri":"http://cdb.iso.org/lg/CDB-00133905-001"},{"prefLabel@en":"Arvanitika Albanian","uri":"http://cdb.iso.org/lg/CDB-00133781-001"},{"prefLabel@en":"Arbëreshë Albanian","uri":"http://cdb.iso.org/lg/CDB-00133767-001"},
...
dev-sp (request):
[{"prefLabel@en":"Abkhazian","uri":"http://cdb.iso.org/lg/CDB-00138467-001"},{"prefLabel@en":"Adyghe","uri":"http://cdb.iso.org/lg/CDB-00133873-001"},{"prefLabel@en":"Saint Lucian Creole French","uri":"http://cdb.iso.org/lg/CDB-00133907-001"},{"prefLabel@en":"Adamorobe Sign Language","uri":"http://cdb.iso.org/lg/CDB-00133878-001"},{"prefLabel@en":"Argentine Sign Language","uri":"http://cdb.iso.org/lg/CDB-00133965-001"},{"prefLabel@en":"Algerian Saharan Arabic","uri":"http://cdb.iso.org/lg/CDB-00133758-001"},{"prefLabel@en":"Ta'izzi-Adeni Arabic","uri":"http://cdb.iso.org/lg/CDB-00133893-001"},{"prefLabel@en":"Mesopotamian Arabic","uri":"http://cdb.iso.org/lg/CDB-00133905-001"},{"prefLabel@en":"Arvanitika Albanian","uri":"http://cdb.iso.org/lg/CDB-00133781-001"},{"prefLabel@en":"Arb?resh? Albanian","uri":"http://cdb.iso.org/lg/CDB-00133767-001"},
...
Compare the last entries shown: "Arbëreshë Albanian" (sic, display issue) vs "Arb?resh? Albanian".
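One plausible way to get "Arb?resh?" from "Arbëreshë" is a string-to-bytes conversion using a charset that cannot represent "ë", with the encoder substituting "?" for every unmappable character. The sketch below reproduces that pattern in Python; it is an illustration under that assumption, not a confirmed diagnosis of what happens on dev-sp.

```python
# Sketch of one plausible mechanism (an assumption, not a confirmed diagnosis):
# a str -> bytes conversion using a charset that cannot represent "ë".
label = "Arbëreshë Albanian"

# Lossless: UTF-8 can represent every Unicode character.
utf8_bytes = label.encode("utf-8")

# Lossy: ASCII has no "ë", so errors="replace" substitutes "?".
ascii_bytes = label.encode("ascii", errors="replace")

print(utf8_bytes.decode("utf-8"))   # Arbëreshë Albanian
print(ascii_bytes.decode("ascii"))  # Arb?resh? Albanian
```

On the Java side, `String.getBytes(Charset)` likewise replaces unmappable characters with the charset's default replacement, which is "?" for US-ASCII, so a JVM whose default charset is not UTF-8 would plausibly produce the same output.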
https://github.com/clarin-eric/component-registry-rest/commit/fb3be688175f9f887006e247dec4f3cf200fc312 adds better Content-Type headers to the responses of the vocabulary service. This might solve the problem, but we will have to check on dev-sp, since so far I cannot reproduce it anywhere else :(
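Why the Content-Type header matters: a client decodes the response bytes with the charset that the header declares, so the same bytes can yield the correct label or mojibake. The minimal sketch below shows that behaviour; `decode_body` is a hypothetical helper, and the UTF-8 fallback is an assumption that matches the JSON default rather than a description of what the CCR front end actually does.

```python
from email.message import Message


def decode_body(body: bytes, content_type: str, fallback: str = "utf-8") -> str:
    """Decode an HTTP response body using the charset from its Content-Type.

    Hypothetical helper. Falls back to UTF-8 when no charset parameter is
    present (an assumption, not necessarily what every client does).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or fallback  # lower-cased, e.g. "utf-8"
    return body.decode(charset)


body = '{"prefLabel@en":"Arbëreshë Albanian"}'.encode("utf-8")

# Correct header: the label survives.
print(decode_body(body, "application/json; charset=UTF-8"))
# Wrong header: the same bytes turn into mojibake ("Ã«" instead of "ë").
print(decode_body(body, "application/json; charset=ISO-8859-1"))
```

Note that a wrong charset on decoding produces mojibake such as "Ã«", not "?"; a "?" points rather to a lossy encoding step somewhere upstream.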
I have just tried the request for that Albanian dialect (of the Italian Albanians).
The response (in Firefox) contains yet another, escaped, representation of ë: "prefLabel@en":"Arb\u00ebresh\u00eb Albanian". Do I need to decode \u00-sequences for JSON responses on our (CLAVAS, CCR) back end, or add a decoding option? Note that format=rdf (the default) looks good.
@olhsha, regarding "Do I need to decode \u00-sequences for json responses on our (CLAVAS, CCR) backend or add a decoding option?": providing the actual character in UTF-8 would work, but it looks like my JSON parser is capable of decoding these sequences as well, because it shows the right characters in the right places (except on the beta at dev-sp.clarin.eu, which substitutes question marks for them).
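The `\u00eb` form is standard JSON string escaping for U+00EB ("ë"), so any conforming JSON parser decodes it to the same string as the literal character; no extra decoding step on the back end should be needed. The sketch below checks this with Python's `json` module. It also shows that some serialisers escape non-ASCII characters by default (Python's `json.dumps` does, via `ensure_ascii=True`); whether CLAVAS escapes for the same reason is not known from this thread.

```python
import json

# The two payloads differ only in how "ë" (U+00EB) is written.
escaped = r'{"prefLabel@en":"Arb\u00ebresh\u00eb Albanian"}'
literal = '{"prefLabel@en":"Arbëreshë Albanian"}'

# A conforming JSON parser decodes \uXXXX escapes itself.
assert json.loads(escaped) == json.loads(literal)

label = json.loads(escaped)["prefLabel@en"]
print(label)  # Arbëreshë Albanian

# json.dumps escapes non-ASCII by default (ensure_ascii=True) ...
print(json.dumps({"prefLabel@en": label}))
# {"prefLabel@en": "Arb\u00ebresh\u00eb Albanian"}

# ... and emits the literal character when ensure_ascii=False.
print(json.dumps({"prefLabel@en": label}, ensure_ascii=False))
# {"prefLabel@en": "Arbëreshë Albanian"}
```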
Regarding "it shows the right characters in the right places": you can now see this better, since the 'proxy service' (example) sends the correct Content-Type header. This will be deployed to dev-sp next Friday, so we will know then whether it helps with this issue.
Recent changes did not solve it. Some things to try out here: http://stackoverflow.com/a/138950
Confirmed: running a local container based on the beta image has the same issue.
Thomas reports the following in the results of the test plan:
The cause of this is that a few language names end up clashing due to encoding issues somewhere in the pipeline of importing the vocabulary from CLAVAS. Needs to be investigated.
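How a lossy encoding step can make distinct language names clash: if two labels differ only in characters the encoding cannot represent, both collapse to the same mangled string. The minimal sketch below, assuming the loss is the "?" substitution observed on dev-sp, groups labels by their mangled form. `find_label_clashes` and the sample labels are hypothetical and are not taken from the CLAVAS data.

```python
from collections import defaultdict


def find_label_clashes(labels, encoding="ascii"):
    """Return labels that become identical after a lossy encode/decode round trip.

    Hypothetical helper: it simulates the "?" substitution seen on dev-sp by
    encoding with errors="replace" and groups labels by the mangled result.
    """
    groups = defaultdict(list)
    for label in labels:
        mangled = label.encode(encoding, errors="replace").decode(encoding)
        groups[mangled].append(label)
    # Keep only mangled keys shared by two or more input labels.
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical labels for illustration; not taken from the CLAVAS data.
labels = ["Naïve", "Naîve", "Plain"]
print(find_label_clashes(labels))  # {'Na?ve': ['Naïve', 'Naîve']}
```

Run over the full list of prefLabels, a check like this could flag which names would clash if the import pipeline loses non-ASCII characters.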