(...) I found out that diacritics are not properly decoded by the interface. As you can see below, the left panel displays the title properly ("Tagebücher", for instance), but the same title in the central panel, as well as the pre-annotations in the right panel, are full of mistakes. (...)
Me, after looking into it (email):
(...) as you pointed out, diacritics appear correctly in some parts of the interface but incorrectly in others. The parts where they appear correctly are sourced from our Elasticsearch index, which we use to store the text and to perform full-text search. This is why the search results list and the body of the source look correct. The parts where the diacritics are wrong are sourced from the triplestore, in which we keep source metadata and annotation data. This is why the title in the source panel and the snippets in the annotation blocks look wrong. I suspect we made a mistake when moving the linked data from Fuseki, the old triplestore, to BlazeGraph, the new triplestore, and that this caused those data to be stored with the wrong character encoding.
If my analysis is right, this affects string literals that were migrated from Fuseki to BlazeGraph, but not string literals that were added to BlazeGraph after the migration.
Let's discuss this, make sure we fully understand the issue, and then figure out a way to address it. It's ugly but (I think) not urgent enough to justify overworking ourselves.
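The symptom described above looks like classic mojibake: text serialized as UTF-8 and then read back with a single-byte encoding. A minimal Python reproduction, assuming (not confirmed by the analysis above) that the export from Fuseki was UTF-8 and that the import into BlazeGraph read it as Latin-1 (ISO-8859-1):

```python
# Hypothetical reproduction of the suspected migration bug: the literal is
# written out as UTF-8 but read back in as Latin-1 during the import.
title = "Tagebücher"

utf8_bytes = title.encode("utf-8")      # 'ü' becomes the two bytes 0xC3 0xBC
misread = utf8_bytes.decode("latin-1")  # each byte is taken as its own character

print(misread)  # TagebÃ¼cher
```

Every non-ASCII character turns into two (or more) garbage characters, which matches the "full of mistakes" titles while ASCII-only text stays intact.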
Resolved in 421c159a8c174e40eb362b2f04ccc0cc4a269330
Existing triples fixed with scripts in cd4bb2caf4dc662efb0f09b76fea5c5bd860b204 and 8db101bff10b5de94f7f06fd28bb2a9c6fd94025
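The actual fix scripts are in the commits listed above and are not reproduced here. As an illustration only, a repair for a single string literal could look like the following Python sketch, again assuming the corruption was UTF-8 read as Latin-1. The function name `fix_mojibake` is hypothetical, and reading from or writing back to BlazeGraph (for example via SPARQL) is not shown. Because a correctly stored literal such as "Tagebücher" is not valid UTF-8 once re-encoded as Latin-1, the function leaves it unchanged, so it can be run over migrated and post-migration literals alike.

```python
def fix_mojibake(value: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 mix-up; return the value unchanged otherwise."""
    try:
        # Recover the original bytes, then decode them the way they were written.
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the string contains characters outside Latin-1 (e.g. '€'),
        # or its Latin-1 bytes are not valid UTF-8: assume it was stored correctly.
        return value


for literal in ["TagebÃ¼cher", "Tagebücher"]:
    print(fix_mojibake(literal))  # prints "Tagebücher" both times
```

The check is a heuristic: a literal that legitimately contains sequences like "Ã¼" would also be "repaired", so the output is worth spot-checking before writing it back.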
Claire Madl (email):