Some character encoding went wrong when migrating from Fuseki to BlazeGraph

Claire Madl (email):

(...) I found out that diacritics are not properly decoded by the interface. As you can see below, the left panel displays the title properly ("Tagebücher" for instance) but the same title on the central panel and then the pre-annotations on the right panel are full of mistakes. (...)

Me, after looking into it (email):

(...) as you pointed out, diacritics appear correctly in some parts of the interface but incorrectly in others. The parts where they appear correctly are sourced from our Elasticsearch index, which we use to store the text and to perform full-text search. This is why the search results list and the body of the source look correct. The parts where the diacritics are wrong are sourced from the triplestore, in which we keep source metadata and annotation data. This is why the title of the source panel and the snippets in the annotation blocks appear incorrect. I suspect we made a mistake when moving the linked data from Fuseki, the old triplestore, to BlazeGraph, the new triplestore, which caused those data to be stored with the wrong encoding.

If my analysis is right, this affects string literals that were migrated from Fuseki to BlazeGraph, but not string literals that were added to BlazeGraph after the migration.
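
To make the hypothesis concrete, here is a minimal Python sketch of one common way this kind of corruption arises: UTF-8 bytes being read back as Latin-1. That this is what happened during our migration is an assumption I have not verified yet; the actual wrong encoding could be a different one, such as Windows-1252. The function names `garble` and `repair` are made up for illustration and are not part of our codebase.

```python
def garble(text: str) -> str:
    """Simulate the suspected mistake: UTF-8 bytes decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Reverse the suspected mistake.

    Strings that do not fit the pattern (already correct, or containing
    characters outside Latin-1) are returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


broken = garble("Tagebücher")
print(broken)          # TagebÃ¼cher
print(repair(broken))  # Tagebücher
print(repair("Tagebücher"))  # already correct, so left as-is
```

If the broken titles in the interface look like the first printed line, the assumption is probably right, and a similar round trip could be applied to the migrated literals. Literals added after the migration should pass through `repair` unchanged, because a correctly stored "ü" is not valid UTF-8 once encoded as Latin-1. This is only a heuristic, though: a string that legitimately contains a sequence like "Ã¼" would be altered, so any bulk fix should be reviewed before it is written back.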

Let's discuss this, make sure we fully understand the issue, and then figure out a way to address it. It's ugly, but (I think) not urgent enough to overwork ourselves.

UUDigitalHumanitieslab / readit-interface #538