Fixed various transcription errors for Croatian xml files

kontur commented 4 years ago

Rosetta Type launched this web app to preview the UDHR in various languages and fonts. We got user feedback that the Croatian text (based on this repository) contains errors, in particular wrong accents (mostly zcaron U+017E / ccaron U+010D and their uppercase variants) and incorrectly transcribed dbar (U+0111) letters.

Looking at the OHCHR page, the Croatian translation appears to be an image-only (scanned) PDF, and running optical character recognition on that file yields approximately the same transcription errors. Our assumption is that this automatically extracted text was never checked for accuracy and that these are text recognition errors.

Together with the native speaker who reported the errors, we have gone through the text; the corrected transcript is included in this PR.

CLAassistant commented 4 years ago

All committers have signed the CLA.

kontur commented 4 years ago

Anybody home? :)

eric-muller commented 3 years ago

Sorry for the delay in merging the changes, and thanks for your help. I have credited "Rosetta Type"; please let me know if you want additional credits.

eric-muller / udhr

Fixed various transcription errors for Croatian xml files #31