eric-muller / udhr

Universal Declaration of Human Rights

Fixed various transcription errors for Croatian XML files #31

Closed. kontur closed this 3 years ago

kontur commented 4 years ago

Rosetta Type launched this web app to preview the UDHR in various languages and fonts. We got user feedback that the Croatian text (based on this repository) contains errors, in particular wrong accents (mostly zcaron U+017E / ccaron U+010D and their uppercase variants) and mistranscribed dbar (U+0111, d with stroke) letters.

Looking at the OHCHR page, the Croatian translation appears to be a scanned, image-only PDF, and running optical character recognition (OCR) on that file yields approximately the same wrong transcriptions. Our assumption is that this automatically extracted text was never checked for accuracy and that these are OCR errors.

Together with the native speaker who reported the errors, we have gone through the text; the corrected transcript is included in this PR.
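
For anyone who wants to spot-check other transcriptions for similar problems, here is a minimal sketch in Python. It is not part of this repository or of this PR; the function names `diacritic_counts` and `unexpected_letters` are hypothetical, and the check assumes the text should only use the Croatian diacritic letters č, ć, đ, š, ž (and their uppercase forms) beyond ASCII. It cannot tell a wrong č from a correct one; it only counts the diacritic letters and flags non-ASCII letters that do not belong to the Croatian alphabet, which is one common symptom of OCR errors.

```python
import unicodedata
from collections import Counter

# Croatian letters with diacritics, lowercase and uppercase.
# Assumption: any other non-ASCII letter in a Croatian text is suspicious.
CROATIAN_DIACRITICS = "čćđšžČĆĐŠŽ"


def diacritic_counts(text: str) -> Counter:
    """Count each Croatian diacritic letter after NFC normalization."""
    # NFC composes e.g. "c" + U+030C (combining caron) into U+010D.
    normalized = unicodedata.normalize("NFC", text)
    return Counter(ch for ch in normalized if ch in CROATIAN_DIACRITICS)


def unexpected_letters(text: str) -> set:
    """Return non-ASCII letters that are not Croatian diacritic letters."""
    normalized = unicodedata.normalize("NFC", text)
    return {
        ch
        for ch in normalized
        if ch.isalpha() and ord(ch) > 127 and ch not in CROATIAN_DIACRITICS
    }
```

Comparing the counts from `diacritic_counts` before and after a correction pass gives a rough picture of how many letters changed, and a non-empty result from `unexpected_letters` points at characters worth a manual look.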

CLAassistant commented 4 years ago

CLA assistant check
All committers have signed the CLA.

kontur commented 4 years ago

Anybody home? :)

eric-muller commented 3 years ago

Sorry for the delay in merging the changes, and thanks for your help. I have credited "Rosetta Type"; please let me know if you want additional credits.