eric-muller / udhr

Universal Declaration of Human Rights

Complete transcription for Northern Pashto #7

Closed behnam closed 6 years ago

behnam commented 6 years ago

The quality of the source PDF is poor, so we need someone familiar with the language to read the text and type it as Unicode text.

Current Status

Status page: https://unicode.org/udhr/s/status_pbu.html

Stage 3: Unicode version for first article

HTML: https://unicode.org/udhr/d/udhr_pbu.html

Text Source

http://searchlibrary.ohchr.org/record/18146?ln=en

http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=pbu

PDF: http://www.ohchr.org/EN/UDHR/Documents/UDHR_Translations/pbu.pdf

saadatm commented 6 years ago

I asked on an Urdu forum whether anybody who knows Pashto could type the Pashto translation in Unicode, and a couple of volunteers agreed to type and proofread it. There are a few problems, though:

  1. The source PDF at OHCHR is missing Article 28.
  2. Since the source PDF is in bad shape, the volunteers used another translation, from the Afghanistan Legal Documents Exchange Center (ALDEC), for guidance. They tried to remain faithful to the OHCHR translation, but the Unicode text still deviates from it in some places, and the translation of Article 28 is copied from the ALDEC translation.

So, what should be the way forward?

behnam commented 6 years ago

Thanks for the work, @saadatm!

First, the ALDEC document looks like a good source in its own right. Since it's bilingual in Dari and Pashto, I think it's a good idea to consider it an additional source for these languages. (I'll file a separate issue for myself to compare the ALDEC Dari text with the current version, which I edited a few years ago against the OHCHR copy. Filed: https://github.com/unicode-org/udhr/issues/10)

As for the OHCHR Pashto text, I would say let's add whatever we have right now to the repository, including the reconstruction of Article 28 based on ALDEC. I think the document's status can note that it needs review against a better copy of the source.

That's my opinion. Let's see what @eric-muller recommends.

eric-muller commented 6 years ago

Since it seems unlikely that we could get a good transcription of the OHCHR text, let's just go with the ALDEC text.