PARINetwork / pari

Django/Wagtail based PARI webapp
http://ruralindiaonline.org
BSD 3-Clause "New" or "Revised" License

The Indian language Localization Story #410

Open siddadel opened 4 years ago

siddadel commented 4 years ago

PART 1: Semantics

In my understanding, there are two parts to the problem of multilinguality of a website, viz. 1) Content and 2) Commands.

Content - Dynamic text: What I mean by content, for example, is -- subjects of articles, bodies of articles, bylines, image captions, block quotes and such other pieces of information in an article. These are unique to each new upload. Besides articles, there are Library resources, photo album text and such other content. There are also other pieces of content that one overlooks in such discussions -- subtitles, audio tracks on Talking Albums, audio tracks on videos. All these I shall include in "Content".

You might have noticed that this is the stuff that our editorial team and our journalists work on. This is the real stuff. You might have also guessed that when we want to translate this content from English -- which is the language that they are predominantly in -- we seek Smita Khator's help and her team of terrific translators translate this.

Content is the "dynamic" part of our content management system. It changes with every upload. Because this content keeps changing, our site is not static but keeps bringing something new. You might hear techies refer to this as "dynamic".

When PARI wants to present this dynamic content in another language, we get it translated by Smita's team. From now on, when we say "Translation" we shall mean multilingual presentation of such dynamic content. This is translated content.

For the most part, except perhaps audio tracks, library, photo albums and subtitles, we already have a lot of translated content. We are already a multilingual website when it comes to content. We serve content in up to 11 Indian languages. Translation at PARI is an editorial process and is led by Smita and Namita.

Commands - Static text: There are, however, those words and literals on our website that haven't changed over the years. They are "static". Words like "CATEGORIES", "Search", "Terms and Conditions", "Date:", "Hindi", "Author", "Farming and its crisis", "Photo Albums" and unfortunately -- "P. Sainath" (Joke. He is far from static but on the Founder's page, those letters are static. Think 'guiding star')

When one wants to present these words -- these static literals -- in a language other than English, the process is called "Localization". But shouldn't it be called translation, because that is what one needs to do? Aren't Smita and her team going to do it anyway? Yes, it does entail translation and the same team will translate these words as well.

But to distinguish between translated dynamic content and translated static content, we shall call the latter "Localization" and the former "Translation". Silly semantics, I know, but useful in tech and workflow discussions.

await Part 2: Localization

Part 2: Localization

The first important step of localization is to make a list of all the English static text on the site and create a spreadsheet of it. The next step is to get translations of all this static text. Both these steps have already been carried out. Olivia Waring helped us with the first step, whereas Smita, Medha and Qamar, among others, translated the static text. This work is, for the most part, done.

The main pending step of this process is to get the website to load Indian language static text for the right Indian language reader. That is, when a primarily Hindi reader wants to access PARI, he/she should get Hindi static text: the Hindi Localization.

How will this work?

There are 4 main interactions for this:

1) URL - the site link: hindi.ruralindiaonline.org will be the Hindi localization, marathi.ruralindiaonline.org will be the Marathi localization, and so on.

When a visitor comes to hindi.ruralindiaonline.org, the static text will all be in Hindi. On marathi.ruralindiaonline.org, the static text will all be in Marathi. english.ruralindiaonline.org is the same as ruralindiaonline.org.

2) Homepage: hindi.ruralindiaonline.org will have its own homepage. Remember, the homepage is not static text. It has a lot of dynamic parts to it. The carousel changes, the slideshows change. Thus there will have to be Indian language versions of our homepage, and some editorial decision making will be involved in changing them on a daily, weekly or monthly basis.

3) Search results, list of articles in a category, list of articles of an author etc. The default language of search results or articles shown for a category will be Hindi for hindi.ruralindiaonline.org.

4) There will be a drop down in the footer where one can change the localization. Marathi selection will take us to marathi.ruralindiaonline.org, English will take us to ruralindiaonline.org

So now we need to write the Python code to implement these 4 things.
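As a starting point for interactions 1) and 4), here is a minimal sketch in plain Python of mapping a request's host name to a language and building the link a footer drop down would point to. The names `SUBDOMAIN_LANGUAGES`, `language_from_host` and `localized_url` are hypothetical, the language codes are assumptions, and only three languages are listed for illustration. In the actual Django/Wagtail app this logic would probably sit in a middleware, which is not shown here.

```python
from urllib.parse import urlsplit, urlunsplit

BASE_DOMAIN = "ruralindiaonline.org"
DEFAULT_LANGUAGE = "en"

# Assumed mapping from subdomain label to language code; extend per language.
SUBDOMAIN_LANGUAGES = {
    "english": "en",
    "hindi": "hi",
    "marathi": "mr",
}


def language_from_host(host: str) -> str:
    """Return the language code implied by the request's Host header."""
    hostname = host.split(":")[0].lower()  # drop any port, ignore case
    suffix = "." + BASE_DOMAIN
    if hostname.endswith(suffix):
        label = hostname[: -len(suffix)]
        # Unknown labels (e.g. "www") fall back to the default language.
        return SUBDOMAIN_LANGUAGES.get(label, DEFAULT_LANGUAGE)
    return DEFAULT_LANGUAGE  # bare domain or anything unrecognised


def localized_url(url: str, language: str) -> str:
    """Point the same path at the subdomain for `language`.

    The default language maps to the bare domain, so choosing English
    leads to ruralindiaonline.org rather than english.ruralindiaonline.org.
    """
    labels = {code: label for label, code in SUBDOMAIN_LANGUAGES.items()}
    parts = urlsplit(url)
    if language == DEFAULT_LANGUAGE or language not in labels:
        host = BASE_DOMAIN
    else:
        host = f"{labels[language]}.{BASE_DOMAIN}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

For example, `localized_url("https://hindi.ruralindiaonline.org/archive/2019/9/", "mr")` keeps the path and swaps the subdomain, which is what a Marathi selection in the drop down would need.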

siddadel commented 4 years ago
  1. As a Hindi reader I want to see Hindi static text when I enter "hindi.ruralindiaonline.org" so that I can read the entire website's static text in Hindi. Thus the Hindi .po file should be loaded.

  2. As a Hindi reader, when I load the homepage, it should not load the English page here:
     https://ruralindiaonline.org/admin/pages/3/
     https://ruralindiaonline.org/admin/pages/3/edit/

Instead there should be Hindi, Marathi and 11 other versions of the same page. Each should be loaded when their subdomain is requested.

If the dev finds it correct and logical, the same homepage can also load when one enters https://ruralindiaonline.org/?lang=hi

  3. There are article lists and search results that show up all across the site, for example:
     https://ruralindiaonline.org/gallery/categories/Little%20takes/?lang=hi
     https://ruralindiaonline.org/authors/namita-waikar/?lang=hi
     https://ruralindiaonline.org/locations/1524-jatavada/?lang=hi
     https://ruralindiaonline.org/archive/2019/9/?lang=hi

Maybe -- and I speak from ignorance -- lang can be a session variable.

These three things should take care of all localization.

  4. Add a drop down in the footer to change the localization from Hindi to Marathi etc.

In the first iteration, a little English on screen will be okay. Don't be shy of deploying just because there is some English.
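The `?lang=hi` links and the session-variable idea from the third item could be sketched roughly as below. This is plain Python rather than Django code: `query_string` and `session` stand in for what Django exposes as `request.GET` and `request.session`, while the function names, the `"language"` key on each article and the precedence (explicit `?lang=` first, then the remembered session value, then English) are all assumptions for illustration.

```python
from urllib.parse import parse_qs

SUPPORTED_LANGUAGES = {"en", "hi", "mr"}  # illustrative subset only
DEFAULT_LANGUAGE = "en"


def resolve_language(query_string: str, session: dict) -> str:
    """Pick the language for a request and remember an explicit choice."""
    requested = parse_qs(query_string).get("lang", [""])[0].lower()
    if requested in SUPPORTED_LANGUAGES:
        session["lang"] = requested  # later requests can omit ?lang=
        return requested
    remembered = session.get("lang")
    if remembered in SUPPORTED_LANGUAGES:
        return remembered
    return DEFAULT_LANGUAGE


def articles_for_language(articles: list, language: str) -> list:
    """Keep only articles in `language` (assumes each has a "language" key)."""
    return [article for article in articles if article.get("language") == language]
```

A list view (category, author, location, archive or search) would first call `resolve_language` and then pass the result to something like `articles_for_language`, so that a visitor who arrived with `?lang=hi` once keeps seeing Hindi lists on later pages.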