carpentries / lesson-infrastructure


Translations workflow for the Carpentries lessons #24

Open dpshelio opened 5 years ago

dpshelio commented 5 years ago

Until now, several attempts have been made to translate the lessons. The most "successful" has been branching from the English lesson and translating directly over the original files. When the original is updated, the changes can be pulled into the translated branch, where they show up either as new insertions or as conflicts with the translated text. Though this process works, it's not efficient for translators and it doesn't make it easy to measure how much has been translated. A new workflow (previously discussed on the mailing list) is presented below.

How internationalisation (i18n) is done in other projects

The common workflow for translations extracts each word, sentence or paragraph (depending on context) into a message. The file holding the messages of the whole project is a .pot file (portable object template). That template is used as the base for the translation into each language (in files like .es.po, .it.po, .fr.po, ...). Finally, these files are built back into the original format (in this case md).
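As a rough illustration of the idea (not the tooling proposed here), the round trip from md to messages and back could be sketched as below. The function names are hypothetical, messages are assumed to be blank-line-separated paragraphs, and the header entry that real .pot files carry is omitted.

```python
def split_messages(markdown_text):
    """Split markdown into paragraph-level messages (blank-line separated)."""
    blocks = [block.strip() for block in markdown_text.split("\n\n")]
    return [block for block in blocks if block]


def escape_po(text):
    """Escape backslashes, double quotes and newlines for a msgid string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_pot(messages):
    """Render messages as a minimal template: one msgid/msgstr pair each."""
    entries = []
    seen = set()
    for message in messages:
        if message in seen:  # a repeated paragraph is only listed once
            continue
        seen.add(message)
        entries.append(f'msgid "{escape_po(message)}"\nmsgstr ""\n')
    return "\n".join(entries)


def render(markdown_text, catalog):
    """Rebuild the markdown, keeping the original text where no translation exists."""
    translated = [catalog.get(m) or m for m in split_messages(markdown_text)]
    return "\n\n".join(translated) + "\n"
```

For example, `render("# Title\n\nHello world.\n", {"Hello world.": "Hola mundo."})` keeps the untranslated heading in English and swaps in the translated paragraph. Existing tools handle markdown structure far more carefully than this paragraph split does.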

Translation workflow Fig 1. Common workflow of the translation of a lesson.

There are plenty of tools that work with this file format (e.g., gettext, translate toolkit, ...). They provide statistics on what has and hasn't been translated, whether translations have been confirmed or only proposed, a glossary of words, ...

When the original lesson gets updated, the .pot file gets updated and so do the .nn.po files, marking which messages are new, which ones have been modified and which ones need no attention.
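The three-way marking described above can be approximated in a few lines. This is only a sketch under assumptions: `classify` and its `cutoff` parameter are hypothetical names, and similarity is measured with Python's `difflib` rather than with whatever matching a real gettext-style merge uses.

```python
import difflib


def classify(old_ids, new_ids, cutoff=0.6):
    """Label each message of the updated template relative to the old one.

    "unchanged": identical message already existed, translation still valid.
    "modified":  a similar message existed, translation needs review.
    "new":       nothing similar existed, translation must be written.
    """
    old_set = set(old_ids)
    status = {}
    for msgid in new_ids:
        if msgid in old_set:
            status[msgid] = "unchanged"
        elif difflib.get_close_matches(msgid, old_ids, n=1, cutoff=cutoff):
            status[msgid] = "modified"
        else:
            status[msgid] = "new"
    return status
```

A translator would then only need to look at the "modified" and "new" entries, which is the main gain over resolving merge conflicts in translated files.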

How to implement this on the Carpentries lessons

The main material of the Carpentries lessons is markdown (md) files. There is also some graphical content, but much of it can be left as it is for now (I also have some ideas about translating svg files; bitmaps would need to be remade). Additionally, all the lessons use the same template and they are built into html using Jekyll by GitHub Pages.

Separate content from template

To simplify the management of the different Carpentries lessons (originals and translations) I've adopted Jekyll's support for themes. This is shown in detail on carpentries/styles#229.

Translations hub repository

This is a repository that links (via git submodules) to all the lessons and to a particular git revision for each. Keeping a particular revision makes it easy for the translators to focus on material that's not changing too frequently. This revision could be the latest available release.

This repository has all the machinery to generate the template files, generate stats and produce the lessons in the new language. Most of these actions are automated and done via Travis-CI.

Translations hub Fig 2. How the translation hub connects it all.

Figure 2 shows the workflow. "po magic" is the translations hub repository, which contains links to each lesson as submodules and a directory with all the po files. A translator would fork this repository (no need to worry about the submodules), create or modify a po file by translating messages into the desired language, and make a pull request (PR) with that file. An editor for that language would then review the translations. If the PR is merged, Travis-CI converts these messages into md files and generates a read-only git repository on GitHub with that output. Additionally, a submodule for that translation will be created (or updated) on the original lesson. The purpose of this submodule is to be able to show all the available languages from the same page. Travis will also generate stats for each lesson and each language and push them to the gh-pages repository. This will show potential translators where they are needed.
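The per-language stats mentioned above could be as simple as the share of messages with a non-empty translation. The following is a minimal sketch, assuming each language's po file has already been parsed into a plain dictionary; `translation_stats` and the data layout are hypothetical, not part of the hub.

```python
def translation_stats(catalogs):
    """Return the percentage of translated messages for each language.

    catalogs maps a language code to a {msgid: msgstr} dictionary, where an
    empty (or whitespace-only) msgstr means "not translated yet".
    """
    stats = {}
    for lang, catalog in catalogs.items():
        total = len(catalog)
        done = sum(1 for msgstr in catalog.values() if msgstr.strip())
        stats[lang] = round(100 * done / total, 1) if total else 0.0
    return stats
```

Published on the gh-pages site, a table built from such numbers would let a prospective translator see at a glance which lesson and language needs attention.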

How does this affect...

users (learners and instructors)?

They should not notice any change in the lessons except the ability to change the language of the lessons (as shown in Fig 3).

lessons with translation icon Fig 3. Sample of how the lessons look with the translations included.

lesson maintainers?

Each lesson will have a new directory _locale with as many submodules as there are translations available. Maintainers don't need to touch that directory and it won't be pulled unless specified. When they receive a PR to update a particular submodule, they will know it has already been approved and merged by an editor of that language, so they only need to approve the update of where the submodule points to.

Also, with the themes in place they won't need to worry about style changes across the lessons.

translators?

This will be a great improvement to their workflows. They can now use more appropriate tools for translation (though they could still use their favourite text editor if they prefer), like poedit, Gtranslator, OmegaT, Lokalize, virtaal, or online collaborative platforms that accept this type of file, like weblate, zanata or pootle.

Most of these tools allow the creation of a glossary (very useful to keep consistency within a lesson and across lessons) and some also provide suggestions based on other translations or online translation engines.

The online platforms are normally more attractive, but they may be a bit painful to use with a flaky internet connection. In any case, at least zanata could be integrated into our workflow, as it offers an API through which we could pull/push translations produced in other ways, though this would need more experimentation.

a new translations hub maintainer?

This whole new process would need someone in charge of keeping the hub working: adding new lessons to be translated, updating their revisions when a new release happens, informing and training translators on what they need to translate and what not, giving feedback to lesson maintainers on how to make the lessons more translation friendly (mostly regarding line formatting), etc.

I'm happy to volunteer for this role.

dpshelio commented 5 years ago

Possible problems:

tracykteal commented 5 years ago

Thanks so much for this detailed information including the diagrams. This captures the conversation well and provides a good potential path forward. Thanks for putting together all of these ideas. Agree with needing something like a hub maintainer and @dpshelio thank you so much for volunteering for this type of role!

Focusing on translating releases makes sense, and both this and the themes fit well into an overall updated release and development workflow. The trials that were done already are helping to see how this is possible, and overall this might have an element of needing to try it to see how it will all work.

Even with automatic translation as you've outlined here, there will, as is always the case, need to be some manual work. My main concern is maintaining high quality translations, so having a community or a committed person around the lessons in each language they are translated into seems like it will be important. So, we can look at both the technical and the community aspects of maintaining translations.