Specifying the translation/localization workflow: How to use Transifex best?

tokideveloper commented 6 years ago

This issue is to find out how to use Transifex best for our needs.

Key questions (checked if solved) include:

[ ] When to upload new/updated source files? For which languages?
[ ] Requests for which languages do we want to accept? Which ones do we reject?
[ ] Rename files before uploading? Which files to which names? How to automate?
[ ] How to build the best environment for the translators? How to reduce mistakes? How to communicate with translators? A manual for translators?

Related issues:

tokideveloper commented 6 years ago

I recently (2018-01-24) uploaded the file /pages/home.html (289 lines) from the qubesos.github.io repo to Transifex to figure out how to deal best with HTML files.

So, I renamed the file before uploading and this is what I got:

Uploaded as "home.html" (61 strings)

Good
- General dealing with HTML tags via the "capsule halves" is okay.
Bad
- The YAML header is missing.
- Cannot adapt the value of any "href" or "src" attribute to match language-dependent paths (see string 2, capsule half "1"). (*)
- Line 230, "{% endfor %}", between Transifex strings 52 and 53, is missing. This seems bad to me, even if it does not need to be translated in this example.
Interesting
- Many HTML tags (mostly "div"s) are missing.

Uploaded as "home.md" (12 strings)

Good
- All HTML tags and all texts are shown in the source strings. This is very nice!
- Within the YAML front matter, many of the relevant attributes "layout", "title" and "permalink" are translatable. That's nice!
Bad
- Within the YAML front matter, the redirect_from list is missing.
- The values of relevant attributes ("href", "src") cannot be changed. (*)
Undecided
- All HTML tags within a paragraph in the source file appear as a single phrase to translate. This is good since the translators do not need to copy them one-by-one. But it is bad, too, because it could become a mess.

Uploaded as "home.txt" (142 strings)

Like "home.md" but

It is possible to change the YAML "redirect_from" attribute, too.
The last line (288 in the source file, string 142 in the Transifex interface), containing the closing "div" tag, is missing. Is it a bug?

Summarizing the result

It is nice to have these "capsule halves" and to deal with them.
Renaming an HTML file to *.md before uploading seems to be the best way to handle it.
Concerning all three ways of upload, a translator cannot change the values of the attributes "href", "src" etc. unless they enable the raw editor mode via the gear icon menu at the top right. But after doing so, translation becomes a mess if many HTML tags are involved. This makes the strings harder to survey and raises the chance of serious mistakes. Here, we need a solution.

tokideveloper commented 6 years ago

A solution

Idea: Prefix all link paths with a language-independent Liquid expression which will resolve to a language-dependent path prefix leading to the correct (translated) resource.

See my updated language switch prototype. There, I have added a local variable langprefix to the YAML front matter of all canonical and translated files. The variable remains empty in the canonical files and is set to /de-DE, /pt-BR, etc. in the translated files. Wherever a localized link path is needed, the path has to be preceded by the Liquid expression {{ page.langprefix }}, which outputs the content of the variable.

Advantages: This way, all the spots where that expression exists do not need to be touched by the translators. Instead, they only need to adjust the language-dependent variable assignments in the YAML front matter which could be automated. This also simplifies translations in general and gives us flexibility if a prefix modification is needed in the future.
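The automation mentioned above could look roughly like the following minimal sketch. It assumes that every file starts with a YAML front matter block delimited by `---` lines and that the variable is written on a single `langprefix:` line; the function name `set_langprefix` is hypothetical and not part of the prototype.

```python
def set_langprefix(text: str, prefix: str) -> str:
    """Return `text` with `langprefix` in its YAML front matter set to `prefix`.

    Assumption: the file starts with a front matter block delimited by '---' lines.
    """
    lines = text.split("\n")
    if not lines or lines[0].strip() != "---":
        raise ValueError("no YAML front matter found")
    # Find the line that closes the front matter.
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("unterminated YAML front matter") from None
    new_line = f"langprefix: {prefix}".rstrip()
    for i in range(1, end):
        if lines[i].startswith("langprefix:"):
            lines[i] = new_line  # overwrite the existing assignment
            break
    else:
        lines.insert(end, new_line)  # no assignment yet: add one before the closing '---'
    return "\n".join(lines)
```

For a German translation, one would call `set_langprefix(text, "/de-DE")` on the downloaded file; the body of the page, including every `{{ page.langprefix }}` expression, is left untouched.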

Any comments?

adrelanos commented 6 years ago

How do you verify that translation in language X really says what the original says and does not instruct the user to do things that are not in the original instructions?

I.e. how do you check that no malicious translation goes live?

tokideveloper commented 6 years ago

How do you verify that translation in language X really says what the original says and does not instruct the user to do things that are not in the original instructions?

Concerning this problem, Andrew already created a doc layout stating this:

Caution: This page is an unverified translation. The Qubes OS Project cannot evaluate the accuracy of translations into languages that our team cannot read.

However, inspections by trusted people who know both Qubes and language X, and who would sign translations afterwards, could help. But this could be complicated to establish. Maybe in the future?

I.e. how do you check that no malicious translation goes live?

That is a really good question, because malicious scripts or similar content could be injected (and the Qubes OS project should not become known for a dangerous website). Since this cannot be solved by an algorithm in general, smart and especially trusted people are needed for this task.

I propose that there could be someone or a team who verifies translated files for harmlessness (aside from the correctness of the content) before they go live. This includes

checking all URLs and paths and
checking (java)scripts.

Have I forgotten important items?

I guess that these checks could be done without knowing language X.

Maybe tools could help to inspect diffs in these files, diffs in the HTML trees, diffs in the overall structure of HTML and MD files etc.?
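As a rough illustration of such a tool, the sketch below compares the "href"/"src" values and the number of script tags between a canonical file and its translation, using only Python's standard `html.parser`. The function names are hypothetical, and the check is deliberately narrow: it would only flag files for a human reviewer, not prove them harmless.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects href/src attribute values and counts <script> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []    # (tag, attribute, value) triples in document order
        self.scripts = 0   # number of <script> start tags seen

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        for name, value in attrs:
            if name in ("href", "src"):
                self.links.append((tag, name, value))


def collect(html):
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links, parser.scripts


def suspicious_changes(canonical, translated):
    """Return a list of findings that a reviewer should look at."""
    c_links, c_scripts = collect(canonical)
    t_links, t_scripts = collect(translated)
    problems = []
    if c_links != t_links:
        problems.append("href/src values differ from the canonical file")
    if c_scripts != t_scripts:
        problems.append("number of <script> tags differs")
    return problems
```

An empty result means only that links and script counts are unchanged; the text itself still needs a reader who knows language X.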

tokideveloper commented 6 years ago

An approach concerning the number of target languages:

Currently, we have over 20 project languages on Transifex. I'm not a psychologist, but I think that expectations grow with the number of languages ("There must be so many people interested in Qubes that it's worth offering it in over 20 languages! Thus, it must be well and almost completely translated!"). So, in my view, it's better to have fewer languages that are translated better and more completely.

unman commented 6 years ago

On Mon, Mar 19, 2018 at 12:12:46PM +0000, Tobias Killer wrote:

An approach concerning the number of target languages:

Currently, we have over 20 project languages on Transifex. I'm not a psychologist, but I think that expectations grow with the number of languages ("There must be so many people interested in Qubes that it's worth offering it in over 20 languages! Thus, it must be well and almost completely translated!"). So, in my view, it's better to have fewer languages that are translated better and more completely.

Is there any way of tracking which translations are being used?

tokideveloper commented 6 years ago

On Mon, Mar 19, 2018 at 12:12:46PM +0000, Tobias Killer wrote: An approach concerning the number of target languages:

Currently, we have over 20 project languages on Transifex. I'm not a psychologist, but I think that expectations grow with the number of languages ("There must be so many people interested in Qubes that it's worth offering it in over 20 languages! Thus, it must be well and almost completely translated!"). So, in my view, it's better to have fewer languages that are translated better and more completely.

Is there any way of tracking which translations are being used?

An old plan is to list all files uploaded to Transifex along with their commit IDs. This way, we'll know which file a particular translation is of.

Another plan is to upload all translatable files to Transifex automatically on creation/modification/deletion and to download all files from Transifex on modification. (Transifex knows which parts of which files have already been translated/reviewed into which languages and marks them accordingly.) This way, apart from a nonzero delay, the current translated files are always translations of the current canonical files. No tracking list is needed.

But it's not clear yet which plan to follow. Tests will show.

Did that answer your question?

tokideveloper commented 6 years ago

How to deal with YAML files?

Recently, I tried to upload /_data/architecture.yml to Transifex but it did not work (I got an empty error message).

Following the Transifex documentation about YAML, I inserted a new root element en: as shown in the example and it still didn't work.

After playing around for a while, I came to the conclusion that Transifex seems to be somewhat picky about which structures are allowed in YAML files and how deeply they may be nested. Thus, renaming a YAML file by appending .txt before uploading is the best way I could find to handle these files. Leading spaces are stored by Transifex internally for the later download, while the user won't see them in the translation interface. This works well and fits our needs.
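The renaming rules found so far in this thread (HTML files uploaded as *.md, YAML files with .txt appended) could be automated along these lines. This is only a sketch: the mapping table and the function name `upload_name` are assumptions, and the rules may still change after further tests.

```python
from pathlib import PurePosixPath

# Assumed mapping, taken from the experiments described in this thread:
# HTML files are uploaded as *.md, YAML files get ".txt" appended.
UPLOAD_SUFFIX = {".html": ".md", ".yml": ".yml.txt", ".yaml": ".yaml.txt"}


def upload_name(source_path: str) -> str:
    """Return the file name to use when uploading `source_path` to Transifex."""
    path = PurePosixPath(source_path)
    new_suffix = UPLOAD_SUFFIX.get(path.suffix)
    if new_suffix is None:
        return path.name  # other file types keep their name
    return path.stem + new_suffix
```

For example, `upload_name("/_data/architecture.yml")` gives `architecture.yml.txt`, which keeps the original ending visible in the name.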

tokideveloper commented 6 years ago

Checking the overall structure of a file after downloading it from Transifex.

I think it's worth running checks on translated files before processing them any further. These include:

Empty lines in the canonical version must remain empty in the translated version.
Non-empty lines in the canonical version must be non-empty in the translated version.
HTML tags must remain unchanged, tag by tag and line by line. No additional tags allowed.
Leading spaces must remain.
Association keys in YAML files must remain.
Headings must remain as headings with the same depth.
Link references in MD files must match the related links as they do in the canonical version.
External links must remain untouched.
Internal links must remain untouched (see this solution).
No additional links/pics etc. allowed.
Inline code and code blocks must remain untouched.

These measures will support both quality and integrity of the results.
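A few of the checks listed above can be sketched in a handful of lines. The example below covers only empty lines, leading spaces, Markdown heading depth and the per-line sequence of HTML tags; it assumes that the translated file has exactly as many lines as the canonical one, and the function name `structure_problems` is made up for illustration.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s")       # Markdown ATX headings
TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*")  # start of an opening or closing HTML tag


def leading_spaces(line):
    return len(line) - len(line.lstrip(" "))


def structure_problems(canonical: str, translated: str) -> list:
    """Compare two files line by line and return a list of findings."""
    c_lines = canonical.split("\n")
    t_lines = translated.split("\n")
    if len(c_lines) != len(t_lines):
        return [f"line count differs: {len(c_lines)} vs {len(t_lines)}"]
    problems = []
    for number, (c, t) in enumerate(zip(c_lines, t_lines), start=1):
        # Empty lines must stay empty, non-empty lines must stay non-empty.
        if (c.strip() == "") != (t.strip() == ""):
            problems.append(f"line {number}: empty/non-empty mismatch")
            continue
        if leading_spaces(c) != leading_spaces(t):
            problems.append(f"line {number}: leading spaces changed")
        c_head, t_head = HEADING.match(c), HEADING.match(t)
        c_depth = len(c_head.group(1)) if c_head else 0
        t_depth = len(t_head.group(1)) if t_head else 0
        if c_depth != t_depth:
            problems.append(f"line {number}: heading depth changed")
        # The sequence of HTML tags on each line must be identical.
        if TAG.findall(c) != TAG.findall(t):
            problems.append(f"line {number}: HTML tags changed")
    return problems
```

The remaining items (YAML keys, link references, code blocks) would need their own comparisons in the same style.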

EDIT: These checks could be part of the Translation Check in Transifex. As described in the documentation about Setting Translation Checks, checking depends on the file format:

Each file format has a set of translation checks on by default. You can change whether a failed check results in an error or warning, or turn off a check all together.

Since it's likely that we'll upload files renamed to *.txt or such, the Translation Checks may work incorrectly. Thus, checking the translations on our own may still be necessary. END OF EDIT.

EDIT2: Some items added or corrected.

marmarek commented 6 years ago

Another plan is to upload all translatable files to Transifex automatically on creation/modification/deletion and to download all files from Transifex on modification.

IMO at least the first part is preferable. Otherwise it will easily desynchronize as it is today (old content is translated, possibly wasting someone's time). As for downloading, I'm not sure. Can be similarly automated, or called from time to time (once a week?)

tokideveloper commented 6 years ago

Another plan is to upload all translatable files to Transifex automatically on creation/modification/deletion and to download all files from Transifex on modification.

IMO at least the first part is preferable. Otherwise it will easily desynchronize as it is today (old content is translated, possibly wasting someone's time).

I agree. I think that at least one update a day, if available, should be fine.

As for downloading, I'm not sure. Can be similarly automated, or called from time to time (once a week?)

Assuming that we'll use the Transifex API, we would be limited to at most 6000 requests/hour. But I'm not sure what can be done by a single API request. We currently have about 300 documents to translate. If we had 20 target languages, we would probably need 300 × 20 = 6000 calls to download them all. This allows at most one full update per hour, which should be fine.
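The estimate can be written down as a small calculation. The sketch assumes one API call per document and language, which is exactly the open point mentioned above; the function name and the `calls_per_file` parameter are illustrative only.

```python
def full_syncs_per_hour(documents, languages, requests_per_hour, calls_per_file=1):
    """Return (calls needed for one full download, full downloads possible per hour)."""
    calls_per_sync = documents * languages * calls_per_file
    return calls_per_sync, requests_per_hour / calls_per_sync


calls, syncs = full_syncs_per_hour(300, 20, 6000)
print(calls, syncs)  # 6000 1.0
```

If a download turns out to need more than one request per file, raising `calls_per_file` shows how quickly the update interval grows.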

I think that immediate feedback (i.e. a recently translated string goes live) is important for keeping up the translators' motivation. In my case, a delay of one day would be okay, but not any longer, since I cannot keep the mental connection between my typed work and its live result for so long. Shorter would be better; thus, we should use up the API request limit (or maybe half of it, to keep some in reserve).

tokideveloper commented 6 years ago

Concerning the renaming of files after/before uploading them to Transifex:

Currently, the names of the translatable resources on Transifex consist only of the source file name (with an ending intentionally changed to .txt). This makes it hard to see the context of the resource, i.e. the file path.

So, it seems to be useful to rename the resources on upload by adding some context. Generally, I see two main possibilities:

URL paths (e.g. /intro/):
- advantages:
  - Stable against source file moves.
  - Easy for translators to type into the URL bar of their web browser to navigate to the page they are translating. This helps them get the context of the translated strings, e.g. screenshots etc.
- disadvantages:
  - Not applicable to files without a URL (e.g. layouts, templates).

Source file paths (e.g. /_doc/basics_user/intro.md or /_layouts/doc.html):
- advantages:
  - Stable against URL path moves.
  - Applicable to all source files.
- disadvantages:
  - Difficult for translators to use for navigating to the source file in a specific (sub-)repo.

Each resource on Transifex has a project-unique slug (i.e. an ID, not visible inside the web translation editor) and an arbitrary name (visible inside the web translation editor). Both can be modified later on.

As I found out by trial and error, Transifex slugifies the source file name (without its path, at least when uploading manually) to generate the unique slug. If a default slug is already in use, then a _NUMBER is appended to it or incremented.

The name can consist of at least 300 characters but must not contain newline characters. Currently, none of our source files has a path longer than 100 characters. Thus, 300 characters are enough to hold both the URL path and the source file path.

As a first approach, I suggest changing the name to something like this: URL_PATH + :: + SOURCE_FILE_PATH, e.g.

/intro/ :: /_doc/basics_user/intro.md and
:: /_layouts/doc.html.

(The string ::, having surrounding space characters, works as a visual separator.)

Although the name may have 300 characters or more, only about 50 to 100 characters fit into the Transifex editor inside my small web browser window, depending on my zoom level. Selecting the whole name with the mouse cursor is possible, but may be difficult and unintuitive for most translators.

Thus, as a second approach, it might be better to put only one of URL_PATH and SOURCE_FILE_PATH into the name. Like this:

URL /intro/ (i.e. URL + URL_PATH, for files which have a permalink) and
FILE /_layouts/doc.html (i.e. FILE + SOURCE_FILE_PATH, for files without a permalink).

The reason for preferring the URL is: I think that it's more useful for a translator to have a link path (for entering it into a web browser, as described in the advantages above) than a source file path.
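The second approach boils down to a simple rule, sketched below. The function name `resource_name` is hypothetical; the permalink would come from the YAML front matter of the source file, and files without one (layouts, templates) fall back to the source file path.

```python
from typing import Optional


def resource_name(source_path: str, permalink: Optional[str] = None) -> str:
    """Build the Transifex resource name following the second approach."""
    if permalink:
        name = f"URL {permalink}"
    else:
        name = f"FILE {source_path}"
    # Resource names must not contain newline characters.
    if "\n" in name:
        raise ValueError("resource name must not contain newlines")
    return name
```

With this rule, `/_doc/basics_user/intro.md` with the permalink `/intro/` becomes `URL /intro/`, and `/_layouts/doc.html` becomes `FILE /_layouts/doc.html`.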

The file type of a new resource is extracted from the ending of the name of the source file as usual. The type doesn't seem to be modifiable after creating the new resource.

The name doesn't need to contain the file type. The translators can see the file type inside the Transifex editor independently of the name. However, since we'll probably change the type (by changing the ending, as described e.g. in a post above), it might be wise to add the original ending.

Last but not least: Maybe we should add a WEBSITE prefix to the name since the files describe website files, not program files. But this can be done in the future, too.

Any comments, opinions, suggestions?

marmarek commented 6 years ago

I don't remember - what is the decision (if any) about translating URLs? Should it be /pl-PL/intro, or /pl-PL/wprowadzenie? If URLs should stay original, then IMO the second approach is better, otherwise always use the file path.

tokideveloper commented 6 years ago

I don't remember - what is the decision (if any) about translating URLs? Should it be /pl-PL/intro, or /pl-PL/wprowadzenie? If URLs should stay original, then IMO the second approach is better, otherwise always use the file path.

It should be /pl-PL/intro. The less we touch, the less we crush.

In other words: Yes, URLs shall stay original. See this post.

I also vote for the second approach.

tokideveloper commented 6 years ago

For automating Transifex-related processes, we need software. But which do we trust more, or at all?

Transifex' Python-based CLI client or
tools like cURL for using Transifex' URL-query-based API?

I didn't check the implementation of the CLI client but I would bet it's just a nice wrapper for using the API.

andrewdavidwong commented 3 years ago

@marmarek, @tokideveloper, I just noticed that the Transifex link is dead:

https://www.transifex.com/otf/qubes/

Any idea what happened?

tokideveloper commented 3 years ago

@marmarek, @tokideveloper, I just noticed that the Transifex link is dead:

https://www.transifex.com/otf/qubes/

Any idea what happened?

This is strange. It leads to a blank page. However, when I visit

https://www.transifex.com/otf/

then I have to log in to Transifex. After logging in, a redirection to

https://www.transifex.com/otf/qubes/dashboard/

occurs (Maybe our Transifex project is the only one in /otf/?). So, maybe we should use one of these two links instead?

I also noticed that on the dashboard page itself, the link to the dashboard is https://www.transifex.com/otf/ rather than https://www.transifex.com/otf/qubes/dashboard/ .

andrewdavidwong commented 3 years ago

https://www.transifex.com/otf/qubes/dashboard/ also 404s when not logged in, so it also won't be suitable as a public link on our website.

https://www.transifex.com/otf/ doesn't 404, but it's simply a bare login page with nothing Qubes-specific.

I guess I'll just update the link on our website to the main Transifex website for now.
