godotengine / godot-website

The code for the official Godot Engine website. A static site built using Jekyll.
https://godotengine.org
MIT License

Internationalize the official web site #47

Closed Geequlim closed 1 year ago

Geequlim commented 6 years ago

Could you please support translations for the official website, like the Godot editor does? If the website spoke people's native languages, that would make Godot more attractive to newcomers.

Calinou commented 4 years ago

What's the status of i18n support in October? Is it built-in or do we need to install a plugin?

Also, is there a way to store translations in static files (rather than in a database)? This is required to accept contributions without giving website credentials to translators.

YuriSizov commented 4 years ago

So it seems that everyone is using RainLab's Translate plugin (https://octobercms.com/plugin/rainlab-translate). There are even some ancillary plugins to go along with it. The main plugin itself provides support for translating static pages and the text fields of dynamic pages. It also comes with a language selector component that can simply be plugged into the UI (but should probably be styled additionally).

For dynamic pages, the editing forms gain an additional selector for switching between the languages being edited.

As far as I can see, since it's dynamic content it cannot be exported in any way for contributors. Though, I don't think that there is a huge need to translate news at the moment, is there? (Even slugs can be translated by the way!)

For static pages the translation process is pretty straightforward (for a CMS admin, at least). It can even be used to control the content of static pages in the default English without editing the layout itself. Basically, any line of text can be turned into a translatable string by surrounding it with some syntax:

{{ "We're sorry, but the page you requested cannot be found."|_ }}

Lines are converted into IDs internally. So this one is going to be something like were.sorry.but.the.page.you.requested.cannot.be.found. Alternatively, strings acting as IDs can be used directly in the markup.

<a href="/">{{ 'err404.goto.home'|_ }}</a>

For an admin these lines are then made accessible and editable in all supported languages via a screen in Settings section.


There are ways to use files instead of a database, though. First of all, there are Export and Import buttons which allow extracting strings in CSV format. But that's probably barely usable for our goals. So there is another way, using YAML and dedicated files in the theme.

Each language must be defined in the theme.yaml like this:

translate: 
    ru: i18n/ru.yaml

And the corresponding files must exist with all the lines provided by their IDs:

were.sorry.but.the.page.you.requested.cannot.be.found: 'Увы, мы потерялись!'
err404.goto.home: 'Вернуться на главную'

The problem is, I don't think there is a way to use the translatable English strings themselves as keys in YAML, so we must use IDs, which are not exposed in any way unless you export the lines. So we'd probably have to use those IDs in the markup instead of English messages and move all English text to the i18n side of things.

This plugin will also affect the SEO, as the default way to switch languages is to append a language section to the URL path (like /ru/news). The language is then stored in the browser's storage as far as I can see.



Edit: I guess the titles of the static pages still have to be translated via the GUI. And when I tried that out, the translated pages just broke, and remained broken even after I removed the translated title. Huh.

Edit 2: That was because slugs are generated automatically for the translated titles. So /community turned into /ru/soobshchestvo, and such slugs only work for their corresponding language. This can be fixed by manually setting the slug for each additional language to an empty string.

Edit 3: Noticed that the translated titles for static pages are stored in plain text in the htm files themselves, like this:

[viewBag]
localeTitle[ru] = "Сообщество"
==

So it may actually be "contributeable".

YuriSizov commented 4 years ago

I added another edit: it seems that titles are stored in files as well, just not in dedicated language files. Though, come to think of it, it should probably be possible to move those to the language files as well and use some code injection to set the page title with a translated string.

So internationalization is doable and can be augmented with some generous help from our contributors. The only question is, do we want it? Or, I guess, it's the question that the dedicated web presence hire would answer?

Calinou commented 4 years ago

@pycbouh I think it makes sense to internationalize the main website's static content, but I wouldn't bother about translating news posts due to their relatively high update frequency.

YuriSizov commented 4 years ago

So, I've been working on a PoC PR for this feature and I have to decide on the acceptable way we are going to make translatable strings in text. There are two possibilities, as I've described above. We either move to codes that will represent translatable messages, or we leave the English strings as is and use the built-in identifier generation for the strings in other languages.

Here are a few examples to better understand the trade-offs.

  1. A regular message. Take for example this piece that is used in the title of every page:
    <title>Godot Engine - Free and open source 2D and 3D game engine</title>

The first part of the title is obviously untranslatable, however the second part is a perfectly normal example for a message that can be translated. The default approach would be to wrap it with the special syntax and use the admin panel to scan for it. The code would look like this:

<title>Godot Engine - {{ 'Free and open source 2D and 3D game engine'|_ }}</title>

The string is translatable via the admin panel, however to make use of the localization files we need to add a key-value pair to it. The key would be a generated identifier. In this case it is free.and.open.source.2d.and.3d.game.engine. The plugin is open source and the code for this is easy to reproduce, so we can, in theory, generate language files via a simple command line operation. In fact, I've already implemented it and can now generate the same output as the plugin, namely these pairs:

{
  'free.and.open.source.2d.and.3d.game.engine' => 'Free and open source 2D and 3D game engine'
}
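For illustration, here is a minimal Python sketch of that kind of identifier generation. It is not the plugin's code: the rules (lowercase, drop punctuation, join words with dots) are inferred only from the example identifiers quoted in this thread, the 250-character cap is the limit mentioned further down, and the names `make_message_code` and `build_pairs` are hypothetical.

```python
import re

MAX_CODE_LENGTH = 250  # assumed cap, taken from the limit quoted in this thread


def make_message_code(message: str, max_length: int = MAX_CODE_LENGTH) -> str:
    """Approximate a generated identifier for an English message.

    Assumption: the rules below are inferred from the examples in the
    thread, not copied from the RainLab Translate source.
    """
    # Lowercase, then drop everything that is not a word character or whitespace.
    text = re.sub(r"[^\w\s]", "", message.lower())
    # Join the remaining words with dots, truncate, and trim a dangling dot.
    return ".".join(text.split())[:max_length].strip(".")


def build_pairs(messages: list[str]) -> dict[str, str]:
    """Map each generated identifier to its English message."""
    return {make_message_code(message): message for message in messages}


if __name__ == "__main__":
    for code, message in build_pairs(
        ["Free and open source 2D and 3D game engine", "Features"]
    ).items():
        print(f"{code!r} => {message!r}")
```

Because the identifier is derived purely from the text, identical strings map to one key, and long strings that only differ after the cap map to the same key as well.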

The downside here is that the keys depend on the English values. So if the original text changes, nothing links the existing localization to the new text, and the lines get duplicated.

So to successfully merge strings between changes we would need some way to identify them across versions. I can suggest something clever, like using the source file name and path and the layout path to generate a unique identifier that would be tied to the place where the text is located rather than to the text itself. But it's hardly robust.

  2. A very short message. There are edge cases as well, for example in the top navigation:
    <a href="/features">Features</a>

The default approach will yield a very simple identifier for this string — features.

<a href="/features">{{ 'Features'|_ }}</a>

The translation mechanism provided by this plugin is global: two strings in two different places that are exactly the same will be grouped and translated the same way. There is no specificity. This may not always be ideal, and there is no way around it if we keep the English strings in the code as is.

  3. A very long message. Another edge case, and a culmination of the previous two examples, is a paragraph of text that is reasonable to translate as a whole.
    <p>
        Godot is completely free and open-source under the very permissive MIT license.
        No strings attached, no royalties, nothing.
        Your game is yours, down to the last line of engine code.
    </p>

It is extremely unreasonable to use the whole thing as an identifier, and to counter that the plugin limits the length of each identifier to 250 characters. This naturally means collisions for any texts that share their first 250 significant characters. And that's a lot of text to depend on for identifier generation. The more text there is, the less stable the identifier is, and the more problems with merging strings we might have.

That, and multiline strings may not look as pretty when wrapped into {{ ''|_ }}.


So, I would strongly go for hand-crafted ID-like strings. Such IDs would have a category prefix and a descriptive yet short middle part. Though, we would lose the string substitution feature with such IDs. Formatting can also be a problem with that. But to not over-complicate things, we can look for solutions on a case-by-case basis.

The examples above would be as follows:

<title>Godot Engine - {{ 'general.head.subtitle'|_ }}</title>
<a href="/features">{{ 'general.navigation.features'|_ }}</a>
<p>
  {{ 'home.introduction.p2'|_ }}
</p>

These should not be mistaken for model variables! We can use models with the translation feature, but that would require us to maintain those models for translations.

The .htm files would obviously be less clear with this approach. This may not be a welcome solution because of that, though I'd argue that the point of a CMS is to separate the code/layout from the content, so using keys instead of strings is reasonable.


So what should I do? Is any of this acceptable or are the limitations of any of the approaches undesirable? Remi would probably be interested in giving his feedback on this /cc @akien-mga

akien-mga commented 4 years ago

I like the idea to internationalize the static content of the website (and indeed, I also agree that the news/devblogs should not be translated).

Thanks a lot for all the research @pycbouh, but I'm puzzled at how complex and non-standard this is.

We have a great, standardized translation system via Gettext (PO files), which we use for both software (Godot editor) and docs (Sphinx docs, XML classref), so ideally I'd want to reuse the same system for the website. Gettext solves the problem of having to figure out IDs by using the full string as msgid, and has good logic to handle edits/merges between similar strings so that translated content is not lost.
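To illustrate the kind of merge logic meant here, below is a rough Python sketch that carries translations over to edited source strings by similarity and flags them for review. It uses the standard library's `difflib` as a stand-in; Gettext's own matching algorithm differs, and the `carry_over` function and its 0.6 cutoff are assumptions made for this example.

```python
from difflib import get_close_matches


def carry_over(
    old: dict[str, str], new_msgids: list[str], cutoff: float = 0.6
) -> dict[str, tuple[str, bool]]:
    """Map each new msgid to (translation, needs_review).

    Exact matches keep their translation as is. Near matches reuse the old
    translation but are flagged, similar in spirit to a "fuzzy" PO entry.
    Unmatched strings start out untranslated.
    """
    merged: dict[str, tuple[str, bool]] = {}
    for msgid in new_msgids:
        if msgid in old:
            merged[msgid] = (old[msgid], False)
            continue
        close = get_close_matches(msgid, list(old), n=1, cutoff=cutoff)
        if close:
            merged[msgid] = (old[close[0]], True)
        else:
            merged[msgid] = ("", False)
    return merged


if __name__ == "__main__":
    previous = {
        "We're sorry, but the page you requested cannot be found.": "Увы, мы потерялись!"
    }
    current = [
        "We're sorry, but the page you requested could not be found.",
        "Features",
    ]
    for msgid, (msgstr, needs_review) in carry_over(previous, current).items():
        print(repr(msgid), repr(msgstr), "needs review" if needs_review else "")
```

The point of the sketch is only that a small edit to the English text does not have to orphan the existing translation, which is the property the ID-based approach lacks.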

Can we use the RainLab Translate extension and add our own plugin to generate a POT file from the strings marked as localizable, and parse PO files to convert them into a format that the Translate extension can use? If in the process we need to generate arbitrary IDs for the Translate extension, that shouldn't be a problem.

The translation work should then be done on Weblate like for the rest of our translation resources, and synced manually in this repo. Translators should not need to use the CMS dashboard.

Otherwise, it seems that Weblate can also support YAML files, so we can try to work with the Translate format, even if their ID system seems really limited. https://docs.weblate.org/en/latest/formats.html

YuriSizov commented 4 years ago

I completely agree that we should try to unify all translation efforts and that the proposed solution based on what RainLab provides is all kinds of restrictive in that regard. Sadly, they don't support any other way of maintaining translations. However, we can implement our own solution on top of their work and use their system for the low-level logic while operating with regular PO files on our end.

Currently, I have made some progress in that regard, though the work is not complete. My solution introduces a custom-made plugin specifically for our needs that uses PHP Gettext underneath. This is a PHP implementation of xgettext that comes with a PO parser and generator which we can use. It also provides an extensible system of file scanners (i.e. translation message extractors), which allowed me to make an extractor for our Twig-based templates.

(Note: They have an official TwigScanner, but it's deeply WIP and has no release version yet; I've modeled my classes after their official implementation, but adjusted them to satisfy our particular needs)

So, how does it work now? We keep English texts in the templates as is, wrapping them with the same code as RainLab Translate requires, adding only one more function call to allow gettext to trigger on those texts with ease.

<title>Godot Engine - {{ TR('Free, open source and full of cookies 2D and 3D game engine')|_ }}</title>

TR being that function.
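As a rough idea of what such an extractor does, here is a small Python sketch that pulls TR('...') messages out of template text with a regular expression and prints them as PO template entries. The actual plugin is written in PHP on top of PHP Gettext's scanner classes; the regex, the `extract_messages` and `to_pot` helpers, and the simplified escaping are all assumptions made for this example.

```python
import re

# Matches TR('...') or TR("...") in template text. The opening quote is
# captured so that backslash-escaped quotes inside the message are skipped.
TR_CALL = re.compile(r"""TR\(\s*(['"])((?:\\.|(?!\1).)*)\1\s*\)""", re.DOTALL)


def extract_messages(template: str) -> list[str]:
    """Return the unique TR() messages in order of first appearance."""
    seen: dict[str, None] = {}
    for match in TR_CALL.finditer(template):
        quote, raw = match.group(1), match.group(2)
        # Undo the escaping of the surrounding quote character only.
        seen.setdefault(raw.replace("\\" + quote, quote), None)
    return list(seen)


def to_pot(messages: list[str]) -> str:
    """Render messages as PO template entries with empty translations."""

    def escape(text: str) -> str:
        return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    return "\n".join(f'msgid "{escape(m)}"\nmsgstr ""\n' for m in messages)


if __name__ == "__main__":
    page = (
        "<title>Godot Engine - "
        "{{ TR('Free and open source 2D and 3D game engine')|_ }}</title>\n"
        "<a href=\"/features\">{{ TR('Features')|_ }}</a>\n"
    )
    print(to_pot(extract_messages(page)))
```

A real scanner would parse the Twig syntax properly rather than rely on a regex, and would also record source references and a PO header.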

Code-wise, this is it. The plugin does the rest: it extracts the texts from the templates and combines them into PO files. October uses Laravel, which has Artisan, which allows us to make CLI commands that execute PHP scripts. godotengine:extracti18n is such a command.

root@28925d7870fb:/var/www/html# php artisan godotengine:extracti18n
Starting message extraction.
Scanning Layout files.
Scanning Page files.
Scanning Partial files.
Merging extracted messages with the current base file.
Writing extracted messages to the base file.
Updating locale-specific files.
........ done.
Successfully updated "ru" translation; file themes/godotengine/i18n/po/messages.ru.po has been written.
Finished message extraction.

Internally it does most of the work via PHP. However, PHP Gettext's merging functionality is bare-bones, so I make a shell call to msgmerge, which works as well as advertised by Remi. If we cannot make that call as the user that runs PHP on our servers, this step must be performed with a shell script. Please be advised.

(Edit: Not that this needs to be run on a live server; that entirely depends on the synchronization process. Contributors can run this command locally.)


Localizations are stored with the rest of the theme, in a separate i18n folder, with another subfolder dedicated to the intermediate PO files. The command above generates the messages.po file, which is basically a PO template file: it only has a header and empty translation strings. It is then merged with each individual locale file to produce an up-to-date version of it. This way we only need one base file for all locale files.

Once translators have had their way with the localization and the PO files have been updated, a second command, godotengine:updatei18n, can be called to update the running CMS instance.

root@28925d7870fb:/var/www/html# php artisan godotengine:updatei18n
Starting translation update.
Successfully updated "ru" translation; file themes/godotengine/i18n/ru.yaml has been written.
Updating translation database.
Flushing CMS cache.
Finished translation update.

This command reads through all localized PO files and generates corresponding YAML files that RainLab Translate supports. It then rescans them to update the internal database of that plugin and flushes the cache to make the translations instantly visible on the running instance.
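The PO-to-YAML step can be pictured roughly like this in Python. It is only a sketch under stated assumptions: it handles single-line msgid/msgstr entries, it assumes the YAML keys are identifiers generated from the English text in the style shown earlier in the thread, and `make_key` and `po_to_yaml` are hypothetical names, not the plugin's.

```python
import re

# Single-line entries only; multi-line PO strings are out of scope here.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def make_key(msgid: str) -> str:
    """Assumed key format: lowercase words joined by dots, capped at 250."""
    return ".".join(re.sub(r"[^\w\s]", "", msgid.lower()).split())[:250].strip(".")


def po_to_yaml(po_text: str) -> str:
    """Convert translated PO entries into `key: 'value'` YAML lines."""
    lines = []
    for msgid, msgstr in ENTRY.findall(po_text):
        if not msgid or not msgstr:
            continue  # skip the PO header and untranslated entries
        msgid = msgid.replace('\\"', '"')
        # YAML single-quoted scalars escape a quote by doubling it.
        value = msgstr.replace('\\"', '"').replace("'", "''")
        lines.append(f"{make_key(msgid)}: '{value}'\n")
    return "".join(lines)


if __name__ == "__main__":
    sample = (
        'msgid ""\nmsgstr ""\n"Language: ru\\n"\n\n'
        'msgid "We\'re sorry, but the page you requested cannot be found."\n'
        'msgstr "Увы, мы потерялись!"\n'
    )
    print(po_to_yaml(sample), end="")
```

Skipping untranslated entries is a design choice of this sketch, so that missing translations do not produce empty YAML values.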

To add a new language one would need to:

  1. Use the admin panel to enable the language for the RainLab Translate plugin (Translate > Manage Languages). Both of my commands then use this information to look only for the enabled languages' files.
  2. Add a YAML file reference in the theme.yaml:
    translate:
        ru: i18n/ru.yaml
  3. Create the initial PO file and specify the Language header. I can probably make another command for it, if needed.
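For the last step, a starting PO file only needs a header entry. The Python sketch below writes a minimal one with the Language field set; the helper name `new_po_header` and the exact set of header fields are assumptions for illustration, and a file merged from the base template would normally carry more metadata.

```python
def new_po_header(locale: str) -> str:
    """Build a minimal PO header entry for the given locale code."""
    return (
        'msgid ""\n'
        'msgstr ""\n'
        f'"Language: {locale}\\n"\n'
        '"MIME-Version: 1.0\\n"\n'
        '"Content-Type: text/plain; charset=UTF-8\\n"\n'
        '"Content-Transfer-Encoding: 8bit\\n"\n'
    )


if __name__ == "__main__":
    # Print the header that would seed a new locale file such as messages.es.po.
    print(new_po_header("es"), end="")
```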

Edit: I've added a new command that updates the necessary files automatically:

root@28925d7870fb:/var/www/html# php artisan godotengine:addi18n es
Adding "es" translation.
Writing translation PO file themes/godotengine/i18n/po/messages.es.po.
Writing translation YAML file themes/godotengine/i18n/es.yaml.
Updating Theme configuration.
Successfully added "es" translation.

Languages still need to be activated manually in the admin panel, as this operation is performed on the database and there is no dedicated API exposed for it. Still, this shouldn't be a problem, I think.


This is mostly working already. I still need to look into translating page titles, though.

And I still have concerns about string collisions. But maybe if two places use the same short string it should be always translated the same...

PS. If anyone is interested in the code behind this implementation I've pushed its current state here https://github.com/pycbouh/godot-website/commit/68d83c928a5aad06cb6ff6b29279bb6163883a35.

YuriSizov commented 1 year ago

This is now closed by https://github.com/godotengine/godot-website/pull/482. We'll be slowly providing new languages over the main visibility pages.