WordPress / gutenberg

The Block Editor project for WordPress and beyond. Plugin is available from the official repository.
https://wordpress.org/gutenberg/

Internationalization and localization: translating templates and template parts #27402

Open bobbingwide opened 3 years ago

bobbingwide commented 3 years ago

Is your feature request related to a problem? Please describe.

This Feature request contains a proposal for internationalising and localising Gutenberg's Full Site Editing templates and template parts directly from the HTML files.

Describe the solution you'd like

WordPress is a multilingual CMS, used in countries all over the world where English is not the main language.

There is a requirement for Full Site Editing themes to be Internationalized and Localised.

This solution proposes a method whereby the template and template parts are localized statically and delivered from locale specific folders.

Current solution

In WordPress/Gutenberg there are two ways to indicate text that should be translated and to deliver translations to the end user. One is the PHP route, the other JavaScript. The general process followed by each is:

  1. Internationalization
  2. String extraction
  3. Translation
  4. Localization

Internationalization (i18n) is the manual process of indicating which strings are translatable. This is done by wrapping strings in particular functions.

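// A plain translatable string; 'component' is the text domain.
$string = __( 'Check color', 'component' );
// A value that must not be translated, such as a person's name.
$non_translatable_string = 'Herb Miller';
// A translatable format string with the non-translatable value substituted in.
$dynamic_string = sprintf( __( 'Written by: %1$s', 'component' ), $non_translatable_string );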

Extraction is the automated process of preparing the strings for translation. Modern WordPress plugins use makepot and makejson (for example, the WP-CLI commands wp i18n make-pot and wp i18n make-json) to extract the strings.

Translation is performed offline, producing lookup tables of source language to target language translations. Note: The developer can provide context and hints that help the translator decide the localised version.

| US English string | Target language string ( en_GB ) | Note |
| --- | --- | --- |
| Check color | Cheque colour | That’s one possible translation; no context or hint was given. |
| Written by: %1$s | Written by %1$s | For some reason the translator removed the colon. |

Localization (l10n) is the process of delivering the translated version to the end user. Both PHP and JavaScript code use the appropriate lookup table at run time.
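As a rough illustration of the run-time lookup, here is a minimal sketch. It is not how WordPress loads .mo files; the array and the function name sketch_translate() are hypothetical, and the table contents are just the two rows shown above.

```php
<?php
// Hypothetical en_GB lookup table, holding the two example rows from the table above.
$en_gb = array(
    'Check color'      => 'Cheque colour',
    'Written by: %1$s' => 'Written by %1$s',
);

/**
 * Return the translation of $source from $table,
 * or $source unchanged when the table has no entry for it.
 */
function sketch_translate( array $table, string $source ): string {
    return $table[ $source ] ?? $source;
}

// A plain string, a format string, and a string with no translation.
echo sketch_translate( $en_gb, 'Check color' ), "\n";
echo sprintf( sketch_translate( $en_gb, 'Written by: %1$s' ), 'Herb Miller' ), "\n";
echo sketch_translate( $en_gb, 'Aluminum' ), "\n";
```

The fallback to the source string matters: a string that was never translated is still shown in US English rather than disappearing.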

Full Site Editing proposal

In Full Site Editing, templates and template parts are constructed as Gutenberg blocks and HTML. For example, here is an extract from a template part file, written in US English.

<!-- wp:column {"width":"50%"} -->
<div class="wp-block-column" style="flex-basis:50%"><!-- wp:heading -->
    <h2>Translatable</h2>
    <!-- /wp:heading -->

    <!-- wp:list -->
    <ul><li>Color</li><li>Center</li><li>Check</li><li>Internationalize</li><li>Localize</li><li>Aluminum</li></ul>
    <!-- /wp:list -->
</div>
<!-- /wp:column -->

We need new processes to handle these HTML files.

Assumptions

Therefore:

The new process would be:

Proposed solution

screenshot

I have developed a prototype to test with my experimental theme called Fizzie. The solution uses a number of routines, which are currently run in batch. For testing purposes it’s semi-automated and run on demand.

| Stage | Implemented by | Input | Output | Notes |
| --- | --- | --- | --- | --- |
| 1. Extract strings | html2pot | block-templates & block-template-parts | theme.pot | output filenames could be suffixed |
| 2. Translate | l10n call bb_BB and la_CY | theme.pot | theme-bb_BB.po en_GB.po | |
| 3. Msgfmt | called by l10n | theme-bb_BB.po theme-en_GB.po | theme-bb_BB.mo theme-en_GB.mo | |
| 4. Localize | html2la_CY called for each target locale | block-templates & block-template-parts & theme-la_CY.mo | template and template parts for each locale | See Changes to template loading logic for target directory structure |

Running the routine generates two locale specific versions:

  1. Bbboing files ( bb_BB ) – translation performed automatically by bb_BB.
  2. UK English ( en_GB ) – translation performed automatically by la_CY, with a UK English lookup table.

The prototype is part of my oik-i18n plugin. See bobbingwide/oik-i18n/issues/7 – FSE – Can we internationalize .html files without requiring any special markup?

Testing the process

With the bbboing version, just about every word in each translatable string is partially obfuscated using a repeatable process. The target output is reasonably easy to recognise. Here’s a screen capture from my test template ( i18n-test.html ) used in the test page called “I18n test”.

[Screen capture: testing the i18n test page with the bb_BB locale]

Changes to template loading logic

In order to test the results I needed to edit the code to load the templates and template parts from the locale specific folders.

theme/
   block-template-parts/
   block-templates/
   languages/
      bb_BB/
         block-template-parts/
         block-templates/
      en_GB/
         block-template-parts/
         block-templates/
      theme.pot
      theme-bb_BB.mo
      theme-bb_BB.po
      theme-en_GB.mo
      theme-en_GB.po

Note: The extracted and translated files are also in the languages folder, but take no part in the run time processing. In the final solution updating the .mo language file would trigger the localization process.

Since the logic to synchronize the Site Editor’s content with the template files is undergoing a lot of change at present, I only tested the logic to load template parts. I did this by updating my block override function called fizzie_render_block_core_template_part(). It ignores the synchronized content and loads the template part from the selected locale. It assumes the localized part exists.

$locale = get_locale();
if ( 'en_US' !== $locale) {
   $template_part_file_path = get_stylesheet_directory() . "/languages/$locale/block-template-parts/" . $slug . '.html';
} else {
   $template_part_file_path = get_stylesheet_directory() . '/block-template-parts/' . $slug . '.html';
}
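The snippet above assumes the localized part exists. A possible refinement, sketched here under the same directory layout, is to fall back to the default file when a locale has no localized copy. The helper name sketch_locate_template_part() is hypothetical and is not part of the prototype; the theme directory and locale are passed in as parameters so the function can be run outside WordPress.

```php
<?php
/**
 * Hypothetical helper: choose the template part file for a locale,
 * falling back to the default (en_US) file when no localized copy exists.
 */
function sketch_locate_template_part( string $theme_dir, string $locale, string $slug ): string {
    $default = $theme_dir . '/block-template-parts/' . $slug . '.html';
    if ( 'en_US' === $locale ) {
        return $default;
    }
    $localized = $theme_dir . "/languages/$locale/block-template-parts/" . $slug . '.html';
    // Only use the localized file if it has actually been generated.
    return is_file( $localized ) ? $localized : $default;
}
```

Inside the theme this would be called with the values the original snippet already uses, for example sketch_locate_template_part( get_stylesheet_directory(), get_locale(), $slug ).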

What needs to be done?

This solution is not without its challenges. Take for instance this sample of rich text.

<p>Written by: <span translate="no">Herb Miller</span> using <code>Gutenberg</code>.</p>

What strings would you present to the translator? Would it be “Written by:”, “using” and “Gutenberg”? Or the whole inner content of the paragraph?

One of the problems with granular string extraction is losing the context for the translation. Another is white space. Should the translator be given the chance to translate the whole of some rich text, rather than the snippets between tags?
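To make the two options concrete, here is a minimal sketch that extracts strings from the sample paragraph both ways, using PHP's DOM extension. It is not the html2pot routine from the prototype; the function names are hypothetical, and treating translate="no" as "skip this element" is an assumption about how an extractor might behave.

```php
<?php
/**
 * Load an HTML fragment under a wrapper <div> so there is a single root element.
 */
function sketch_load_fragment( string $html ): DOMDocument {
    $doc = new DOMDocument();
    libxml_use_internal_errors( true ); // keep parser warnings out of the output
    $doc->loadHTML( '<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
    libxml_clear_errors();
    return $doc;
}

/**
 * Granular option: one string per text node, skipping text inside translate="no".
 * Trimming each node discards the surrounding white space.
 */
function sketch_extract_text_nodes( string $html ): array {
    $doc     = sketch_load_fragment( $html );
    $xpath   = new DOMXPath( $doc );
    $strings = array();
    foreach ( $xpath->query( '//text()[not(ancestor::*[@translate="no"])]' ) as $node ) {
        $text = trim( $node->nodeValue );
        if ( '' !== $text ) {
            $strings[] = $text;
        }
    }
    return $strings;
}

/**
 * Whole-content option: one string per top-level element, inline markup included.
 */
function sketch_extract_inner_html( string $html ): array {
    $doc     = sketch_load_fragment( $html );
    $xpath   = new DOMXPath( $doc );
    $strings = array();
    foreach ( $xpath->query( '/div/*' ) as $element ) {
        $inner = '';
        foreach ( $element->childNodes as $child ) {
            $inner .= $doc->saveHTML( $child );
        }
        $strings[] = trim( $inner );
    }
    return $strings;
}

$sample = '<p>Written by: <span translate="no">Herb Miller</span> using <code>Gutenberg</code>.</p>';
print_r( sketch_extract_text_nodes( $sample ) );
print_r( sketch_extract_inner_html( $sample ) );
```

With the granular option the translator would see short fragments such as “using” and a lone full stop, with no indication of how they fit together. With the whole-content option the translator sees one string that keeps the word order and the inline tags, but must leave the markup intact.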

In order to answer questions like this a number of activities will need to be performed, the first of which is to document the requirements. This should take into account each of the different target users, their languages and any other relevant cultural needs or customs.

Regarding implementation, there will be many areas affected:

But before any of this is done, we should agree in principle the way forward with regard to i18n and l10n:

  1. Static HTML templates and template parts
  2. Dynamic templates and PHP
  3. A combination of the above.

My preference is for 1.

References

Related Gutenberg issues:

There are also issues that are more closely related to values which vary between sites (URLs, post IDs, etc.) rather than translatable text strings, e.g.

These are relevant only if we have to consider how to handle rich text content that includes links and inline images.

bobbingwide commented 3 years ago

This solution could also be applied to block patterns. It may help prevent the problem reported in https://core.trac.wordpress.org/ticket/51893 - Don’t split translatable strings in block templates.

vdwijngaert commented 3 years ago

No real feedback, but just throwing this here: I can imagine this having a major impact on phase 4 of Gutenberg in the long term roadmap. I agree i18n and l10n for templates and template parts is an issue to be tackled, but I'm not sure when and how...

bobbingwide commented 3 years ago

> I can imagine this having a major impact on phase 4 of Gutenberg in the long term roadmap

It certainly will.

> I'm not sure when and how...

I had assumed that the ability to translate a theme's content was already a prerequisite for hosting the theme on wordpress.org. Now I see there's a tag #translation-ready and that none of the 4 already approved FSE themes are tagged with #translation-ready.

bobbingwide commented 2 years ago

Just over a year has passed since I wrote this proposal. Disappointed that no-one's attempted to review it.

Rather than enabling support for extracting and translating strings from HTML files and writing new HTML files for each required locale, the current method appears to use a convoluted process of implementing strings in patterns which are written in PHP. See Twenty Twenty-Two for an example.

In my opinion patterns can also be written in translatable HTML. In order to provide the metadata currently implemented in the pattern's .php file, each pattern could contain a special pattern meta block that's also translatable.

It would contain the translatable title and the non translatable categories and blockTypes.

Example

<!-- wp:pattern-meta { "categories": "query", "blockTypes": "core/query" } -->
<!-- Example query block pattern. -->
<!-- /wp:pattern-meta -->

The core/pattern-meta block would generate no output on the front end.
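As a rough sketch of how such a block might be read, the following parses the proposed markup with a regular expression and json_decode(). The core/pattern-meta block does not exist, so everything here is hypothetical, including the function name sketch_parse_pattern_meta(). It assumes the opening block comment is closed with --> and that the title is the single HTML comment inside the block.

```php
<?php
/**
 * Hypothetical parser for the proposed wp:pattern-meta block.
 * Returns the title plus the JSON attributes, or null if the block is not found.
 */
function sketch_parse_pattern_meta( string $html ): ?array {
    $pattern = '/<!--\s+wp:pattern-meta\s+(\{.*?\})\s+-->'   // opening comment with JSON attributes
        . '\s*<!--\s*(.*?)\s*-->'                            // inner comment holding the title
        . '\s*<!--\s+\/wp:pattern-meta\s+-->/s';             // closing comment
    if ( ! preg_match( $pattern, $html, $matches ) ) {
        return null;
    }
    $attributes = json_decode( $matches[1], true );
    if ( ! is_array( $attributes ) ) {
        return null;
    }
    // The title is the only part that would be offered for translation.
    return array( 'title' => $matches[2] ) + $attributes;
}

$example = <<<'HTML'
<!-- wp:pattern-meta { "categories": "query", "blockTypes": "core/query" } -->
<!-- Example query block pattern. -->
<!-- /wp:pattern-meta -->
HTML;

print_r( sketch_parse_pattern_meta( $example ) );
```

In a real implementation WordPress's own block parser would be the natural place to do this; the regular expression is only there to keep the sketch self-contained.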