bobbingwide commented 3 years ago

Is your feature request related to a problem? Please describe.

This feature request contains a proposal for internationalising and localising Gutenberg's Full Site Editing templates and template parts directly from the HTML files.

Describe the solution you'd like

WordPress is a multilingual CMS, used in countries all over the world where English is not the main language.

There is a requirement for

Full Site Editing themes to be Internationalized and Localised

This solution proposes a method whereby the template and template parts are localized statically and delivered from locale specific folders.

Current solution

In WordPress/Gutenberg there are two ways to indicate text that should be translated and to deliver translations to the end user. One is the PHP route, the other JavaScript. The general process followed by each is:

Internationalization
String extraction
Translation
Localization

Internationalization (i18n) is the manual process of indicating which strings are translatable. This is done by wrapping strings in particular functions.

$string = __( 'Check color', 'component' );
$non_translatable_string = 'Herb Miller';
$dynamic_string = sprintf( __( 'Written by: %1$s', 'component'),  $non_translatable_string );

Extraction is the automated process of preparing the strings for translation. Modern WordPress plugins use makepot and makejson to extract the strings.

Translation is performed offline, producing lookup tables of source language to target language translations. Note: The developer can provide context and hints that help the translator decide the localised version.

US English string	Target language string ( en_GB )	Note
Check color	Cheque colour	That’s one possible translation; no context or hint was given.
Written by: %1$s	Written by %1$s	For some reason the translator removed the colon.

Localization (l10n) is the process of delivering the translated version to the end user. Both PHP and JavaScript code use the appropriate lookup table at run time.

Full Site Editing proposal

In Full Site Editing, templates and template parts are constructed as Gutenberg blocks and HTML. For example, here is an extract from a template part file, written in US English.

<!-- wp:column {"width":"50%"} -->
    <div class="wp-block-column" style="flex-basis:50%"><!-- wp:heading -->
    <h2>Translatable</h2>
    <!-- /wp:heading -->

    <!-- wp:list -->
    <ul><li>Color</li><li>Center</li><li>Check</li><li>Internationalize</li><li>Localize</li><li>Aluminum</li></ul>
    <!-- /wp:list --></div>
<!-- /wp:column -->

We need new processes to handle these HTML files.

Assumptions

It is possible to work directly with HTML files.
There’s no need for template files to use PHP.
Any dynamic content delivery is performed by blocks, which are already internationalized.

Therefore:

Localization can be applied statically, creating locale specific versions for each template and template part.
Delivery of the locale specific content to the end user does not require runtime translation of the static text.

The new process would be:

Internationalization can be done by indicating which strings are not translatable.
Extraction can be implemented using a combination of the Gutenberg block parser, an HTML DOM parser ( DOMDocument ) and a .pot file writer.
Translation is unchanged.
Localization also uses a combination of the Gutenberg block parser and HTML parser along with the WordPress translate function, __(), and a Gutenberg block reformer.
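
As a rough illustration of the extraction and localization steps listed above, here is a minimal sketch. It is not the prototype's code. It parses the markup with DOMDocument only and skips the Gutenberg block parser, and it uses a plain array where the real routine would call __() with the theme's text domain. All the sketch_* function names are hypothetical.

```php
<?php
// Sketch only. Block delimiters are HTML comments, so they never show up as text nodes.

/** Load a fragment, wrapped in a <div> so that leading block comments stay inside the body. */
function sketch_load_fragment( string $html ): DOMDocument {
    $doc = new DOMDocument();
    libxml_use_internal_errors( true ); // Fragments are not complete HTML documents.
    $doc->loadHTML( '<?xml encoding="utf-8" ?><div>' . $html . '</div>' );
    libxml_clear_errors();
    return $doc;
}

/** Return the text nodes that are not inside an element marked translate="no". */
function sketch_translatable_nodes( DOMDocument $doc ): array {
    $xpath = new DOMXPath( $doc );
    return iterator_to_array( $xpath->query( '//text()[not(ancestor::*[@translate="no"])]' ) );
}

/** Extraction: unique, trimmed strings in document order. */
function sketch_extract_strings( string $html ): array {
    $doc     = sketch_load_fragment( $html );
    $strings = array();
    foreach ( sketch_translatable_nodes( $doc ) as $node ) {
        $text = trim( $node->nodeValue );
        if ( '' !== $text ) {
            $strings[] = $text;
        }
    }
    return array_values( array_unique( $strings ) );
}

/** Write bare msgid/msgstr pairs; a real .pot file also needs a header and file references. */
function sketch_pot_entries( array $strings ): string {
    $pot = '';
    foreach ( $strings as $string ) {
        $escaped = str_replace( array( '\\', '"' ), array( '\\\\', '\\"' ), $string );
        $pot    .= 'msgid "' . $escaped . "\"\nmsgstr \"\"\n\n";
    }
    return $pot;
}

/** Localization: swap each translatable text node for its entry in the lookup table. */
function sketch_localize_html( string $html, array $translations ): string {
    $doc = sketch_load_fragment( $html );
    foreach ( sketch_translatable_nodes( $doc ) as $node ) {
        $key = trim( $node->nodeValue );
        if ( isset( $translations[ $key ] ) ) {
            // Replace only the trimmed part so surrounding white space is kept.
            $node->nodeValue = str_replace( $key, $translations[ $key ], $node->nodeValue );
        }
    }
    $out = '';
    foreach ( $doc->getElementsByTagName( 'div' )->item( 0 )->childNodes as $child ) {
        $out .= $doc->saveHTML( $child );
    }
    return $out;
}
```

Calling sketch_localize_html() with array( 'Check color' => 'Cheque colour' ) would rewrite a heading containing "Check color" and leave the block comments in place. DOMDocument may normalise white space between elements when it serialises, which is one reason a block-aware reformer would still be needed.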

Proposed solution

screenshot

I have developed a prototype to test with my experimental theme called Fizzie. The solution uses a number of routines, which are currently run in batch. For testing purposes it's semi-automated and run on demand.

Stage	Implemented by	Input	Output	Notes
.1. Extract strings	html2pot	block-templates & block-template-parts	theme.pot	output filenames could be suffixed
.2. Translate	l10n calls bb_BB and la_CY	theme.pot	theme-bb_BB.po theme-en_GB.po
.3. Msgfmt	called by l10n	theme-bb_BB.po theme-en_GB.po	theme-bb_BB.mo theme-en_GB.mo
.4. Localize	html2la_CY called for each target locale	block-templates & block-template-parts & theme-la_CY.mo	template and template parts for each locale	See Changes to template loading logic for target directory structure

Running the routine generates two locale specific versions:

Bbboing files ( bb_BB ) – translation performed automatically by bb_BB.
UK English ( en_GB ) – translation performed automatically by la_CY, with a UK English lookup table.

The prototype is part of my oik-i18n plugin. See bobbingwide/oik-i18n/issues/7 – FSE – Can we internationalize .html files without requiring any special markup?

Testing the process

With the bbboing version, just about every word in each translatable string is partially obfuscated using a repeatable process. The target output is reasonably easy to recognise. Here’s a screen capture from my test template ( i18n-test.html ) used in the test page called “I18n test”.

Testing-i18n-test-page-with_bb_BB-locale

Changes to template loading logic

In order to test the results I needed to edit the code to load the templates and template parts from the locale specific folders.

theme/
   block-template-parts/
   block-templates/
   languages/
      bb_BB/
         block-template-parts/
         block-templates/
      en_GB/
         block-template-parts/
         block-templates/
      theme.pot
      theme-bb_BB.mo
      theme-bb_BB.po
      theme-en_GB.mo
      theme-en_GB.po

Note: The extracted and translated files are also in the languages folder, but take no part in the run time processing. In the final solution updating the .mo language file would trigger the localization process.

Since the logic to synchronize the Site Editor's content with the template files is undergoing a lot of change at present, I only tested the logic to load template parts. I did this by updating my block override function called fizzie_render_block_core_template_part(). It ignores the synchronized content and loads the template part from the selected locale. It assumes the localized part exists.

$locale = get_locale();
if ( 'en_US' !== $locale ) {
   // Any other locale: load the statically localized copy.
   $template_part_file_path = get_stylesheet_directory() . "/languages/$locale/block-template-parts/" . $slug . '.html';
} else {
   // Source language ( US English ): load the original template part.
   $template_part_file_path = get_stylesheet_directory() . '/block-template-parts/' . $slug . '.html';
}
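
Because that snippet assumes the localized part exists, a final solution would probably need a fallback. The following is a minimal sketch of one, not part of the prototype. The function name sketch_locate_template_part() is hypothetical, and in the theme its first two arguments would come from get_stylesheet_directory() and get_locale().

```php
<?php
/**
 * Resolve the file for a template part, preferring the locale specific copy.
 *
 * @param string $theme_dir Theme root directory.
 * @param string $locale    Locale code, e.g. en_GB.
 * @param string $slug      Template part slug, without the .html extension.
 * @return string Path to the localized file if it is readable, else the default file.
 */
function sketch_locate_template_part( string $theme_dir, string $locale, string $slug ): string {
    $default = $theme_dir . '/block-template-parts/' . $slug . '.html';
    if ( 'en_US' === $locale ) {
        // The source language is served from the original folder.
        return $default;
    }
    $localized = $theme_dir . '/languages/' . $locale . '/block-template-parts/' . $slug . '.html';
    // Fall back to the untranslated part when no localized copy has been generated.
    return is_readable( $localized ) ? $localized : $default;
}
```

With this in place, a locale that has no generated files yet would see the US English part rather than a missing file.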

What needs to be done?

This solution is not without its challenges. Take for instance this sample of rich text.

<p>Written by: <span translate="no">Herb Miller</span> using <code>Gutenberg</code>.</p>

What strings would you present to the translator? Would it be “Written by:”, “using” and “Gutenberg”? or the whole inner content of the paragraph?

One of the problems with granular string extraction is losing the context for the translation. Another is white space. Should the translator be given the chance to translate the whole of some rich text, rather than the snippets between tags?
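
To make the two options concrete, here is a small sketch that produces both candidates for a paragraph element using DOMDocument. It is only an illustration of the trade-off, and both function names are hypothetical.

```php
<?php
/** Granular option: one string per text node that is not inside translate="no". */
function sketch_granular_strings( DOMElement $element ): array {
    $xpath   = new DOMXPath( $element->ownerDocument );
    $strings = array();
    foreach ( $xpath->query( './/text()[not(ancestor::*[@translate="no"])]', $element ) as $node ) {
        // Deliberately not trimmed, to show the white space the translator would receive.
        $strings[] = $node->nodeValue;
    }
    return $strings;
}

/** Whole option: the element's inner markup, inline tags included, as a single string. */
function sketch_whole_string( DOMElement $element ): string {
    $inner = '';
    foreach ( $element->childNodes as $child ) {
        $inner .= $element->ownerDocument->saveHTML( $child );
    }
    return $inner;
}

$doc = new DOMDocument();
libxml_use_internal_errors( true );
$doc->loadHTML( '<p>Written by: <span translate="no">Herb Miller</span> using <code>Gutenberg</code>.</p>' );
libxml_clear_errors();
$paragraph = $doc->getElementsByTagName( 'p' )->item( 0 );

$granular = sketch_granular_strings( $paragraph );
$whole    = sketch_whole_string( $paragraph );
```

For the sample paragraph the granular option yields four fragments: "Written by: ", " using ", "Gutenberg" and ".", each with its leading or trailing space intact. The whole option gives the translator one string with the inline markup embedded, which keeps the context but means the translator has to preserve the tags.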

In order to answer questions like this a number of activities will need to be performed, the first of which is to document the requirements. This should take into account each of the different target users, their languages and any other relevant cultural needs or customs.

Regarding implementation, there will be many areas affected:

Methods to enable internationalization will need to be implemented in Gutenberg's blocks:
- Identification of non-translatable strings in the HTML, e.g. a toolicon.
- Identification of translatable block attributes.
For build processes, WP-CLI’s i18n command will need to be extended to perform the string extraction and localization.
The extraction and localization routines will need to support the requirements for translating rich text.
The localization process should operate in the WordPress back end, generating updated locale files for any new / updated .mo file.
Developer notes will be needed by developers writing themes for wordpress.org and/or other bespoke themes used in multilingual installations.

But before any of this is done, we should agree in principle the way forward with regard to i18n and l10n:

1. Static HTML templates and template parts
2. Dynamic templates and PHP
3. A combination of the above.

My preference is for 1.

References

Related Gutenberg issues:

https://github.com/wordpress/gutenberg/issues/21204 - How will translations be handled in block based themes?
https://github.com/WordPress/gutenberg/issues/21728 - Discuss: Contextual block behavior
https://github.com/wordpress/gutenberg/issues/21932 – Inline Dynamic Content Solutions

There are also issues that are more closely related to values which vary between sites (URLs, post IDs, etc.) rather than translatable text strings, e.g.

https://github.com/wordpress/gutenberg/issues/20966 - Block Based Themes: Dynamic values in static HTML theme file

These are relevant only if we have to consider how to handle rich text content that includes links and inline images.

bobbingwide commented 3 years ago

This solution could also be applied to block patterns. It may help prevent the problem reported in https://core.trac.wordpress.org/ticket/51893 - Don’t split translatable strings in block templates.

vdwijngaert commented 3 years ago

No real feedback, but just throwing this here: I can imagine this having a major impact on phase 4 of Gutenberg in the long term roadmap. I agree i18n and l10n for templates and template parts is an issue to be tackled, but I'm not sure when and how...

bobbingwide commented 3 years ago

I can imagine this having a major impact on phase 4 of Gutenberg in the long term roadmap

It certainly will.

I'm not sure when and how...

I had assumed that the ability to translate a theme's content was already a prerequisite for hosting the theme on wordpress.org. Now I see there's a tag #translation-ready and that none of the 4 FSE themes already approved are tagged with #translation-ready.

bobbingwide commented 2 years ago

Just over a year has passed since I wrote this proposal. Disappointed that no-one's attempted to review it.

Rather than enabling support for extracting and translating strings from HTML files and writing new HTML files for each required locale, it would appear that the current method uses a convoluted process of implementing strings in patterns, which are written in PHP. See Twenty Twenty-Two for an example.

In my opinion patterns can also be written in translatable HTML. In order to provide the meta data currently implemented in the pattern's .php file each pattern could contain a special pattern meta block that's also translatable.

It would contain the translatable title and the non translatable categories and blockTypes.

Example

<!-- wp:pattern-meta { "categories": "query", "blockTypes": "core/query" } -->
<!-- Example query block pattern. -->
<!-- /wp:pattern-meta -->

The core/pattern-meta block would generate no output on the front end.

WordPress / gutenberg

Internationalization and localization: translating templates and template parts #27402
