bobbingwide / oik-i18n

Internationalization for the oik suite of plugins
GNU General Public License v2.0

FSE - Can we internationalize .html files without requiring any special markup? #7

Open bobbingwide opened 3 years ago

bobbingwide commented 3 years ago

In Gutenberg Full Site Editing (FSE) the current proposal is to deliver a theme's templates and template parts as .html files. But there's a problem.

Question. How does one go about internationalizing and localizing HTML?

This leads to further questions.

I've not looked at all of the Gutenberg issues related to this. I just want to experiment with a (simple) proposal that involves:

Proposed solution

Stage 1. Extract translatable strings from HTML

Using PHP's DOMDocument and associated classes and methods, it should be possible to identify all translatable strings in any HTML and extract them into .pot file format.
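
A minimal sketch of this extraction step, assuming that the translatable strings are the non-empty text nodes plus the alt and title attributes. The function name extract_translatable_strings() and the choice of attributes are illustrative assumptions, not part of oik-i18n.

```php
<?php
/**
 * Hypothetical helper: returns the unique translatable strings in an HTML fragment.
 */
function extract_translatable_strings( string $html ): array {
    $dom = new DOMDocument();
    // Collect libxml warnings internally rather than emitting PHP warnings.
    $previous = libxml_use_internal_errors( true );
    // The meta tag asks libxml to treat the fragment as UTF-8.
    $dom->loadHTML( '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $xpath   = new DOMXPath( $dom );
    $strings = [];

    // Text nodes with non-whitespace content, ignoring script and style.
    $query = '//body//text()[normalize-space()][not(ancestor::script or ancestor::style)]';
    foreach ( $xpath->query( $query ) as $node ) {
        $strings[] = trim( $node->nodeValue );
    }

    // Assumption: alt and title are the attributes worth translating.
    foreach ( $xpath->query( '//body//@alt | //body//@title' ) as $attribute ) {
        if ( '' !== trim( $attribute->value ) ) {
            $strings[] = trim( $attribute->value );
        }
    }

    return array_values( array_unique( $strings ) );
}

// Usage: list the strings found in a small fragment.
var_export( extract_translatable_strings( '<p>Hello <strong>world</strong></p><img src="logo.png" alt="Site logo">' ) );
```

Note that text split by inline markup is returned as separate strings here; whether that is acceptable for translators is an open design choice.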

Stage 2. Extract strings from theme's templates and template parts to a .pot file

Using Gutenberg's block parser ( class WP_Block_Parser ) we can parse blocks, pass the innerHTML to the HTML string extraction routine and extract translatable attributes from attrs.

Process all templates and template parts to produce a single theme .pot file: theme-FSE.pot. Note: This will be a different file from the .pot file created by parsing the PHP files.
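
Combining the extracted strings into one .pot file could look roughly like this. It is a sketch only: build_pot() and pot_escape() are hypothetical names, the input is assumed to be an array of strings keyed by file name, and the header is reduced to a single Content-Type line.

```php
<?php
/**
 * Hypothetical helper: escapes a string for use inside a quoted .pot value.
 */
function pot_escape( string $text ): string {
    return strtr( $text, [ '\\' => '\\\\', '"' => '\\"', "\n" => '\\n', "\t" => '\\t' ] );
}

/**
 * Hypothetical helper: builds .pot content from [ file name => [ strings ] ].
 */
function build_pot( array $strings_by_file ): string {
    // Group the source files by string so each msgid appears once.
    $references = [];
    foreach ( $strings_by_file as $file => $strings ) {
        foreach ( $strings as $string ) {
            if ( ! in_array( $file, $references[ $string ] ?? [], true ) ) {
                $references[ $string ][] = $file;
            }
        }
    }

    // Minimal header entry; a real .pot file normally carries more fields.
    $lines = [
        'msgid ""',
        'msgstr ""',
        '"Content-Type: text/plain; charset=UTF-8\n"',
    ];

    foreach ( $references as $msgid => $files ) {
        $lines[] = '';
        $lines[] = '#: ' . implode( ' ', $files );
        // Numeric strings become integer array keys, so cast back to string.
        $lines[] = 'msgid "' . pot_escape( (string) $msgid ) . '"';
        $lines[] = 'msgstr ""';
    }

    return implode( "\n", $lines ) . "\n";
}

// Usage: the file names are examples only.
echo build_pot(
    [
        'templates/index.html' => [ 'Hello', 'Say "hi"' ],
        'parts/header.html'    => [ 'Hello' ],
    ]
);
```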

Stage 3. Translate into local language

Use a solution similar to l10n to generate the en_GB and bb_BB .po and .mo files.

Stage 4. Apply the local language

Using the .mo files generated after translation, apply the translations to the templates and template parts, saving the new files in a language-specific directory.

Load the target text domain for the theme.
For each template or template part
   For each block with innerHTML
      Parse the HTML to find the strings
         Look up the string in the table
         Apply the string
      Return the new HTML
   Rebuild the block
Rebuild the template or template part
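
The inner part of that loop, for a single block's innerHTML, might be sketched as follows. A plain array stands in for the lookup table; in WordPress the lookup would more likely go through the loaded text domain. The function name apply_translations_to_html() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: replaces translatable text in one block's innerHTML.
 *
 * @param string $inner_html   HTML for a single block.
 * @param array  $translations Lookup table: original string => translated string.
 */
function apply_translations_to_html( string $inner_html, array $translations ): string {
    $dom      = new DOMDocument();
    $previous = libxml_use_internal_errors( true );
    // Wrap the fragment in a div so it can be found again after parsing.
    $dom->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $inner_html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $wrapper = $dom->getElementsByTagName( 'div' )->item( 0 );
    $xpath   = new DOMXPath( $dom );

    foreach ( $xpath->query( './/text()[normalize-space()]', $wrapper ) as $node ) {
        $original = trim( $node->data );
        if ( isset( $translations[ $original ] ) ) {
            // Replace only the trimmed string so surrounding whitespace survives.
            $node->data = str_replace( $original, $translations[ $original ], $node->data );
        }
    }

    // Serialize the wrapper's children to get the fragment back.
    $translated = '';
    foreach ( $wrapper->childNodes as $child ) {
        $translated .= $dom->saveHTML( $child );
    }
    return $translated;
}

// Usage: the translation table is made up for the example.
echo apply_translations_to_html(
    '<p>Hello <a href="/about">About us</a></p>',
    [ 'Hello' => 'Bonjour', 'About us' => 'A propos' ]
);
```

Serializing through DOMDocument can normalize the markup, so the result is not guaranteed to be byte-identical to the input apart from the translated strings.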

Stage 5. Load the templates and template parts for the user's locale

This is where we'll have to change Gutenberg's template and template part loading logic.

If there are language files for the theme and the user's locale, use these when loading .html files.
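
One way to express that fallback, as a sketch only. The languages/{locale}/ directory layout and the function name locate_localized_template() are assumptions made for illustration; the issue does not fix the layout of the language-specific directory.

```php
<?php
/**
 * Hypothetical helper: picks the localized .html file when one exists.
 *
 * @param string $theme_dir     Absolute path to the theme directory.
 * @param string $relative_path Template path relative to the theme, e.g. parts/header.html.
 * @param string $locale        The user's locale, e.g. en_GB.
 */
function locate_localized_template( string $theme_dir, string $relative_path, string $locale ): string {
    // Assumed layout: <theme>/languages/<locale>/<relative path>.
    $localized = $theme_dir . '/languages/' . $locale . '/' . $relative_path;
    if ( is_file( $localized ) ) {
        return $localized;
    }
    // Fall back to the theme's original file.
    return $theme_dir . '/' . $relative_path;
}
```

With this shape the untranslated file remains the default, so a locale with no generated files behaves exactly as before.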

Assumptions

Scope and Exclusions

bobbingwide commented 3 years ago

I'm going to try using the simple HTML DOM parser from https://sourceforge.net/projects/simplehtmldom/files/latest/download. Perhaps an easier approach would be to write it in JavaScript using https://www.npmjs.com/package/html-dom-parser

Note: I wrote this after I'd failed to get the DOMDocument parser to find certain tags. I've since realised my mistake and have reverted to using DOMDocument. See https://github.com/bobbingwide/oik-i18n/issues/7#issuecomment-734784165

bobbingwide commented 3 years ago

I found some very useful documentation about HTML's translate attribute.

https://www.w3.org/International/questions/qa-translate-flag.en

Basically, any text you don't want translated is wrapped in an element with translate="no", and anything you want to explicitly identify for translation is wrapped in an element with translate="yes".
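
A sketch of how the extraction could honour this attribute, assuming the nearest ancestor that carries translate decides and that the values are written in lower case. The function name extract_respecting_translate() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: extracts text, skipping content marked translate="no".
 */
function extract_respecting_translate( string $html ): array {
    $dom      = new DOMDocument();
    $previous = libxml_use_internal_errors( true );
    $dom->loadHTML( '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $xpath = new DOMXPath( $dom );
    // ancestor::*[@translate][1] is the nearest ancestor with a translate attribute.
    // A text node is skipped only when that nearest ancestor says "no".
    $query = '//body//text()[normalize-space()]'
        . '[not(ancestor::*[@translate][1][@translate="no"])]';

    $strings = [];
    foreach ( $xpath->query( $query ) as $node ) {
        $strings[] = trim( $node->nodeValue );
    }
    return $strings;
}

// Usage: the theme name is protected, the nested translate="yes" is re-enabled.
var_export(
    extract_respecting_translate(
        '<p>Welcome to <span translate="no">Fizzie</span> '
        . '<span translate="no">by <em translate="yes">the author</em></span></p>'
    )
);
```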

There are quite a few other details in this document and links. Something for another time.

bobbingwide commented 3 years ago

Prior to creating this issue I briefly played with PHP's DOMDocument class. After a while I realised it only worked for valid XML, not HTML; self-closing tags were being ignored.

I tried the simple_html_dom routine but it couldn't nicely handle:

So I revisited the DOMDocument route as other plugins were happily using it, e.g. Jetpack ( class.jetpack-post-images.php ).

I realised I wasn't handling the nodes correctly; I was missing else logic for when the $node->nodeValue was empty. So now I'm going to revert to using DOMDocument.

I'll have to cater for warnings when DOMDocument encounters tags it can't handle. Extract from Jetpack's code:

// The @ is not enough to suppress errors when dealing with libxml,
// we have to tell it directly how we want to handle errors.
libxml_use_internal_errors( true );
@$dom_doc->loadHTML( $html_info['html'] );
libxml_use_internal_errors( false );
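
For comparison, a self-contained variant of the same idea that also returns whatever libxml reported and restores the previous error-handling setting. The function name load_html_quietly() is hypothetical, and which messages libxml produces for a given tag depends on the libxml version.

```php
<?php
/**
 * Hypothetical helper: loads HTML without PHP warnings.
 *
 * @return array [ DOMDocument, string[] of libxml messages ]
 */
function load_html_quietly( string $html ): array {
    $dom = new DOMDocument();
    // Switch to internal error collection, remembering the previous setting.
    $previous = libxml_use_internal_errors( true );
    $dom->loadHTML( $html );

    $messages = [];
    foreach ( libxml_get_errors() as $error ) {
        $messages[] = trim( $error->message );
    }

    // Empty the error buffer and restore the caller's setting.
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    return [ $dom, $messages ];
}

// Usage: inspect any messages reported for a fragment.
[ $dom, $messages ] = load_html_quietly( '<header><p>Site title</p></header>' );
var_export( $messages );
```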

bobbingwide commented 3 years ago

I've not looked at all of the Gutenberg issues related to this.

Actually, I have had a cursory glance.

I won't reference the issues directly until I've made enough progress in Stage 4. Here are some of the relevant issue numbers:

20966 - Block Based Themes: Dynamic values in static HTML theme file

This is more to do with values which vary between sites (URLs, post IDs, etc.) than with text strings.

21204 - How will translations be handled in block based themes?

21728 - Discuss: Contextual block behavior

21932 - Inline Dynamic Content Solutions

I think in most cases the solution is being overthought.

In my view there are two distinct challenges:

  1. Text which needs to be translated.
  2. Hardcoded values which need to be generalized.

For this work I'm only concerned with the i18n/l10n part.

My premises are:

bobbingwide commented 3 years ago

Stage 5. Load the templates and template parts for the user's locale.

We can implement a local solution for the Fizzie theme that doesn't require changes to Gutenberg with the following assumptions.

The local solution can be implemented in fizzie_load_template_part().

bobbingwide commented 3 years ago

See also https://herbmiller.me/localization-of-full-site-editing-themes/

bobbingwide commented 1 year ago

While updating Fizzie for WordPress 6.2 and Gutenberg 15.3.1 I briefly considered how much of the internationalization logic still worked. It seems that t10n.bat needs updating to reflect the fact that I've moved the files from block-template-parts to parts and from block-templates to templates.