Improve translatability of template and template part rich text HTML

bobbingwide commented 3 years ago

In issue #7 I demonstrated that it's possible to extract text from the Full Site Editing template and template part .html files. But we noted that this solution suffers from the same problem as extraction from PHP code: translators don't get full sentences to translate. For example, the inner <br> tag splits this sentence into two separate strings:

<p>Thou hasn't seen <br> nothing yet</p>
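To make the problem concrete, here is a minimal PHP sketch of a naive extractor that treats every text node as a separate string. The function name extract_text_nodes() is hypothetical and is not the code from issue #7; it only shows how the <br> splits the sentence into two fragments.

```php
<?php
// Naive extraction sketch: every non-empty text node becomes its own string.
// extract_text_nodes() is a hypothetical helper, not the oik-i18n code.
function extract_text_nodes( DOMNode $node, array &$strings ): void {
    foreach ( $node->childNodes as $child ) {
        if ( $child->nodeType === XML_TEXT_NODE ) {
            $text = trim( $child->nodeValue );
            if ( $text !== '' ) {
                $strings[] = $text;
            }
        } else {
            // Descend into inner tags such as <br>, <em> or <a>.
            extract_text_nodes( $child, $strings );
        }
    }
}

$doc = new DOMDocument();
libxml_use_internal_errors( true ); // libxml2 warns about HTML5 tags it doesn't know.
$doc->loadHTML( "<p>Thou hasn't seen <br> nothing yet</p>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
libxml_clear_errors();

$strings = [];
extract_text_nodes( $doc, $strings );
print_r( $strings ); // Two fragments rather than one sentence.
```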

I wrote about the challenges of translating rich text content in my post "Localization of Full Site Editing themes".

Now I want to see if I can implement some of the proposals in that post.

The automatic translations to UK English (en_GB) and bbboing (bb_BB) still work, because the translation process doesn’t attempt to make any sense of the content to be translated.

In my opinion, before we can finalize any solution for localizing the HTML, we’ll have to agree on some ground rules for internationalization and extraction. The main problem areas I have considered are:

1. Sentences broken by inner tags.
2. Assumptions associated with respecting leading and trailing blanks.
3. HTML tag attributes.
4. Text that should not be translated.
5. Gutenberg block text attributes that should be translated.
6. Providing contextual help.

Items 4, 5 and 6 could be supported by providing special tools in the block editor.

Multilingual support is Phase 4 of Gutenberg, so we can't realistically expect Gutenberg to provide an environment for translators in the short term. The best we can do is improve the extraction, translation and localization processes, giving translators the opportunity to alter markup when it makes sense to do so.

Note: Google’s automatic web page translator handles inner tags. It may not produce the best translation, but it is certainly easy to use. If we extract the translatable text in sensibly sized chunks, we could use Google's translation service to give the human translators a head start.

Requirements

[x] Extract rich text to retain as much context as possible.
[x] Allow translators a certain amount of free rein with regard to the sequence of nested HTML tags.
[x] Automatically apply the translations to produce the locale specific versions of each template and template part.
[x] Do not depend on logic to respect whitespace in the original text.
[x] No need to prevent the translator from seeing text marked as translate="no".
[ ] Do prevent the translator from translating Gutenberg block attributes marked as non-translatable.

Optionally,

[ ] Support automatic translation of untranslated text using Google's Cloud Translation service.

bobbingwide commented 3 years ago

Proposal for extracting rich text.

- for each outer rich text tag found
    if it has inner tags
        extract text using the rich text route
    else
        extract translatable attributes and inner tags recursively (current solution)
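The decision in the first step might look like this in PHP. The OUTER_RICH_TEXT_TAGS list and the function names are hypothetical, chosen for illustration; the set of tags oik-i18n actually treats as outer rich text tags may differ.

```php
<?php
// Hypothetical list of outer rich text tags.
const OUTER_RICH_TEXT_TAGS = [ 'p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li' ];

// An element "has inner tags" if at least one of its children is an element.
function has_inner_tags( DOMElement $element ): bool {
    foreach ( $element->childNodes as $child ) {
        if ( $child instanceof DOMElement ) {
            return true;
        }
    }
    return false;
}

// Record, for each outer rich text tag, which extraction route it would take.
function choose_routes( DOMNode $node, array &$routes ): void {
    foreach ( $node->childNodes as $child ) {
        if ( ! ( $child instanceof DOMElement ) ) {
            continue;
        }
        if ( in_array( $child->tagName, OUTER_RICH_TEXT_TAGS, true ) ) {
            $routes[] = [ $child->tagName, has_inner_tags( $child ) ? 'rich text' : 'current' ];
        } else {
            // Containers such as <div> or <ul>: keep looking for outer rich text tags.
            choose_routes( $child, $routes );
        }
    }
}

$doc = new DOMDocument();
libxml_use_internal_errors( true );
$doc->loadHTML( '<div><p>Plain text</p><ul><li>One</li><li>Two <strong>bold</strong></li></ul></div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
libxml_clear_errors();

$routes = [];
choose_routes( $doc, $routes );
print_r( $routes );
```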

rich text route - extract

- copy the tag and its inner tags to a new DOMDocument
- save as HTML
- strip the outer tag (and its attributes)
- add the result as the string to be translated
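A minimal PHP sketch of these extract steps, assuming PHP's DOM extension. The name extract_rich_text() is hypothetical. Rather than saving the whole copied tag and then stripping it, this variant serializes only the copied tag's children, which leaves out the outer tag and its attributes in one go.

```php
<?php
// Rich text route, extract step (sketch). extract_rich_text() is a hypothetical name.
function extract_rich_text( DOMElement $element ): string {
    // Copy the tag and all its inner tags into a new DOMDocument.
    $doc  = new DOMDocument();
    $copy = $doc->importNode( $element, true );
    $doc->appendChild( $copy );

    // Serialize only the children: this drops the outer tag and its attributes.
    $html = '';
    foreach ( $copy->childNodes as $child ) {
        $html .= $doc->saveHTML( $child );
    }
    return $html;
}

$source = new DOMDocument();
libxml_use_internal_errors( true );
$source->loadHTML( "<p class=\"intro\">Thou hasn't seen <br> nothing yet</p>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
libxml_clear_errors();

$p = $source->getElementsByTagName( 'p' )->item( 0 );
echo extract_rich_text( $p ), "\n";
```

The translator then sees the whole sentence, with the <br> in place, as a single string.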

Proposal for localization

- for each outer rich text tag
    if it has inner tags
        apply translations using the rich text route
    else
        apply as per current solution

rich text route - localize

- convert the translation to a DOMDocument
- replace the existing inner nodes with the translated content
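The localize steps could be sketched in PHP as follows. apply_rich_text_translation() is a hypothetical name, and wrapping the translation in a <div> before parsing is an assumption made so that a fragment starting with plain text still parses to a single root. The outer tag and its attributes are kept, while the translator is free to reorder the inner tags.

```php
<?php
// Rich text route, localize step (sketch). apply_rich_text_translation() is hypothetical.
// Caveat: loadHTML() assumes ISO-8859-1 unless told otherwise, so non-ASCII
// translations would need explicit encoding handling that this sketch omits.
function apply_rich_text_translation( DOMElement $element, string $translation ): void {
    // Convert the translation to a DOMDocument, inside a temporary wrapper.
    $tmp = new DOMDocument();
    libxml_use_internal_errors( true );
    $tmp->loadHTML( '<div>' . $translation . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
    libxml_clear_errors();
    $wrapper = $tmp->getElementsByTagName( 'div' )->item( 0 );

    // Remove the existing inner nodes, keeping the outer tag and its attributes.
    while ( $element->firstChild ) {
        $element->removeChild( $element->firstChild );
    }
    // Import the translated nodes into the target document.
    foreach ( $wrapper->childNodes as $child ) {
        $element->appendChild( $element->ownerDocument->importNode( $child, true ) );
    }
}

$doc = new DOMDocument();
libxml_use_internal_errors( true );
$doc->loadHTML( "<p class=\"intro\">Thou hasn't seen <br> nothing yet</p>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
libxml_clear_errors();

$p = $doc->getElementsByTagName( 'p' )->item( 0 );
apply_rich_text_translation( $p, "You ain't seen <em>nothing</em> yet" );
echo $doc->saveHTML( $p ), "\n";
```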

Q. Should we use the rich text route for each translation?

bobbingwide commented 3 years ago

A couple of things to fix:

Need to add strong to the list of acceptable rich text tags.
Need to trim strings to be translated.
Need to remove carriage returns (\r) and line feeds (\n) from strings to be translated.
(The biggie) Translation of rich text in list items stopped after the first list item containing rich text was translated.
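The trimming and line-break fixes in that list can be sketched as one clean-up function applied to each string before it's added for translation. normalise_string() is a hypothetical name, and replacing each run of CR/LF with a single space (rather than deleting it) is an assumption, made so that two words separated only by a line break don't get glued together.

```php
<?php
// Clean-up applied to each string to be translated (sketch).
// normalise_string() is hypothetical. Each run of CR/LF characters, together
// with any spaces or tabs around it, becomes a single space; then the result is trimmed.
function normalise_string( string $text ): string {
    $text = preg_replace( '/[ \t]*[\r\n]+[ \t]*/', ' ', $text );
    return trim( $text );
}

echo normalise_string( "  Thou hasn't seen\r\n    <br> nothing yet\n" ), "\n";
```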

The last problem in that list was fixed by replacing the foreach loop with an index-based for loop. From

foreach ( $node->childNodes as $child_node ) {
   ...
   $this->extract_strings( $child_node );
}

To

for ( $currentNode = 0; $currentNode < $node->childNodes->length; $currentNode++ ) {
   ...
   $this->extract_strings( $node->childNodes[$currentNode] );
}

It seems that the replaceChild() call made in DOM_string_updater::replace_node() upset the foreach loop's position in the list of child nodes.
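The index-based loop can be checked in isolation with a small PHP sketch. replace_with_translation() is a hypothetical stand-in for DOM_string_updater::replace_node(). The likely cause of the foreach problem is an assumption: PHP's foreach over a DOMNodeList appears to advance by following the next sibling of the node it just visited, and once replaceChild() has detached that node it has no next sibling, so the loop ends early. The index-based loop re-reads childNodes on every pass and doesn't depend on the detached node.

```php
<?php
// Hypothetical stand-in for DOM_string_updater::replace_node(): it replaces a
// list item with a new node holding the "translated" (here, upper-cased) text.
function replace_with_translation( DOMElement $item ): void {
    $doc = $item->ownerDocument;
    $new = $doc->createElement( 'li' );
    $new->appendChild( $doc->createTextNode( strtoupper( $item->textContent ) ) );
    $item->parentNode->replaceChild( $new, $item );
}

$doc = new DOMDocument();
libxml_use_internal_errors( true );
$doc->loadHTML( '<ul><li>one</li><li>two</li><li>three</li></ul>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
libxml_clear_errors();
$ul = $doc->getElementsByTagName( 'ul' )->item( 0 );

// Index-based loop: childNodes is re-read on every pass, so after replaceChild()
// position $i holds the replacement and $i + 1 is the next original item.
for ( $i = 0; $i < $ul->childNodes->length; $i++ ) {
    $child = $ul->childNodes->item( $i );
    if ( $child instanceof DOMElement ) {
        replace_with_translation( $child );
    }
}

echo $doc->saveHTML( $ul ), "\n";
```

This relies on each replacement being exactly one node. If a replacement ever inserted several nodes, the index would need to be advanced past all of them.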

bobbingwide / oik-i18n

Improve translatability of template and template part rich text HTML #9

Requirements