Open bobbingwide opened 3 years ago
I'm going to try using the simple HTML DOM parser from https://sourceforge.net/projects/simplehtmldom/files/latest/download. Perhaps an easier approach would be to write it in JavaScript using https://www.npmjs.com/package/html-dom-parser
Note: I wrote this after I'd failed to get the DOMDocument parser to find certain tags. I've since realised my mistake and have reverted to using DOMDocument. See https://github.com/bobbingwide/oik-i18n/issues/7#issuecomment-734784165
I found some very useful documentation about HTML's `translate` attribute:
https://www.w3.org/International/questions/qa-translate-flag.en
Basically, any text you don't want translated is wrapped in an element with `translate="no"`, and anything you want to explicitly identify as translatable is wrapped in an element with `translate="yes"`.
There are quite a few other details in this document and links. Something for another time.
Prior to creating this issue I briefly played with PHP's DOMDocument class. After a while I concluded it only worked for valid XML, not HTML; self-closing tags were being ignored.
I tried the simple_html_dom routine but it couldn't nicely handle:
So I revisited the DOMDocument route as other plugins were happily using it, e.g. Jetpack (`class.jetpack-post-images.php`).
I realised I wasn't handling the nodes correctly; I was missing else logic for when the `$node->nodeValue` was empty.
So now I'm going to revert to using DOMDocument.
I'll have to cater for Warnings when DOMDocument encounters tags it can't handle. Extract from Jetpack's code.
```php
// The @ is not enough to suppress errors when dealing with libxml,
// we have to tell it directly how we want to handle errors.
libxml_use_internal_errors( true );
@$dom_doc->loadHTML( $html_info['html'] );
libxml_use_internal_errors( false );
```
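One way to cater for those warnings is to collect what libxml reports rather than discard it. This is a minimal sketch, not Jetpack's code: the helper name `load_html_quietly()` is hypothetical, and it assumes the HTML arrives as a non-empty string.

```php
<?php
/**
 * Parse HTML with DOMDocument while keeping libxml warnings out of the output.
 * Hypothetical helper for illustration; not part of Jetpack or oik-i18n.
 *
 * @param string $html Non-empty HTML to parse.
 * @return array{dom: DOMDocument, messages: string[]}
 */
function load_html_quietly( string $html ): array {
	// Remember the caller's setting so it can be restored afterwards.
	$previous = libxml_use_internal_errors( true );

	$dom = new DOMDocument();
	$dom->loadHTML( $html );

	// Gather whatever libxml complained about, e.g. tags it does not recognise.
	$messages = [];
	foreach ( libxml_get_errors() as $error ) {
		$messages[] = sprintf( 'line %d: %s', $error->line, trim( $error->message ) );
	}
	libxml_clear_errors();
	libxml_use_internal_errors( $previous );

	return [ 'dom' => $dom, 'messages' => $messages ];
}

$result = load_html_quietly( '<section><p>Hello</p></section>' );
echo $result['dom']->getElementsByTagName( 'p' )->item( 0 )->textContent, PHP_EOL;
```

Restoring the previous setting, instead of always passing `false`, avoids changing error handling for any caller that had already switched internal errors on.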
I've not looked at all of the Gutenberg issues related to this.
Actually, I have had a cursory glance.
I won't reference the issues directly until I've made enough progress in Stage 4. Here are some of the relevant issue numbers:
- 20966 - Block Based Themes: Dynamic values in static HTML theme file.
  This is more to do with values which vary between sites (URLs, post IDs etc.) than text strings.
- 21204 - How will translations be handled in block based themes?
- 21728 - Discuss: Contextual block behavior
- 21932 - Inline Dynamic Content Solutions
I think in most cases the solution is being overthought.
In my view there are two distinct challenges:
For this work I'm only concerned with the i18n/l10n part.
My premises are:
`translate="no"` attributes.
Stage 5. Load the templates and template parts for the user's locale.
We can implement a local solution for the Fizzie theme that doesn't require changes to Gutenberg with the following assumptions.
- `block-template` files only contain non-translatable content
- `block-template-parts` files contain translatable content which can be translated at the lowest node level.
- `.pot` file name clash.
The local solution can be implemented in `fizzie_load_template_part()`.
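The file lookup for such a local solution might look like the sketch below. Everything in it is an assumption for illustration: the helper name `locate_template_part_file()`, its parameters, and the `languages/<locale>/<parts directory>` layout are hypothetical and not taken from the Fizzie theme.

```php
<?php
/**
 * Pick the template part file to load for a locale.
 * Hypothetical helper; the "languages/<locale>/<parts dir>" layout is an assumption.
 *
 * @param string $theme_dir Absolute path to the theme directory.
 * @param string $slug      Template part name without the .html extension.
 * @param string $locale    Locale code such as en_GB.
 * @param string $parts_dir Directory holding the template parts.
 * @return string Path of the file to load.
 */
function locate_template_part_file( string $theme_dir, string $slug, string $locale, string $parts_dir = 'block-template-parts' ): string {
	$localized = "$theme_dir/languages/$locale/$parts_dir/$slug.html";
	if ( is_readable( $localized ) ) {
		return $localized;
	}
	// Fall back to the untranslated file shipped with the theme.
	return "$theme_dir/$parts_dir/$slug.html";
}
```

Falling back to the untranslated file means a locale with no translated parts still gets the theme's default content.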
While updating Fizzie for WordPress 6.2 and Gutenberg 15.3.1 I briefly considered how much of the internationalization logic still worked. It seems that `t10n.bat` needs updating to reflect the fact that I've moved the files from `block-template-parts` to `parts` and from `block-templates` to `templates`.
In Gutenberg Full Site Editing (FSE) the current proposal is to deliver a theme's templates and template parts as `.html` files. But there's a problem.
Question. How does one go about internationalizing and localizing HTML?
This leads to further questions.
I've not looked at all of the Gutenberg issues related to this. I just want to experiment with a (simple) proposal that involves:
Proposed solution
`.html` file.
`.html` files to a `.pot` file.
Stage 1. Extract translatable strings from HTML
Using PHP's DOMDocument and associated classes and methods it should be possible to identify all translatable strings in any HTML and extract them into a .pot file format.
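A minimal sketch of what such an extraction routine could look like follows. The function names are hypothetical, and treating each text node as one translatable string is an assumption. It honours `translate="no"` and `translate="yes"`, skips `script` and `style`, and omits the header entry a complete `.pot` file would need. Character encoding handling is also left out.

```php
<?php
/**
 * Recursively collect trimmed text from text nodes, honouring translate="no"/"yes".
 * Hypothetical helper for illustration.
 */
function collect_strings( DOMNode $node, bool $translatable, array &$strings ): void {
	if ( $node instanceof DOMText ) {
		$text = trim( $node->data );
		if ( $translatable && '' !== $text && ! in_array( $text, $strings, true ) ) {
			$strings[] = $text;
		}
		return;
	}
	if ( $node instanceof DOMElement ) {
		if ( in_array( strtolower( $node->nodeName ), [ 'script', 'style' ], true ) ) {
			return; // Not expected in templates, so not processed.
		}
		// The flag is inherited by descendants until another element overrides it.
		$flag = strtolower( $node->getAttribute( 'translate' ) );
		if ( 'no' === $flag ) {
			$translatable = false;
		} elseif ( 'yes' === $flag ) {
			$translatable = true;
		}
	}
	if ( ! $node->hasChildNodes() ) {
		return;
	}
	foreach ( $node->childNodes as $child ) {
		collect_strings( $child, $translatable, $strings );
	}
}

/** Return the unique translatable strings found in an HTML fragment, in document order. */
function extract_translatable_strings( string $html ): array {
	if ( '' === trim( $html ) ) {
		return [];
	}
	$previous = libxml_use_internal_errors( true );
	$dom      = new DOMDocument();
	$dom->loadHTML( $html );
	libxml_clear_errors();
	libxml_use_internal_errors( $previous );

	$strings = [];
	collect_strings( $dom, true, $strings );
	return $strings;
}

/** Format strings as .pot entries with empty translations (no header entry). */
function to_pot_entries( array $strings ): string {
	$pot = '';
	foreach ( $strings as $string ) {
		$escaped = str_replace( [ '\\', '"', "\n" ], [ '\\\\', '\\"', '\\n' ], $string );
		$pot    .= "msgid \"$escaped\"\nmsgstr \"\"\n\n";
	}
	return $pot;
}

echo to_pot_entries( extract_translatable_strings( '<p>Hello <span translate="no">Fizzie</span> world</p>' ) );
```

Working one text node at a time keeps the routine simple, at the cost of splitting a sentence wherever inline markup interrupts it.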
Stage 2. Extract strings from theme's templates and template parts to a .pot file
Using Gutenberg's block parser (class `WP_Block_Parser`) we can parse blocks, pass the innerHTML to the HTML string extraction routine, and extract translatable attributes from `attrs`.
Process all templates and template parts to produce a single theme `.pot` file: theme-FSE.pot. Note: This will be a different file from the `.pot` file created by parsing the PHP files.
Stage 3. Translate into local language
Use a similar solution to `l10n` to generate `en_GB` and `bb_BB` `.po` and `.mo` files.
Stage 4. Apply the local language
Using the .mo files generated after translation, apply the translations to the templates and template parts, saving the new files in a language-specific directory.
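The translation step could be sketched as below, under several assumptions: the function names are hypothetical, a plain array stands in for lookups against the `.mo` file, and the routine works on a single block's innerHTML, not a whole template file. Writing the result to the language-specific directory is left out.

```php
<?php
/**
 * Replace translatable text nodes below $node using a lookup table.
 * Hypothetical helper; $translations maps source text to translated text.
 */
function translate_nodes( DOMNode $node, array $translations, bool $translatable = true ): void {
	if ( $node instanceof DOMText ) {
		// Keep surrounding whitespace so inline spacing survives translation.
		if ( $translatable
			&& preg_match( '/^(\s*)(.*?)(\s*)$/s', $node->data, $m )
			&& isset( $translations[ $m[2] ] ) ) {
			$node->data = $m[1] . $translations[ $m[2] ] . $m[3];
		}
		return;
	}
	if ( $node instanceof DOMElement ) {
		$flag = strtolower( $node->getAttribute( 'translate' ) );
		if ( 'no' === $flag ) {
			$translatable = false;
		} elseif ( 'yes' === $flag ) {
			$translatable = true;
		}
	}
	if ( ! $node->hasChildNodes() ) {
		return;
	}
	foreach ( $node->childNodes as $child ) {
		translate_nodes( $child, $translations, $translatable );
	}
}

/** Translate an HTML fragment and return the resulting markup. */
function translate_html( string $html, array $translations ): string {
	if ( '' === trim( $html ) ) {
		return $html;
	}
	$previous = libxml_use_internal_errors( true );
	$dom      = new DOMDocument();
	$dom->loadHTML( $html );
	libxml_clear_errors();
	libxml_use_internal_errors( $previous );

	$body = $dom->getElementsByTagName( 'body' )->item( 0 );
	if ( null === $body ) {
		return $html;
	}
	translate_nodes( $body, $translations );

	// Serialise only the fragment, not the html/body wrapper libxml added.
	$out = '';
	foreach ( $body->childNodes as $child ) {
		$out .= $dom->saveHTML( $child );
	}
	return $out;
}

echo translate_html( '<p>Hello <span translate="no">Hello</span></p>', [ 'Hello' => 'Bonjour' ] ), PHP_EOL;
```

Strings with no entry in the lookup table are left untouched, so a partially translated `.mo` file still produces usable output.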
Stage 5. Load the templates and template parts for the user's locale
This is where we'll have to change Gutenberg's template and template part loading logic.
If there are language files for the theme and the user's locale, use these when loading `.html` files.
Assumptions
`translate="no"`.
Scope and Exclusions
`script` and `style`, are not expected so will not be processed.