GibbonEdu / core

Gibbon is a flexible, open source school management platform designed to make life better for teachers, students, parents and leaders.
https://gibbonedu.org
GNU General Public License v3.0
482 stars 306 forks source link

Improve translation API by named parameter replacement #701

Closed yookoala closed 5 years ago

yookoala commented 5 years ago

This is a comment on the general design of translation API __() and Gibbon\Locale::translate(). The changes needed will have to be a long term one. So I'm just raising this issue here for discussion.

Current Situation

Translations are done through the __() translation function in functions.php. It is undergoing a transition to remove $guid. Before, it expects input like this:

/**
 * @param string $guid GUID of Gibbon installation
 * @param string $text Text to translate
 * @param string (Optional) Text domain. Unique identifier for retrieving translated strings. Default value: null
 *
 * @return  string Translated Text
 */
function __($guid, $text, $domain=null);

After the transition, it'll expect something like this:

/**
 * @param string $text Text to translate
 * @param string (Optional) Text domain. Unique identifier for retrieving translated strings. Default value: null
 *
 * @return  string Translated Text
 */
function __($text, $domain=null);

The underlying translation is done by the global $gibbon->locale object, which is an instance of Gibbon\Locale. The actual translation method looks like this:

/**
 * Custom translation function to allow custom string replacement
 *
 * @param   string  Text to Translate
 * @param   boolean Use guid.
 *
 * @return  string  Translated Text
 */
public function translate($text, $domain = null);

Problems of the Approach

In actual world, strings are not always translated as is. There will be parameters within a string for translation. And right now, the API does not handle them at all. Programmers would deal with it like this:

echo __('View') . ' ' . $course['course'] . ' ' . $course['class'] . ' ' . __('Marks');

or

echo __('Posted') . ' ' . $time['minutes'] . __('minutes') .  ' ' . __('ago');

There are 2 problems:

  1. string concat is hard to read and easily get wrong; and more importantly
  2. string order is fixed without context. (Which is not ideal for translating to some languages)

In particular to problem (2), we can take a look again the 2nd string above. "Posted 10 minutes ago", when translated to Chinese would ​most reasonably be "在 10 分鐘之前發出". Please note that "在 ... 之前" means "ago", "發出" means "posted". Also there is no space between any of the words. So if you read carefully, the word order is totally different from English. There is no way the above code is going to translate to reasonable Chinese.

Existing Solution to the Problems

Drupal has a nice translation function t() that has a function signature like this:

Drupal's t() function

/**
 * Translates a string to the current language or to a given language.
 *
 * @param $string A string containing the English string to translate.
 * @param $args  An associative array of replacements to make after translation. Based
 *     on the first character of the key, the value is escaped and/or themed.
 *     See format_string() for details.
 * @param $options An associative array of additional options, with the following elements:
 *   - 'langcode' (defaults to the current language): The language code to
 *     translate to a language other than what is used to display the page.
 *   - 'context' (defaults to the empty context): A string giving the context
 *     that the source string belongs to. See @ref sec_context above for more
 *     information.
 *
 * @return The translated string.
 */
function t($string, array $args = array(), array $options = array());

It supports both named parameter and context (i.e. domain):

echo t('Posted @minutes minutes ago', ['@minutes' => $time['minutes']]); // Posted 10 minutes ago

gettext/gettext on Packagist

Even better, "gettext/gettext" on packagist also support named parameter:

use Gettext\GettextTranslator;

//Create the translator instance
$t = new GettextTranslator();

//Set the language and load the domain
$t->setLanguage('gl');
$t->loadDomain('messages', 'Locale');

//Use it:
echo $t->gettext('apple'); // "Mazá"

//Or use the gettext functions
echo gettext('apple'); // "Mazá"

//If you want use the global functions
$t->register();

echo __('apple'); // "Mazá"

//And use sprintf/strtr placeholders
echo __('Hello %s', 'world'); //Hello world
echo __('Hello {name}', ['{name}' => 'world']); //Hello world
echo __('Posted {minutes} minutes ago', ['{minutes}' => $time['minutes']]); // Posted 10 minutes ago

Proposal

Either to:

  1. Adopt existing translation function that support named parameter. Then migrate all our translations to that framework; or
  2. update the current __() and Gibbon\Locale::translate() to properly support the function.
crayner commented 5 years ago

I agree with this approach. Symfony, which is used as the core for Drupal, uses this proposed functionality as well, but adds a second call to manage pluralisation.

public function trans($id, array $parameters = array(), $domain = null, $locale = null)

public function transChoice($id, $number, array $parameters = array(), $domain = null, $locale = null)

Both these methods translate a string ($id) and replace preset associated parameters ($parameters). The format of the translation allows for pluralisation based on the number ($number) supplied. Full details can be found at https://symfony.com/doc/current/translation.html As per ALL Symfony modules, these hold a MIT licence, and can be used independently of Symfony. Why bog yourself in framework code when you have such elegant solutions available?

yookoala commented 5 years ago

Spent some times in studying text extraction with xgettext. Turns out it is very flexible.

Here is the gist I made to demonstrate the flexibility. https://gist.github.com/yookoala/64c6960f2c08dbf527481aae32391312

So it has very good support in extracting whatever function's parameters as singular, plural string and context (domain).

SKuipers commented 5 years ago

Thanks for putting this together Koala! It's very well thought out and researched.

Considerations

Backwards compatibility

Gibbon currently has 6324 translatable terms, and lots of __()s. Our translation function needs to support the following:

__(string $text)
__(string $text, string $domain)

__(string $guid, string $text)
__(string $guid, string $text, string $domain)

Certainly, $guid is on the way out! But it's a big codebase, so until then we can’t make breaking changes. Any changes to __() should:

Handling text domains

I do think we have room to change the $domain as the second parameter. I think when it was added it was likely based off of the Wordpress equivalent: https://developer.wordpress.org/reference/functions/__/

Right now, it looks like $domain as the second parameter is currently only used in Free Learning and Data Admin. However, it has been published in the Module Development docs since v13 and should be supported, for at least a couple versions so it can be transitioned to a different method signature.

String replacements

Gibbon’s __() function handles gettext translation as well as the string replacement functionality. The Gibbon\Locale class currently does this by loading them from the database, caching them in the session, and applying them with str_replace. Any changes need to account for this functionality.

Translation parameters

I agree, using named parameters will be more useful to translators, and this is certainly the way modern frameworks now handle translation.

And right now, the API does not handle them at all.

Well, there are parameters in many of the translation strings to handle this, however they’re in the printf format of %s and %1$s. I think we can introduce named parameters, but will also have to keep these existing strings going forward too.

Comparisons

gettext\gettext

https://github.com/oscarotero/Gettext

I like that the gettext\gettext library is lightweight and straight forward, without additional dependencies or framework-specific interfaces. Given that translation functions are called many times per page, I think keeping the complexity of these methods down is important.

However, I’m not fond of the number of different functions it uses, and their particular implementation of $domain as the first parameter for domain-related methods. Other frameworks (symfony, laravel, wordpress, drupal, cake, etc) always keep the translatable string first, which I think is important for readability. If we did look into using this library, we would likely want to keep our own function signature, and just use the class methods of the library.

symfony\translation

https://github.com/symfony/translation

It’s a pretty big library, that does many different things to solve many different problems. I’ve done some benchmarks to compare the performance and I’m seeing a 200ms increase on page loads using this library. For most pages, this is a 2-3x increase. Digging deeper into how it works, this difference appears to be in how the symfony Loader classes work, and for .mo files it is reading & parsing the whole file for each page load. It looks like it works this way because it doesn’t have direct gettext support. I see where it’s a very versatile library for many other uses, but I think in this particular case because Gibbon uses gettext for translation, the built-in gettext functions have better performance.

What I do like is the way these modern frameworks are moving towards a single interface method/function __(), t() and trans() vs having many optional methods (gettext\gettext in comparison has eight variations on __() for domain, context and pluralization).

Drupal’s t() function

Their function signature uses an $options array as the third parameter, which makes a lot of sense. This handles the added context, locale and domain options without needing either different functions for each (gettext) or additional parameters (symfony).

Thoughts

Some thoughts on a possible updated function signature: __(string $original, array $args = [], array $options = [])

and a possible new pluralization function: __n(string $singular, string $plural, $value, array $args = [], array $options = [])

(gettext’s n__ as a function name looks a bit wonky to me, vs __n)

 

What do you think?

yookoala commented 5 years ago

It's the end of the v17 cycle. I agree that we shouldn't do any drastic changes to core right now. Not unless it is 100% backward compatible. And the new __() and __n() approach seems reasonable.

yookoala commented 5 years ago

Question: Do we need specific format for the named replacement keys?

For example,

PDO placeholder style (basically replace as-is)

__(`'Hello :name (:link)'`, [
  ':name' => htmlspecialchars('Alice'),
  ':link' => '<a href="https://twitter.com/alice">link</a>',
])

or

Quoted template variable style,

__(`'Hello ${name} (${link})'`, [
  'name' => htmlspecialchars('Alice'),
  'link' => '<a href="https://twitter.com/alice">link</a>',
])

or

Drupal style annotations (with extra handling to the variables),

__(`'Hello @name (!link)'`, [
  '@name' => 'Alice', // will be processed with htmlspecialchars to remove.
  '!link' => '<a href="https://twitter.com/alice">link</a>', // will not be processed.
])
SKuipers commented 5 years ago

Good question. I'm in favor of using something that wraps the placeholder like { }s. It helps with strtr in cases where the string might show up as part of a series of characters, and I think its easier to read. That's just my 2 cents. What do you think? also @rossdotparker, any preference?

rossdotparker commented 5 years ago

I am happy to leave it up to you two in this case. Thanks for all the work you are doing here.

yookoala commented 5 years ago

To be honest, I'm more used to the Drupal style. I love it for providing extra string sanitization that we'd usually use (i.e. htmlspecialchar).

I'd like to know what @crayner think on this subject :-)

crayner commented 5 years ago

I don't use any particular style for placeholders, but use a variety of formats, as it is a simple stringreplacement, so stufff like %xxxxx% {xxxxx} or combinations are all valid %{xxxxxx}

I do use ONE defined placeholder: 'transChoice' (camel case word that is always an integer) and generates both a placeholder and a plurisation tool. The code I use is found in the Symfony translations library, and allows on a single line multiple definitions handling. If you use the Symfony code it expects a placeholder %count% to hold this number. e.g. "{0} There is no product|{1} There is one product|]1,19] There are %count% products|[20,Inf[ There are many products" So, jsut some ideas, on taking this to fuller conclusion. Need to go, but will find some code for you later...

yookoala commented 5 years ago

Let's work with "Hello {placeholder}" style now :-)

SKuipers commented 5 years ago

I've merged in #709 🎉 Thanks!!