expect-digital / translate

Translation for distributed systems

Handle placeholders #45

Open janishorsts opened 8 months ago

janishorsts commented 8 months ago

This ticket is for discussion on how to handle placeholders and convert them to and from Message Format 2.

The primary goal is:

VladislavsPerkanuks commented 8 months ago

I see two ways.

Both assume that we will have an executor (e.g. a Formatter) that can format a message with given variables (MF2 string -> formatted string). This is not implemented yet; without it I see no point in using MF2.

Way 1

Continue with the existing approach: all variables are extracted from the source text and put into local declarations.

Javascript example:

Hello {{ name }}! // source text
.local $name = { |{{ name }}| }\n {{Hello { $name }!}} // mf2 text

I prefer this way because:

  1. It provides a way to get either the unformatted text (the source text) or the formatted text.
  2. For some formats it can also store metadata about the message itself.

Example of 1:

// msg = .local $name = { |{{ name }}| }\n {{Hello { $name }!}}

// Back to source formatting (use local declaration values)
msg.Format() // Hello {{ name }}! 

// Formatting with given variables (override local declaration with new values)
msg.Format("name", "John") // Hello John! 

Example of 2:

// input po
/*
#, python-format
msgid "Hello %(name)s!"
msgstr ""
*/

// The flag is important because it tells us that this message contains placeholders.
// In our model.Message there is nowhere to store it, so this crucial information is lost.
// If we store it as a local declaration, we do not lose it.

// mf2
/*
.local $format = { python-format }
.local $name = { |%(name)s| }
{{Hello { $name }!}}
*/

// And now we can safely return it back to po format
// output po
/*
#, python-format
msgid "Hello %(name)s!"
msgstr ""
*/
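The round trip above can be sketched in JavaScript roughly as follows. This is only an illustration under assumptions: poToMF2 and mf2ToPo are hypothetical helper names, only named python-format placeholders of the form %(name)s are handled, and the MF2 text is produced and read back with simple regular expressions rather than a real MF2 parser.

```javascript
// Hypothetical helpers, not part of the translate codebase.
// Only named python-format placeholders such as %(name)s are handled,
// and literal braces in the msgid are not escaped.
const PY_PLACEHOLDER = /%\((\w+)\)[a-zA-Z]/g;

// Convert a PO msgid plus its format flag into MF2 text with local declarations.
function poToMF2(msgid, flag) {
  const decls = [`.local $format = { ${flag} }`];
  const seen = new Set();
  const pattern = msgid.replace(PY_PLACEHOLDER, (original, name) => {
    if (!seen.has(name)) {
      seen.add(name);
      decls.push(`.local $${name} = { |${original}| }`);
    }
    return `{ $${name} }`;
  });
  return [...decls, `{{${pattern}}}`].join('\n');
}

// Read the declarations back and restore the original msgid and flag.
function mf2ToPo(mf2) {
  const values = new Map();
  let pattern = '';
  for (const line of mf2.split('\n')) {
    const decl = line.match(/^\.local \$(\w+) = \{ \|?(.*?)\|? \}$/);
    if (decl) values.set(decl[1], decl[2]);
    else if (line.startsWith('{{') && line.endsWith('}}')) pattern = line.slice(2, -2);
  }
  const msgid = pattern.replace(/\{ \$(\w+) \}/g, (_, name) => values.get(name));
  return { flag: values.get('format'), msgid };
}

const mf2 = poToMF2('Hello %(name)s!', 'python-format');
console.log(mf2);
console.log(mf2ToPo(mf2));
```

Because the flag and the original placeholder text both live in local declarations, nothing outside the MF2 string is needed to rebuild the PO entry.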

Still, I think this abuses MF2 syntax and makes it do things it was not designed for...

Way 2

Revert to storing variables in function options.

Javascript example:

Hello {{ name }}! // source text

Hello { $name :fmt format=|{{ }}| }! // mf2 text
// or
Hello { :fmt format=|{{ name }}| }! // mf2 text
// or any other similar option

Why I do not like this way:

  1. It contradicts MF2, where "Functions are used to evaluate, format, select, or otherwise process data values during formatting".
  2. There is no way to store metadata about the message itself, unlike in the previous way.

More details about 1: formatting means converting an MF2 string into a plain translated string, as demonstrated in example 1 of the previous way. In our case, however, we need to format the message back into the same string it was when extracted, e.g. with the same placeholders, if any. That is not an MF2 task!

That means:

// msg = Hello { $name :fmt format=|{{ }}| }!

// theoretically this should error, because the variable name was not provided
msg.Format()

// this means that "John" is formatted with the function fmt and the option
// format=|{{ }}| before it is added to the resulting string, which is not what we want.
msg.Format("name", "John")

Note

At this point I am not sure whether we are using MF2 correctly, but I am sure that we have dug ourselves into a hole and need to think about it before continuing.

janishorsts commented 8 months ago

WIP.

Named formatting

Using a custom function fmt to store details of the value from the original source code.

const Intl = require('messageformat');
const { string } = require('messageformat/functions');

const locale = 'en-US';
const msg = `.input { $name :fmt lang=python variant=|%s| }

.match { $count :integer }
1 {{Hello, { $name }! You got 1 apple }}
* {{Hello, { $name }! You got { $count } apples }}`;

const mf = new Intl.MessageFormat(msg, locale, {
  functions: {
    "fmt": (context, options, input) => {
      console.log(options)

      // Pass the input through unchanged; the options only carry the original placeholder details.
      return string(context, options, input)
    }
  },
});

// [Object: null prototype] { lang: 'python', variant: '%s' }
// Hello, Dave! You got 1 apple 
console.log(mf.format({name:'Dave', count: 1}))

// [Object: null prototype] { lang: 'python', variant: '%s' }
// Hello, Lisa! You got 2 apples
console.log(mf.format({name:'Lisa', count: 2}))

janishorsts commented 8 months ago

We use count for plurals in Message Format 2.

However, there is a risk that count is already used as a named placeholder in the original source code.

To avoid the clash, add a namespace and rename it to n, e.g. ota:n. We should namespace all variables that belong to the translate agent.

.match { $ota:n :integer }
1 {{Hello, { $name }! You got 1 apple }}
* {{Hello, { $name }! You got { $ota:n } apples }}

janishorsts commented 8 months ago

Example from https://docs.oasis-open.org/xliff/v1.2/xliff-profile-html/xliff-profile-html-1.2-cd02.html#General_EntityReferences

Example 1

<h1>Online Help for &ProductName;</h1>
<source>Online Help for <ph id='1'>&amp;ProductName;</ph>.</source>
Online Help for { $ProductName :fmt style=|& ;| }

-- OR --

Online Help for { $ProductName :fmt prefix=|&| suffix=|;| }
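One way to read the prefix/suffix variant: when no runtime value is supplied, fmt wraps the variable name in the stored prefix and suffix and so reproduces the original placeholder. The sketch below is plain JavaScript, independent of any MF2 library. restorePlaceholder and splitStyle are hypothetical names, and the prefix is assumed to be & (as in the entity &ProductName;).

```javascript
// Hypothetical helper: rebuild the original placeholder from the variable
// name and the prefix/suffix options stored on the fmt function.
function restorePlaceholder(name, { prefix = '', suffix = '' } = {}) {
  return `${prefix}${name}${suffix}`;
}

// Hypothetical helper: split a combined style option such as "& ;" into
// prefix and suffix. Whitespace inside the original placeholder is not
// preserved by this form, which is one argument for explicit prefix/suffix.
function splitStyle(style) {
  const [prefix = '', suffix = ''] = style.split(' ');
  return { prefix, suffix };
}

console.log(restorePlaceholder('ProductName', { prefix: '&', suffix: ';' }));
console.log(restorePlaceholder('ProductName', splitStyle('& ;')));
```

Both calls yield the same string, so the two notations carry the same information for this example.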

Example 2

Nested HTML code

<p title='Information about Mount Hood'>This is Mount Hood: <img src="mthood.jpg" alt="Mount Hood with its snow-covered top"></p>
<ph id="a_2">
  <sub ctype="x-html-p-title">Information about Mount Hood</sub>
</ph>This is Mount Hood:<ph id="a_3" ctype="x-html-img" xhtml:src="mthood.jpg">
  <sub ctype="x-html-img-alt">Mount Hood with its snow-covered top
  </sub>
</ph>

I would say we do NOT support this initially. These days, translation libraries work differently. The above example is SO FRAGILE: any change to the HTML requires a corresponding fix in all translations.

E.g. in an Angular app, it would produce three separate texts to be translated:

<!-- older angular -->
<p [title]="'Information about Mount Hood' | translate" translate>This is Mount Hood: <img src="mthood.jpg" [alt]="'Mount Hood with its snow-covered top' | translate"></p>

<!-- latest angular -->
<p title="Information about Mount Hood" i18n-title i18n>This is Mount Hood: <img src="mthood.jpg" alt="Mount Hood with its snow-covered top" i18n-alt></p>

janishorsts commented 8 months ago

The latest Angular uses ICU a lot now.

VladislavsPerkanuks commented 8 months ago

Depending on the presence or absence of a variable or literal operand and a function, private-use annotation, or reserved annotation, the resolved value of the expression is determined as follows: If the expression contains a reserved annotation, an Unsupported Expression error is emitted and a fallback value is used as the resolved value of the expression. Else, if the expression contains a private-use annotation, its resolved value is defined according to the implementation's specification.

That means we could leverage the private-use annotation to store the original value of the variable. E.g. the Formatter could work like this:

  1. If an input value is given, use it to resolve the expression.
  2. If no input value is given, use the value from the private-use annotation to resolve the expression.

Example:

mf2 := NewMF2(`
.input { $placeholder ^original=%s }
.input { $name ^original=\{\{ name \}\} }
{{Hello { $placeholder } { $name }}}
`)

// Treat keys missing from the input map as a signal to use the original value
mf2.Format() // Result: Hello %s {{ name }}

// Use input map to resolve one expression
mf2.Format(map[string]string{"name": "John"}) // Result: Hello %s John

// Use input map to resolve both expressions
mf2.Format(map[string]string{"placeholder": "World", "name": "John"}) // Result: Hello World John
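The two resolution rules can be sketched as follows. This is an illustrative JavaScript sketch, not the project's Formatter: the message is modeled as an already-parsed array of parts (a hypothetical shape), where each variable part carries the original text taken from its private-use annotation.

```javascript
// Hypothetical parsed form of the message above: literal text parts and
// variable parts that remember their original placeholder text.
const message = [
  { type: 'text', value: 'Hello ' },
  { type: 'variable', name: 'placeholder', original: '%s' },
  { type: 'text', value: ' ' },
  { type: 'variable', name: 'name', original: '{{ name }}' },
];

// Resolve each part: use the input value when the key is present (rule 1),
// otherwise fall back to the original placeholder text (rule 2).
function format(parts, input = {}) {
  return parts
    .map((part) => {
      if (part.type === 'text') return part.value;
      const hasValue = Object.prototype.hasOwnProperty.call(input, part.name);
      return hasValue ? String(input[part.name]) : part.original;
    })
    .join('');
}

console.log(format(message));
console.log(format(message, { name: 'John' }));
console.log(format(message, { placeholder: 'World', name: 'John' }));
```

With no input, the call returns the source text with its placeholders intact, which is the behavior needed when exporting back to the original file format.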

Example when converting from xliff:


// source: <source>Online Help for <ph id='1'>&amp;ProductName;</ph>.</source>
// mf2
.input { $ph1 ^original=&ProductName; }
{{Online Help for { $ph1 }.}}

It should be equivalent to your proposed solution, but (IMHO) cleaner, and it follows the MF2 formatting guide (hopefully).

janishorsts commented 8 months ago

<!-- latest angular -->
<span i18n>The author is {gender, select, male {male} female {female} other {other}}</span>

It produces the following XLIFF 1.2:

<trans-unit id="3560311772637911677" datatype="html">
  <source>The author is <x id="ICU" equiv-text="{gender, select, male {male} female {female} other {other}}" xid="7670372064920373295"/></source>
  <context-group purpose="location">
    <context context-type="sourcefile">src/app/app.component.html</context>
    <context context-type="linenumber">339,341</context>
  </context-group>
</trans-unit>
<trans-unit id="7670372064920373295" datatype="html">
  <source>{VAR_SELECT, select, male {male} female {female} other {other}}</source>
  <context-group purpose="location">
    <context context-type="sourcefile">src/app/app.component.html</context>
    <context context-type="linenumber">339,340</context>
  </context-group>
</trans-unit>

The MF2 transformation challenges: