formatjs / formatjs-old

The monorepo home to all of the FormatJS related libraries.
https://formatjs.io/

Document limitations to XML formatting #152

Closed victorandree closed 5 years ago

victorandree commented 5 years ago

Which package? intl-messageformat

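Is your feature request related to a problem? Please describe. I think the documentation for XML formatting could be improved with regard to purpose, scope and limitations. Concretely:

  - XML formatting does not support nested tags, since a string of .textContent is passed to the "value" function (so nested tags are stripped).
  - Any attributes on the XML tag are also swallowed. I think there's an obvious use case for e.g. translating URLs as part of a link. Consider a message inviting someone to learn more where the target URL is language-dependent: Share your photos with friends. Learn more.

A concrete workaround is of course to use some kind of formatting (e.g. Markdown) for the text content in the message (Share your photos with friends. Learn more) and just parse it in your "value" function.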

Describe the solution you'd like Given the current state of the library and documentation, I think it should be clarified that:

  1. XML tags do not support nesting
  2. XML tags do not support attributes

Describe alternatives you've considered Given these limitations, the usefulness of XML syntax seems limited.

Aren't they effectively a way of supporting custom argument constructions? Just as we can plural format, some apps might want to emphasize or link format their translations: Share your {emphasize, "photos"} with friends. {link, "https://example.com/en/learn-more", "Learn more"} (I guess this is related to #20).

Not sure I'm reading the ICU "Formatting messages" page correctly, but this would seem to be something like supporting "Custom Format Objects (discouraged)"?

longlho commented 5 years ago

We can definitely use some help there, so PRs welcome!

On Mon, Aug 12, 2019 at 7:53 AM Victor Andrée notifications@github.com wrote:

Which package? intl-messageformat

Is your feature request related to a problem? Please describe. I think the documentation for XML formatting could be improved with regards to purpose, scope and limitations. Concretely:

  - XML formatting does not support nested tags, since a string of .textContent is passed to the "value" function (so nested tags are stripped).
  - Any attributes on the XML tag are also swallowed. I think there's an obvious use case for e.g. translating URLs as part of a link. Consider a message inviting someone to learn more where the target URL is language-dependent: Share your photos with friends. Learn more.

A concrete workaround is of course to use some kind of formatting (e.g. Markdown) for the text content in the message (Share your photos with friends. Learn more) and just parse it in your "value" function.
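The Markdown workaround described above could look roughly like the following. This is only a sketch: `parseMarkdownLinks` is a hypothetical helper name, it handles only simple `[label](url)` links, and it is not part of intl-messageformat or react-intl. A "value" function could call it on the text it receives and turn each link part into an anchor element.

```javascript
// Hypothetical helper (not a library API): split a string into plain-text
// chunks and { label, href } objects for each Markdown-style [label](url) link.
function parseMarkdownLinks(text) {
  const pattern = /\[([^\]]+)\]\(([^)\s]+)\)/g;
  const parts = [];
  let lastIndex = 0;
  let match;
  while ((match = pattern.exec(text)) !== null) {
    // Keep any plain text that precedes this link.
    if (match.index > lastIndex) {
      parts.push(text.slice(lastIndex, match.index));
    }
    parts.push({ label: match[1], href: match[2] });
    lastIndex = pattern.lastIndex;
  }
  // Keep any plain text after the last link.
  if (lastIndex < text.length) {
    parts.push(text.slice(lastIndex));
  }
  return parts;
}
```

The translator then sees the URL in the message and can change it per language, at the cost of mixing a second markup language into the ICU message.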

Describe the solution you'd like Given the current state of the library and documentation, I think it should be clarified that:

  1. XML tags do not support nesting
  2. XML tags do not support attributes

Describe alternatives you've considered Given these limitations, the usefulness of XML syntax seems limited.

Aren't they effectively a way of supporting custom argument constructions? Just as we can plural format, some apps might want to emphasize or link format their translations: Share your {emphasize, "photos"} with friends. {link, "https://example.com/en/learn-more", "Learn more"} (I guess this is related to #20 https://github.com/formatjs/formatjs/issues/20).


longlho commented 5 years ago

I think right now the main use case of XML tag support is for React. Considering Share your photos with friends. <LearnMore href="https://example.com/en/learn-more">Learn more</LearnMore> the translator wouldn't know what to do w/ the URL so it'd just be echoed back as is. I'm not sure I understand your use case?

It is true that custom formatters are discouraged bc it's not really i18n-safe & translation vendor might not know what to do with it. In the example Share your {emphasize, "photos"} with friends. {link, "https://example.com/en/learn-more", "Learn more"} which things are supposed to be translated? From what I've seen ICU is so sophisticated as is for translation vendors that some of them don't even support things like multi plurals and such.

victorandree commented 5 years ago

> I think right now the main use case of XML tag support is for React. Considering Share your photos with friends. <LearnMore href="https://example.com/en/learn-more">Learn more</LearnMore> the translator wouldn't know what to do w/ the URL so it'd just be echoed back as is. I'm not sure I understand your use case?

The URL use case might be a bit contrived, but consider translating other attributes, like title. I'm not sure what the best practice would be for translating a component that uses a lot of text-level semantics. Here is another somewhat contrived example:

<p>
  Share <abbr title="Pictures you make with a camera">photos</abbr> with your
  <abbr title="People you know">friends</abbr>.
  <a href="https://example.com/en/learn-more" target="_blank">Learn more</a>
</p>

If props were simply copied, allowing a translator to work with the entire inner HTML of the p element seems most straightforward. But without props copying, we have to split it up into a number of messages, which seems awkward (see example below).

Blindly allowing translators to insert attributes obviously isn't desirable, so I can understand that you would avoid this completely.
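One way to copy props without blindly trusting the message would be an allowlist per tag. The sketch below is an assumption about how an application could do this on its own; `ALLOWED_ATTRIBUTES` and `pickAllowedAttributes` are hypothetical names, not anything the library provides.

```javascript
// Hypothetical allowlist: attributes a translator may set, per tag name.
const ALLOWED_ATTRIBUTES = new Map([
  ['a', ['href', 'title']],
  ['abbr', ['title']],
]);

// Return only the attributes that are allowed for the given tag.
// Unknown tags get no attributes at all.
function pickAllowedAttributes(tagName, attributes) {
  const allowed = ALLOWED_ATTRIBUTES.get(tagName) || [];
  const picked = {};
  for (const name of allowed) {
    if (Object.prototype.hasOwnProperty.call(attributes, name)) {
      picked[name] = attributes[name];
    }
  }
  return picked;
}
```

This only filters attribute names. Values such as `href` would still need their own checks (for example, rejecting `javascript:` URLs) before being rendered.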

I've attached some examples at the bottom of this comment for how I would approach this.

> It is true that custom formatters are discouraged bc it's not really i18n-safe & translation vendor might not know what to do with it. In the example Share your {emphasize, "photos"} with friends. {link, "https://example.com/en/learn-more", "Learn more"} which things are supposed to be translated? From what I've seen ICU is so sophisticated as is for translation vendors that some of them don't even support things like multi plurals and such.

Isn't adding XML making the syntax even more sophisticated, though? The existing ICU syntax already supports relatively complex parameters. I think ICU syntax could be extended without introducing another markup language.

# This order of arguments is closest to how an argument is looked up in
# `values` to map `<em>photos</em>` to `{ em: msg => <em>{msg}</em>}` in
# `react-intl`. Is it really any less opaque, or any more familiar, than XML syntax?
# When should a translator use markup vs ICU arguments?
Share your {em, "photos"} with friends. {a, "Learn more", href:"https://example.com/en/learn-more"}

# Tags determined by the 2nd argument: The second argument is like a function.
Share your {"photos", em} with friends. {"Learn more", a, href:"https://example.com/en/learn-more"}

# A generic "tag" argument might be better from a namespacing/extensibility
# perspective
Share your {"photos", tag, em} with friends. {"Learn more", tag, a, href:"https://example.com/en/learn-more"}
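To make the first variant concrete, here is a rough sketch of how `{name, "text", key:"value"}` arguments could be resolved against handler functions. This syntax is only a proposal from this thread and is not supported by ICU MessageFormat or intl-messageformat; `formatWithTagArguments` is a hypothetical name, the regex-based parsing ignores nesting and quote escaping, and the handlers return plain strings without HTML escaping.

```javascript
// Hypothetical formatter for the proposed {name, "text", key:"value"} syntax.
// handlers maps an argument name to a function (text, props) => string.
function formatWithTagArguments(message, handlers) {
  const argument = /\{(\w+),\s*"([^"]*)"((?:,\s*\w+:"[^"]*")*)\}/g;
  return message.replace(argument, (whole, name, text, rawProps) => {
    const hasHandler = Object.prototype.hasOwnProperty.call(handlers, name);
    if (!hasHandler || typeof handlers[name] !== 'function') {
      return whole; // leave unknown arguments untouched
    }
    // Collect the optional key:"value" pairs into a props object.
    const props = {};
    const prop = /(\w+):"([^"]*)"/g;
    let match;
    while ((match = prop.exec(rawProps)) !== null) {
      props[match[1]] = match[2];
    }
    return handlers[name](text, props);
  });
}

const formatted = formatWithTagArguments(
  'Share your {em, "photos"} with friends. {a, "Learn more", href:"https://example.com/en/learn-more"}',
  {
    em: (text) => `<em>${text}</em>`,
    a: (text, { href }) => `<a href="${href}">${text}</a>`,
  }
);
```

A real implementation would have to extend the message parser itself, which is where the concerns about translation tooling come in.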

Here are some examples of different approaches. I hope they can illustrate why props on XML tags could be useful -- and why it isn't obvious that they don't work :)

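A couple of `react-intl` examples

```jsx
import React from 'react';
import { FormattedMessage, useIntl } from 'react-intl';

// Sketch only: message ids are omitted, and the variants that pass attributes
// or key-value arguments to the value functions rely on behaviour the library
// does not have.

/**
 * This would support getting the attributes of XML tags so translators can
 * see a fuller context. The attributes are available to the value functions.
 */
export const SharePhotosXmlTag = () => (
  <p>
    <FormattedMessage
      defaultMessage={`Share <abbr title="Pictures you make with a camera">photos</abbr> with your <abbr title="People you know">friends</abbr>. <learnMore href="https://example.com/en/learn-more">Learn more</learnMore>`}
      values={{
        learnMore: (msg, { href }) => (
          <a href={href} target="_blank">{msg}</a>
        ),
        abbr: (msg, { title }) => <abbr title={title}>{msg}</abbr>,
      }}
    />
  </p>
);

/**
 * Works without XML tags and just basic ICU syntax. Worse for translators
 * who lose the context of "photos" and "friends".
 */
export const SharePhotosWithReactComponents = () => {
  const { formatMessage } = useIntl();
  const photos = (
    <abbr title={formatMessage({ defaultMessage: 'Pictures you make with a camera' })}>
      <FormattedMessage defaultMessage="photos" />
    </abbr>
  );
  const friends = (
    <abbr title={formatMessage({ defaultMessage: 'People you know' })}>
      <FormattedMessage defaultMessage="friends" />
    </abbr>
  );
  const learnMore = (
    <a href="https://example.com/en/learn-more" target="_blank">
      <FormattedMessage defaultMessage="Learn more" />
    </a>
  );
  return (
    <p>
      <FormattedMessage
        defaultMessage="Share {photos} with your {friends}. {learnMore}"
        values={{ photos, friends, learnMore }}
      />
    </p>
  );
};

/**
 * XML tags keep the context for the text content, which is the content
 * most likely to be affected by surrounding text. This is what the XML feature
 * supports now. Arguably awkward for the programmer and translator...
 */
export const SharePhotosWithoutProps = () => {
  const { formatMessage } = useIntl();
  return (
    <p>
      <FormattedMessage
        defaultMessage={`Share <AbbrPhotos>photos</AbbrPhotos> with your <AbbrFriends>friends</AbbrFriends>. <LearnMore>Learn more</LearnMore>`}
        values={{
          AbbrPhotos: msg => (
            <abbr title={formatMessage({ defaultMessage: 'Pictures you make with a camera' })}>{msg}</abbr>
          ),
          AbbrFriends: msg => (
            <abbr title={formatMessage({ defaultMessage: 'People you know' })}>{msg}</abbr>
          ),
          LearnMore: msg => (
            <a href="https://example.com/en/learn-more" target="_blank">{msg}</a>
          ),
        }}
      />
    </p>
  );
};

/**
 * Using an extended ICU syntax instead of XML. Key-value arguments are passed
 * as `props` to the format function. More...consistent? Maybe?
 */
export const SharePhotosIcuSyntax = () => (
  <p>
    <FormattedMessage
      defaultMessage={`Share {abbr, "photos", title:"Pictures you make with a camera"} with your {abbr, "friends", title:"People you know"}. {learnMore, "Learn more", href:"https://example.com/en/learn-more"}`}
      values={{
        learnMore: (msg, { href }) => (
          <a href={href} target="_blank">{msg}</a>
        ),
        abbr: (msg, { title }) => <abbr title={title}>{msg}</abbr>,
      }}
    />
  </p>
);
```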

* In my example, the URL is supposed to be translated. I realize there are likely better options than that in most cases.

longlho commented 5 years ago

Ah thanks for the explanation. Let me do some research on how others are doing it. Right now there're existing issues with translated content affecting context within the message itself that are still unsolved so this is a tough issue.

longlho commented 5 years ago

So from my discussion at ECMA-402: custom formats are not recommended by ICU, and even skeletons (which are very complex) are preferred over predefined values. I think there's no perfect solution to this. Even with custom formats, ICU explicitly says you can't specify a custom format inside a nested rule (like plural or select), which makes it less useful:

> Only the top-level arguments are accessible and settable via setFormat(), getFormat() etc. Arguments inside nested sub-messages, inside choice/plural/select arguments, are "invisible" via these API methods.

The introduction of embedded XML is basically only meant as a contextual placeholder and not meant to be anything complex. https://projectfluent.org/ has a similar concept called overlay as well. In reality from the translation vendors I've worked with, they're trained to ignore XML tags.

Also as I've mentioned, PR to improve doc is very welcome!