Support ICU MessageFormat

rxaviers commented 10 years ago

PR: #321

Globalize.formatMessage() should support Message Format as in ICU.

Globalize.loadMessages({
  "en": {
    like: "{count, plural, offset:1" +
      "   =0 {Be the first to like this}" +
      "   =1 {You liked this}" +
      "  one {You and someone else liked this}" +
      "other {You and # others liked this}" +
      "}",
    task: "You have {count, plural," +
      "  one {one task}" +
      "other {# tasks}" +
      "} remaining"
  }
});

var en = new Globalize( "en" );

en.formatMessage( "task", { count: 0 } ); // You have 0 tasks remaining
en.formatMessage( "task", { count: 1 } ); // You have one task remaining

var likeFormatter = en.messageFormatter( "like" );

[ 0, 1, 2, 3 ].map(function( count ) {
  return likeFormatter({ count: count });
});
// [
//   "Be the first to like this",
//   "You liked this",
//   "You and someone else liked this",
//   "You and 2 others liked this"
// ]
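The selection order behind the "like" output above can be sketched as follows. This is only an illustration of ICU plural semantics with an offset, not Globalize's or messageformat.js's code: `selectPlural` and `englishPlural` are hypothetical names, and the English rule is simplified to integers.

```javascript
// Hypothetical helper, not part of Globalize or messageformat.js.
// `pluralCategory` stands in for a CLDR-derived plural function for the locale.
function selectPlural( count, options, offset, pluralCategory ) {
  var exactKey = "=" + count;
  var adjusted = count - ( offset || 0 );
  var template, category;

  if ( Object.prototype.hasOwnProperty.call( options, exactKey ) ) {
    // Explicit "=N" values are matched against the raw count, before the offset.
    template = options[ exactKey ];
  } else {
    // Otherwise the plural category is computed from count minus offset.
    category = pluralCategory( adjusted );
    template = Object.prototype.hasOwnProperty.call( options, category ) ?
      options[ category ] : options.other;
  }

  // "#" stands for the count minus the offset.
  return template.replace( /#/g, String( adjusted ) );
}

// Simplified English cardinal rule (integers only), for illustration.
function englishPlural( n ) {
  return n === 1 ? "one" : "other";
}

var likeOptions = {
  "=0": "Be the first to like this",
  "=1": "You liked this",
  one: "You and someone else liked this",
  other: "You and # others liked this"
};

var likeResults = [ 0, 1, 2, 3 ].map(function( count ) {
  return selectPlural( count, likeOptions, 1, englishPlural );
});
```

With `offset:1`, a count of 2 falls into the "one" category (2 - 1 = 1), which is why it yields "You and someone else liked this".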

For background purposes...

The original issue was opened as "Basic .formatMessage()". The goal was to discuss whether we should implement a basic .formatMessage() that would reside in globalize.js (core). For a MessageFormat extension, one would load the globalize/message.js module.

The discussion here has diverged into Message Format specifics... Considering this discussion has great value and the basic formatMessage has been implemented internally due to #251, I have renamed this issue to "Support MessageFormat".

The original issue description follows:

We currently have two "message" functions: .loadTranslations( translationData ) and .translate( path ).

While they're useful for translations, they don't provide any value for formatting messages, e.g. variable replacements, plural or select formatting.

:smile:

Globalize( "en" ).translate( "bye" ); // bye
Globalize( "pt" ).translate( "bye" ); // tchau
// Obviously, considering the appropriate translate values have been loaded.

:sob:

Globalize.???( "{0} seconds", 4 ); // 4 seconds
Globalize.???( "{0, plural, one {{0} second} other {{0} seconds}}", 4 ); // 4 seconds
Globalize.???( "{0, select, male {He} female {She}} waited for {1, plural, one {{1} second} other {{1} seconds}}", [ "male", 4 ] ); // He waited for 4 seconds

Plural and select formatting require a parser. But the variable replacement in the "4 seconds" example is easy to implement and is needed in several parts of CLDR.

Given how easy and necessary variable replacement is, and all the above context, I suggest we implement:

.formatMessage( message, data );
.formatMessage( "{0} second", 1 ); // 1 second
.formatMessage( "{0}/{1}", ["m", "s"] ); // m/s
.formatMessage( "{name} <{email}>", {
  name: "Foo",
  email: "bar@baz.qux"
}); // Foo <bar@baz.qux>
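A minimal sketch of how such variable replacement could work, assuming placeholders are plain `{name}` or `{index}` tokens with no nesting. It is a stand-alone illustration, not the implementation that ended up in Globalize. The choice to leave unknown placeholders untouched is an assumption.

```javascript
// Hypothetical stand-alone sketch; not the actual Globalize implementation.
function formatMessage( message, data ) {
  // Allow a single primitive as shorthand for [ value ].
  if ( data === null || typeof data !== "object" ) {
    data = [ data ];
  }
  return message.replace( /\{([0-9a-zA-Z_]+)\}/g, function( match, name ) {
    // Assumption: placeholders with no matching key are left as-is.
    return Object.prototype.hasOwnProperty.call( data, name ) ?
      String( data[ name ] ) : match;
  });
}

var oneSecond = formatMessage( "{0} second", 1 );
var unit = formatMessage( "{0}/{1}", [ "m", "s" ] );
var contact = formatMessage( "{name} <{email}>", {
  name: "Foo",
  email: "bar@baz.qux"
});
```

Because arrays and objects are both looked up by key, the same code path serves positional and named variables.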

The above implementation plus the translation functions have a minified size of ~0.7Kb. Given its size, I want to include all three functions (loadTranslations(), translate(), and the simple version of formatMessage()) in the Core module.

If users need to format more complex messages, e.g. plural or select formatting, they can load plural or other modules and extend the simple .formatMessage() above.

scottgonzalez commented 10 years ago

I'd like to hear from @SlexAxton on this.

SlexAxton commented 10 years ago

Hi, thanks for pinging.

Naturally, my favorite thing for this is my library https://github.com/SlexAxton/messageformat.js. There are lots of reasons why you'd pick a message formatting library with support for gender and pluralization when building a globalization library, but I spose before I go too deeply into those reasons, I'm wondering if you guys would be avoiding it for some reason.

Messageformat, fwiw, compiles to nearly standalone functions with a few helpers for redundancy. So the size is usually extremely minimal (most of the time, less than the 0.7kb above, for a normal amount of messages.) You do need to switch out the code for the runtime though.

rxaviers commented 10 years ago

Hi @SlexAxton,

Depending on your library for the message format functionality we need is preferable to re-implementing it here.

One of the main goals of the Globalize rewrite was to stop embedding i18n content into the library. We've achieved this goal by completely separating Globalize code (the engine) from CLDR content (the fuel / the rules). We let user code feed Globalize the appropriate portions of CLDR prior to using it (example on node.js).

I see your library has embedded content, although I think it's only used for plural logic. Is that correct? Note that we handle plurals in Globalize (with the recent collaboration from wikipedia santhoshtr/CLDRPluralRuleParser). So, if that's the only constraint of your library in that regard, it's not a problem for us (provided, obviously, that your library can delegate that part).

From a very high-level perspective, I think our Globalize.formatMessage() could behave just like the MessageFormat parser (by completely using your library for that), taking care of understanding the formatted string and calling (delegating to) the appropriate sub-procedures (e.g. plural format, select format, number format, etc.) to format each part of the whole message, then returning the formatted message.

I would very much like to hear what you think.

SlexAxton commented 10 years ago

We can definitely generate the pluralization functions on the fly. I just provide those for ease-of-use. Technically, it will work as is. The second argument to the MessageFormat constructor is a pluralization function. So if you don't have fr loaded in from my provided files, you can just instantiate with:

new MessageFormat('fr', function(n){ return n == 0 ? 'zero' : n == 1 ? 'one' : 'other'; });

I don't know if you guys would handle that part on your own or just ask the user to do it, but regardless, I think everything is there you'd need.

From a very high-level perspective, I think our Globalize.formatMessage() could behave just like the MessageFormat parser (by completely using your library for that), taking care of understanding the formatted string and calling (delegating to) the appropriate sub-procedures (e.g. plural format, select format, number format, etc.) to format each part of the whole message, then returning the formatted message.

I'm not exactly sure if I'm understanding you here. MessageFormat is already the ICU standard that contains both PluralFormat and SelectFormat, so you can have both within one message. NumberFormat, however, is implemented separately, and the official advice from the ICU is that you pre-format numbers before putting them into strings.
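To illustrate what pre-formatting means in practice, here is a rough sketch. It assumes an ECMA-402 capable runtime and uses `Intl.NumberFormat` purely as a stand-in for whichever number formatter is available. The message text and the plain string substitution are hypothetical.

```javascript
// Assumes an ECMA-402 capable runtime; Intl.NumberFormat stands in for
// whichever number formatter is actually used.
var formattedCount = new Intl.NumberFormat( "en" ).format( 1234567 );

// Hypothetical placeholder substitution, for illustration only: the message
// receives an already formatted string rather than a raw number.
var message = "{count} people liked this".replace( "{count}", formattedCount );
```

The message formatter never needs to know about grouping separators or digits; it only sees the finished string.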

I don't support some of the super-new things like ordinals (1st, 2nd), or 'and' placement stuffs (1, 2, and 3) either. Unsure if that's something you're looking for.

I think the important thing is to do something standard here. In my opinion, the ICU does the best job here, and automatically works with all the CLDR, so it's my favorite.

rxaviers commented 10 years ago

Ok, first things first.

I think the important thing is to do something standard here. In my opinion, the ICU does the best job here, and automatically works with all the CLDR, so it's my favorite.

The new Globalize rewrite strictly follows CLDR/LDML specs (or, better put, Unicode TR#35). But, it doesn't stick to one or another interface (API), eg. ICU or ECMA-402.

Having said that, I do have a similar opinion to yours about ICU's MessageFormat. I think it rocks and I am personally in favor of adopting it in Globalize (more specifically, in Globalize.formatMessage()).

Now, into the details...

I understand your MessageFormat implementation delegates plural to whatever function is passed in as the second parameter of the constructor (which I guess you internally map/cache to the string passed as locale in the first parameter). Therefore, Globalize is technically able to use your code underneath .formatMessage() as is. This is wonderful.

Let's discuss further on IRC or in tomorrow's UI meeting, so I can answer any of your questions and ask you some. I started by typing them all here, but it seems unproductive.

globalizejs / globalize

Support ICU MessageFormat #265