globalizejs / globalize

A JavaScript library for internationalization and localization that leverages the official Unicode CLDR JSON data
https://globalizejs.com
MIT License

MessageFormat formatters #563

Open nkovacs opened 8 years ago

nkovacs commented 8 years ago

The messageformat library supports custom formatter functions. You could register globalize's formatters, so they could be used in messages, e.g. "Balance: {0, currency}", or "Posted at {0, datetime, long}".

It would also be great if there was a way to pass custom formatter functions to MessageFormat.

rxaviers commented 8 years ago

Thanks for your message and sorry for the delayed answer.

Please, use variable replacement instead, e.g.: "Balance: {currency}" or "Posted at {date}" and have the variable formatted using the appropriate formatter in your code, e.g.:

Globalize.formatMessage("message", {date: Globalize.formatDate(new Date())});

If you find any problem using variable replacement instead or if you have further questions feel free to post additional comments.

PS:

The messageformat library supports custom formatter functions

... and I was one of the early pushers for such an API to be adopted by SlexAxton/messageformat.js (the library Globalize uses for message formatting under the hood) (link). :smile: (And Alex and Eemeli did great work updating the library.) Having said that, given that variable replacement can be used instead with no loss in this case, we opted for that.

If you want to update globalize message format to support such a feature, feel free to contribute the change and I'd be happy to consider it: first send informal messages to discuss the new API, then send a pull request with the implementation.

nkovacs commented 8 years ago

The problem with using a formatter in the variable is that it doesn't allow you to change the format in the message file. It's hard-coded. This would not only allow the format to be customized for each language, it would also allow changing it without touching the code. E.g. if you have an interface where an admin can change the message files used by your app, this change would allow an admin to customize the format used in a message.

nkovacs commented 8 years ago

I made a quick proof of concept. The issues with it are:

Globalize.b955419430 = messageFormatterFn((function(  ) {
  return function (d) { return "Hello World " + fmt.date(d.now, ["en"], "long"); }
})());

Ideally I'd like the compiler to automatically detect the call to fmt.date, and compile the dateformatter as well, but I don't know if that's possible with the current version of MessageFormat. This is the test file I used: https://gist.github.com/nkovacs/d6e429f7a5e0871ceb392e739031c100

rxaviers commented 8 years ago

As a first step, could you please show me a map between each Globalize formatter option and its inlined message format representation? For example, above you mentioned long; is long the value for date, time, or datetime? Also, how would a skeleton be passed?

Ideally I'd like the compiler to automatically detect the call to fmt.date, and compile the dateformatter...

Yeap the compiler could statically parse the message and do that (i.e., reuse the message formatter parser to deduce the formatters).

nkovacs commented 8 years ago

The message was {now, date, long} in the example, so it becomes {date: 'long'}. {now, time, long} would be {time: 'long'}, and {now, datetime, long} would be {datetime: 'long'}.

This is similar to what ICU does (http://icu-project.org/apiref/icu4j/com/ibm/icu/text/MessageFormat.html), except ICU only has date and time.

ICU also accepts a raw format (if the parameter is not one of the short formats), so {now, date, yyyy-MM-dd} would become {raw: 'yyyy-MM-dd'} if you wanted to emulate that.
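The preset and raw mapping described so far can be sketched as a small pure function. This is only an illustration: `toDateFormatterOptions` and `PRESETS` are hypothetical names, not part of Globalize or messageformat, and the fallback to `medium` when no style is given is an arbitrary choice made for this sketch.

```javascript
// Hypothetical helper: turns the (type, style) pair parsed from a placeholder
// such as {now, date, long} into an options object for a date formatter.
const PRESETS = new Set(["short", "medium", "long", "full"]);

function toDateFormatterOptions(type, style) {
  if (!["date", "time", "datetime"].includes(type)) {
    throw new Error("Unsupported formatter type: " + type);
  }
  // No style given, e.g. {now, date}: "medium" is an assumption of this sketch.
  if (style === undefined) {
    return { [type]: "medium" };
  }
  // One of the short formats: {now, time, long} -> {time: 'long'}.
  if (PRESETS.has(style)) {
    return { [type]: style };
  }
  // Anything else is treated as a raw pattern, emulating ICU.
  return { raw: style };
}

console.log(toDateFormatterOptions("date", "long"));       // { date: 'long' }
console.log(toDateFormatterOptions("datetime", "long"));   // { datetime: 'long' }
console.log(toDateFormatterOptions("date", "yyyy-MM-dd")); // { raw: 'yyyy-MM-dd' }
```

Skeletons are left out on purpose, since the thread has not settled on a syntax for them.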

Skeleton could be implemented in a few different ways:

The basic mapping could look like this:

But one reason I'd like to be able to customize the formatters is that I wanted to integrate globalize with Yii, and I could write custom formatters that work the same way ICU does in PHP (http://www.yiiframework.com/doc-2.0/guide-tutorial-i18n.html#message-formatting). That way I could use the same messages in PHP and in JavaScript, which is kind of a pain right now (I have to render everything in PHP to get pluralization and such things, and then return the HTML in the Ajax response).

Yeap the compiler could statically parse the message and do that (i.e., reuse the message formatter parser to deduce the formatters).

Yeah, but for globalize I think it would be better if the custom formatter function received the Globalize instance, the same one used to render the message, so you don't have to instantiate another one and compile a new formatter each time the message is rendered (if you're not using pre-compiled files). But that requires modifying messageformat or using messageformat-parser directly. The runtime binding code is already a bit hacky, it could also be cleaned up.
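A rough sketch of that idea, under stated assumptions: `createFormatterLookup` and the `factories` map are hypothetical, and `fakeInstance` is a stand-in for a Globalize instance so the example runs without CLDR data. Each custom formatter factory receives the instance that renders the message, and the resulting formatter is cached so it is built once rather than on every render.

```javascript
// Hypothetical lookup: custom formatter factories receive the same instance
// that renders the message, and each (type, style) pair is built only once.
function createFormatterLookup(instance, factories) {
  const cache = new Map();
  return function getFormatter(type, style) {
    const key = type + ":" + style;
    if (!cache.has(key)) {
      if (typeof factories[type] !== "function") {
        throw new Error("No formatter registered for: " + type);
      }
      cache.set(key, factories[type](instance, style));
    }
    return cache.get(key);
  };
}

// Stand-in for a Globalize instance; it only counts how many formatters it builds.
const fakeInstance = {
  created: 0,
  dateFormatter(options) {
    this.created += 1;
    return (value) => JSON.stringify(options) + " " + value;
  },
};

const getFormatter = createFormatterLookup(fakeInstance, {
  date: (globalize, style) => globalize.dateFormatter({ date: style }),
});

console.log(getFormatter("date", "long")("now")); // {"date":"long"} now
getFormatter("date", "long");                     // served from the cache
console.log(fakeInstance.created);                // 1
```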

rxaviers commented 8 years ago

I liked it so far. /cc @jzaefferer and @alunny for their inputs.

nkovacs commented 8 years ago

I've hacked messageformat so that the custom formatter function can use the same globalize instance, and the dependent formatters can be passed to globalize-compiler: https://github.com/nkovacs/globalize/commit/21cb3b3923fbac0937bc5ab0d626d6e107a6fb30#diff-731a3fca6b201d79e2639fe1456b8787L156

I'll try to clean it up and get it into messageformat.js, but for now I just wanted to show how it could be done in globalize.

ccschneidr commented 8 years ago

This sounds great. Any news on it?

nkovacs commented 7 years ago

Compilation now works. The only thing that needs to be changed in globalize-compiler is the compilation order.

Messageformat.js has since released a major new version, so I'll have to update that too.

nkovacs commented 7 years ago

Messageformat.js 1.0 has changed so much that the hacks used to integrate it into globalize no longer work. In particular, since the runtime is no longer static, I was unable to extract it and inject it into globalize's message-runtime module.

So instead I copied the messageformat compiler and runtime into globalize.js, and used messageformat-parser (which has since been extracted into a separate npm package). Since I now had direct access to the compiler, I was also able to remove the regexp hacks in messageFormatterRuntimeBind (the compiler can tell the runtime binding function what features are needed, e.g. plurals, select etc.).

Here's the commit: https://github.com/nkovacs/globalize/commit/1586e12ff1b7f24a649e442899e76575f6c19b2d

What do you think?

rxaviers commented 7 years ago

These presets look nice:

{now, date, long} -> dateFormatter({date: 'long'})
{now, time, long} -> dateFormatter({time: 'long'})
{now, datetime, long} -> dateFormatter({datetime: 'long'})

Any of the below look nice to me too, except for the fact that adding a time pattern in the skeleton below will result in a datetime output, which sounds inconsistent with date since we have message formatters named time or datetime. Do you see what I mean? I have no suggestion at the moment though.

{now, date, skeleton, GyMMMd}
{now, dateskeleton, GyMMMd}

rxaviers commented 7 years ago

Messageformat.js 1.0 has changed so much that the hacks used to integrate it into globalize no longer work. In particular, since the runtime is no longer static, I was unable to extract it and inject it into globalize's message-runtime module.

So instead I copied the messageformat compiler and runtime into globalize.js, and used messageformat-parser (which has since been extracted into a separate npm package). Since I now had direct access to the compiler, I was also able to remove the regexp hacks in messageFormatterRuntimeBind (the compiler can tell the runtime binding function what features are needed, e.g. plurals, select etc.).

The existing "live-patch" isn't great, but I want to avoid copying dependencies and modifying them, because this is even harder to maintain over time. We need a better approach... Are there any changes we could propose in their code that would make it easier in our side? Is there any sort of JavaScript patch we could use instead of the bunch of replaces in Gruntfile?

rxaviers commented 7 years ago

About plural requiring cardinal + ordinal data... I want to avoid that. I'm wondering if {plural, ... could use a cardinal formatter, and {selectordinal, ... could use an ordinal formatter?
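To illustrate the split, here is a minimal sketch that picks the rule set from the construct name. It uses the built-in `Intl.PluralRules` as a stand-in for Globalize's plural generators, and `pluralCategory` is a hypothetical helper, not an existing Globalize or messageformat function.

```javascript
// Intl.PluralRules stands in for Globalize's plural generators in this sketch.
const cardinalRules = new Intl.PluralRules("en", { type: "cardinal" });
const ordinalRules = new Intl.PluralRules("en", { type: "ordinal" });

// Hypothetical helper: {x, plural, ...} only needs cardinal data,
// {x, selectordinal, ...} only needs ordinal data.
function pluralCategory(construct, value) {
  if (construct === "plural") return cardinalRules.select(value);
  if (construct === "selectordinal") return ordinalRules.select(value);
  throw new Error("Unknown construct: " + construct);
}

console.log(pluralCategory("plural", 2));        // other
console.log(pluralCategory("selectordinal", 2)); // two
```

With this split, a message that only uses {plural, ...} would never need the ordinal data to be loaded.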

nkovacs commented 7 years ago

Any of the below look nice to me too, except for the fact that adding a time pattern in the skeleton below will result in a datetime output, which sounds inconsistent with date since we have message formatters named time or datetime.

I didn't implement the skeleton and raw options yet because I'm not sure how to do that. The rest are done: https://github.com/jquery/globalize/commit/4c95d9499efa4add7e0ed80fb8a531f62de754de.

About plural requiring cardinal + ordinal data... I want to avoid that. I'm wondering if {plural, ... could use a cardinal formatter, and {selectordinal, ... could use an ordinal formatter?

With the custom compiler, yes. I'm not sure if it's doable if using messageformat.js directly, in the current version of globalize. It probably is, but it won't be pretty. That ties into your next question.

The existing "live-patch" isn't great, but I want to avoid copying dependencies and modifying them, because this is even harder to maintain over time. We need a better approach... Are there any changes we could propose in their code that would make it easier in our side? Is there any sort of JavaScript patch we could use instead of the bunch of replaces in Gruntfile?

I've used messageformat-parser from npm (it's not available in bower), so I only had to copy and rewrite the compiler and the runtime, which is relatively small, and that allowed me to customize it to globalize's needs. For example, the new messageFormatterRuntimeBind is much better. I think this is a better approach than heavily patching messageformat.js in the Gruntfile. It might be possible to use messageformat.js from npm and use only the compiler (compiler.js), but that's internal, so you'd again be left with something that can change and break at any time, plus some monkey-patching would still be needed. The changes required to messageformat.js to make it usable in globalize would be extensive. They'd have to make it possible to customize the compiler. I doubt they'd want to add that complexity to messageformat.js, when you can just use messageformat-parser and write your own simple compiler.

rxaviers commented 7 years ago

I only had to copy and rewrite the compiler and the runtime

Could you please show me a diff?

nkovacs commented 7 years ago

and

rxaviers commented 7 years ago

Yeap, but a diff from the original compiler and runtime to their rewritten ones would make it easier to see what the changes are. Don't worry if you don't have a diff handy, I can generate one...

Basically, I'm in line with your suggestion of using a newer messageformat. Although, I want to better understand the changes and impact.

nkovacs commented 7 years ago

It's a bit hard to see it here because of the whitespace changes required by the coding standard:

compiler.js: https://gist.github.com/nkovacs/8dea134c8af7345c1c7ed921e9dc7aad/revisions

runtime.js: https://gist.github.com/nkovacs/11f320e6ae60b1dccf943768367dab4d/revisions

The first revision is messageformat.js's version indented with 4 spaces (original is 2 spaces), second revision is my version.

rxaviers commented 7 years ago

I used your gists and created this diff that ignores white space changes:

rxaviers commented 7 years ago

@nkovacs how do you suggest we maintain these files? For instance, let's suppose messageformat publishes new releases with updates to those files and we want to bring those updates in.

rxaviers commented 7 years ago

Another question is, what are the challenges and cost of using the new messageformat as is? From your above comments, one of them is "They'd have to make it possible to customize the compiler", what customization would be required please?

nkovacs commented 7 years ago

The problems I ran into trying to use messageformat 1.0.2:

The problems with using messageformat in general (this applies to 0.3.0 as well):

The problem is that messageformat.js compiles {now, date, short} to something like fmt.date(d.now, 'short'), but globalize.js needs the 'short' parameter at compile time to be able to compile an appropriate formatter function.

The minimum change required in messageformat.js would be to return the compiler's runtime property and add the arguments to its formatters object. Globalize's compiler could then generate the appropriate wrappers and an fmt object with a special wrapper fmt.date function, and bind the compiled dateformatter as a dependency.

My version does it slightly differently: {now, date, short} is compiled to fmt[0](d.now), and the wrapper function receives an fmt array, where the 0th element is Globalize.dateFormatter({date: 'short'}).
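The fmt-array approach can be illustrated with a toy compiler. This is a sketch under heavy assumptions, not the code from the commit: it handles flat placeholders only (no plural or select nesting), it uses a regular expression instead of messageformat-parser, and `compileMessage` and `makeFormatter` are hypothetical names. The point it shows is that the style argument is consumed at compile time, so each formatter is created once and the rendered message only calls `fmt[index](value)`.

```javascript
// Toy compiler: flat placeholders only, no plural/select nesting.
// makeFormatter(type, style) is a hypothetical factory supplied by the caller;
// in Globalize it could wrap dateFormatter with options derived from the style.
function compileMessage(message, makeFormatter) {
  const fmt = [];   // formatters, created once at compile time
  const parts = []; // functions that each produce one chunk of output
  const re = /\{\s*(\w+)\s*(?:,\s*(\w+)\s*(?:,\s*([^}]+?)\s*)?)?\}/g;
  let last = 0;
  let match;
  while ((match = re.exec(message)) !== null) {
    const [whole, name, type, style] = match;
    const literal = message.slice(last, match.index);
    if (literal) parts.push(() => literal);
    if (type === undefined) {
      // {user}: plain variable replacement
      parts.push((d) => String(d[name]));
    } else {
      // {now, date, long}: becomes fmt[index](d.now)
      const index = fmt.push(makeFormatter(type, style)) - 1;
      parts.push((d) => fmt[index](d[name]));
    }
    last = match.index + whole.length;
  }
  const tail = message.slice(last);
  if (tail) parts.push(() => tail);
  return (d) => parts.map((part) => part(d)).join("");
}

// Stub factory so the sketch runs without Globalize or CLDR data.
let created = 0;
const render = compileMessage("Posted {now, date, long} by {user}", (type, style) => {
  created += 1;
  return (value) => "<" + type + ":" + style + ":" + value + ">";
});

console.log(render({ now: "NOW", user: "ann" })); // Posted <date:long:NOW> by ann
console.log(created);                             // 1
```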

Here's a complete compiled example:

Globalize.b955419430 = messageFormatterFn((function(plural, fmt, en) {
    return function(d) {
        return "Hello World number( " + plural(d.x, 0, en, {
            one: "one task " + fmt[0](d.now) + " ",
            other: d.x + " tasks " + fmt[1](d.now) + " "
        });
    }
})(messageFormat.plural, [Globalize("en").dateFormatter({
    "date": "long"
}), Globalize("en").dateFormatter({
    "date": "short"
})], Globalize("en").pluralGenerator({
    type: "both"
})), Globalize("en").pluralGenerator({
    "type": "both"
}), Globalize("en").dateFormatter({
    "date": "long"
}), Globalize("en").dateFormatter({
    "date": "short"
}));

and the original message was:

'Hello World number( {x, plural, one {one task {now, date, long} } other {{x} tasks {now, date, short} } }'

I'm not sure why the extra parameters are passed to messageFormatterFn, but I think that's already happening with the current version of globalize.js and pluralGenerator.

jrsearles commented 5 years ago

Any update on this? I am running into this as well. The messages for me are potentially user defined so formatting the value passed in isn't an option.