globalizejs / globalize

A JavaScript library for internationalization and localization that leverages the official Unicode CLDR JSON data
https://globalizejs.com
MIT License
4.8k stars 603 forks source link

MessageFormat formatters #653

Closed nkovacs closed 4 years ago

nkovacs commented 7 years ago

Fixes #563

nkovacs commented 7 years ago

About plural requiring cardinal + ordinal data... I want to avoid that. I'm wondering if {plural, ... could use a cardinal formatter, and {selectordinal, ... could use a ordinal formatter?

I've implemented this. It also doesn't output the pluralGenerator at all if it's not needed (the current version of globalize always outputs the cardinal one, even if no message needs it). It's not perfect, because if you have a message that contains plural, a message that contains selectOrdinal, and a message that contains both, the compiled output will have all three pluralGenerators. I can modify the "both" case to use two separate plural functions, but I'll have to change the compiler's output.

nkovacs commented 7 years ago

I've changed the way messageformatters are compiled. Instead of this:

Globalize.b955419430 = messageFormatterFn((function(plural, fmt, en) {
    return function(d) {
        return "Hello World number( " + plural(d.x, 0, en, {
            one: "one task " + fmt[0](d.now) + " ",
            other: d.x + " tasks " + fmt[1](d.now) + " "
        });
    }
})(messageFormat.plural, [Globalize("en").dateFormatter({
    "date": "long"
}), Globalize("en").dateFormatter({
    "date": "long"
})], Globalize("en").pluralGenerator({
    type: "cardinal"
})), Globalize("en").pluralGenerator({
    "type": "cardinal"
}), Globalize("en").dateFormatter({
    "date": "long"
}), Globalize("en").dateFormatter({
    "date": "long"
}));

It now looks like this:

Globalize.b955419430 = messageFormatterFn((function(en) {
    var plural = messageFormat.plural;
    var fmt = [].slice.call(arguments, 1);
    return function(d) {
        return "Hello World number( " + plural(d.x, 0, en, {
            one: "one task " + fmt[0](d.now) + " ",
            other: d.x + " tasks " + fmt[1](d.now) + " "
        });
    }
}), "call", Globalize("en").pluralGenerator({
    "type": "cardinal"
}), Globalize("en").dateFormatter({
    "date": "long"
}), Globalize("en").dateFormatter({
    "date": "long"
}));

The formatters are not duplicated by messageFormatterRuntimeBind.

This required changing messageFormatterFn, so to make globalize-runtime compatible with older compiled bundles, I added a "call" string argument after the function. Old bundles will either have the pluralGenerator as the second argument or no second argument, so messageFormatterFn will use the old behavior.

nkovacs commented 7 years ago

I've cleaned this up, and it's mostly done.

The remaining issues:

rxaviers commented 7 years ago

Ok, it's on my TODO list to review.

Strate commented 7 years ago

Any news here?

rxaviers commented 7 years ago

This needs review. Basically, we need to make sure this change is backward compatible, and it works with globalize-compiler. I didn't find time to do it yet, but if someone could investigate that and post findings here, it would speed things up :) thanks

nkovacs commented 7 years ago

I also have to document and add tests. I'll try to do that soon. I also want to add datetime skeletons, because they're much more useful now (thanks to dynamic augmentation), but we need to agree on a syntax for that (see https://github.com/globalizejs/globalize/issues/563#issuecomment-218460227).

rxaviers commented 7 years ago

Awesome, thanks @nkovacs

I also want to add datetime skeletons, because they're much more useful now (thanks to dynamic augmentation), but we need to agree on a syntax for that.

They are!! Thanks to @Strate for that too :)

About the syntax, it would be good to (re-)evaluate existing solutions and digest a conclusion. Useful links:

Strate commented 7 years ago

@nkovacs are you use it now in some projects? I would megre you PR in current state and test it.

nkovacs commented 7 years ago

Not yet, but please do test it. I've rebased it on master recently, but haven't had time to test it thoroughly.

Strate commented 7 years ago

@nkovacs at least {variable, number} works great. Thank you!

nkovacs commented 7 years ago

I've added tests, but they fail because they need https://github.com/globalizejs/globalize-compiler/pull/17

I ran into some issues with messageformat-parse: https://github.com/messageformat/parser/issues/1 and https://github.com/messageformat/messageformat.js/issues/175

The second one I've already started fixing. The first one I can also fix, but that would mean that this would no longer be parsed as two separate parameters (["skeleton","yMd"]) by messageformat-parser: {var,date,skeleton,yMd}

The way ICU works is that after the second comma, everything until the } is part of argStyle, and that's the only parameter that's accepted (but you can escape } with '}') However, it would still be possible for the formatter function to split on , so the above would work. The alternative is {var,dateskeleton,yMd}.

nkovacs commented 7 years ago

I've implemented date raw and skeleton support.

Raw works the same way ICU does: date raw comma: {x, date, y-M-d, HH:mm:ss zzzz } becomes date raw comma: 2010-9-15, 17:35:07 GMT+02:00. Note that the comma doesn't split the parameter and whitespace is not trimmed.

Skeleton is {x, date, skeleton, GyMMMEdhms}, but I can change it to {x, dateskeleton, GyMMMEdhms}.

This requires https://github.com/messageformat/parser/pull/3.

rxaviers commented 7 years ago

Can you also update documentation please? Somewhere we need to list this new API.

{x, date, y-M-d, HH:mm:ss zzzz }

I'd rather use skeleton by default (which is the recommended way users should customize their formats) and have a specific dateraw for the raw pattern. How can they pass presets like {date: small} please?

How does this implementation compare to https://github.com/messageformat/messageformat.js and https://github.com/yahoo/intl-messageformat/ please?

This requires messageformat/parser#3.

Please, is the update included in this PR? Or it requires any further action?

nkovacs commented 7 years ago

Can you also update documentation please? Somewhere we need to list this new API.

Of course. Should I add it to doc/api/message/message-formatter.md?

I'd rather use skeleton by default (which is the recommended way users should customize their formats) and have a specific dateraw for the raw pattern. How can they pass presets like {date: small} please?

I did it this way to keep it compatible with ICU. In ICU you have:

Both {x,date, y-M-d, HH:mm:ss zzzz } and {x,time, y-M-d, HH:mm:ss zzzz } return " 2017-7-3, 15:20:34 Central European Summer Time " (with spaces).

This PR supports that, plus:

I did it this way to make it compatible with ICU. However, since it's possible to overwrite the formatter function with Globalize.addMessageFormatterFunction, I'm not against making the default different, and I agree that skeleton should be preferred. If someone needs it to be compatible (e.g. using the same messages on frontend and backend), they can overwrite the defaults.

Please, is the update included in this PR? Or it requires any further action?

No, it's not, since messageformat-parser is downloaded from npm. I made a forked version, I could use that: https://www.npmjs.com/package/icu-messageformat-parser.

rxaviers commented 7 years ago

Of course. Should I add it to doc/api/message/message-formatter.md?

It looks good. Thanks.

I did it this way to make it compatible with ICU. However, since it's possible to overwrite the formatter function with Globalize.addMessageFormatterFunction, I'm not against making the default different, and I agree that skeleton should be preferred. If someone needs it to be compatible (e.g. using the same messages on frontend and backend), they can overwrite the defaults.

I understand, it's always good to follow a standard, but it's unfortunate ICU is doing it that way. Does ICU allow skeleton input? If you have time, please could you also compare to what Yahoo is doing (link above)?

No, it's not, since messageformat-parser is downloaded from npm. I made a forked version, I could use that: https://www.npmjs.com/package/icu-messageformat-parser.

Would this be needed in addition to messageformat/messageformat.js or would it replace it please?

nkovacs commented 7 years ago

I understand, it's always good to follow a standard, but it's unfortunate ICU is doing it that way. Does ICU allow skeleton input? If you have time, please could you also compare to what Yahoo is doing (link above)?

No, ICU doesn't allow skeleton input, unfortunately. I tried Yahoo's intl-messageformat-parser, and I found that it had the same bugs as messageformat-parser (which I fixed in my fork).

Would this be needed in addition to messageformat/messageformat.js or would it replace it please?

Messageformat.js isn't needed, the parts that were needed have been replaced by message/compiler.js and message/formatter-runtime.js (see https://github.com/globalizejs/globalize/issues/563#issuecomment-257873830)

rxaviers commented 7 years ago

Ok. Thanks.

and I found that it had the same bugs as messageformat-parser (which I fixed in my fork).

Did you send a PR to the origin repo? Please, if so could you provide a link for reference?

Messageformat.js isn't needed, the parts that were needed have been replaced by message/compiler.js and message/formatter-runtime.js (see #563 (comment))

Ok, that is well explained. Thanks! I see that you had reasons to drop messageformat/messageformat.js and to use a custom replacement. I also understand that the way globalize currently embeds messageformat/messageformat.js is hard to maintain over time. Having said that, there's a downside of duplicating what messageformat/messageformat.js does. We wouldn't take benefit of bug fixes and improvements done there...

nkovacs commented 7 years ago

Did you send a PR to the origin repo? Please, if so could you provide a link for reference?

I mean I fixed them in messageformat-parser, not intl-messageformat-parser (https://github.com/messageformat/parser/pull/3). intl-messageformat-parser is based on messageformat.js's parser (before it was a standalone package), but it's quite different.

E.g. intl-messageformat-parser/intl-messageformat handles # completely differently, it's not parsed by the parser at all, instead the compiler handles it: https://github.com/yahoo/intl-messageformat/blob/master/src/compiler.js#L59 But it doesn't correctly handle '#'.

I also looked at https://www.npmjs.com/package/format-message-parse, it was slightly better with apostrophe escaping, but it also can't parse {x,date, y-M-d, HH:mm:ss zzzz }.

Then there's messageformat.js's strictNumberParsing, which is not ICU compatible, but is off by default, and turning it on would mean breaking backwards compatibility with previous versions of globalize.js.

eemeli commented 7 years ago

Noting also here that I've reviewed the messageformat-parser PR. Overall, it's good, but ought to be split into two as the quote-escaping will change users' outputs, and therefore requires us to make a major version update.

rxaviers commented 4 years ago

I am closing all old (and stalled) PRs