globalizejs / globalize

A JavaScript library for internationalization and localization that leverages the official Unicode CLDR JSON data
https://globalizejs.com
MIT License
4.8k stars 603 forks

Mismatch between Globalize and other libraries when combining date and time patterns for missing skeletons as per TR-35 #771

Open cahuja opened 7 years ago

cahuja commented 7 years ago

Here is another potential spec bug with missing skeletons in Globalize 1.3 which is similar to #764. See the code sample below:

//Setup
const Globalize = require('globalize');
Globalize.load( require( "cldr-data" ).entireSupplemental() );
Globalize.load( require( "cldr-data" ).entireMainFor('it') );
Globalize.locale('it');

const formatterOptions = {
  skeleton: 'yMMMMdjmm',
};

// Create a localized formatter.
const f = Globalize.dateFormatter(formatterOptions);

// Format a localized date
const localDate = f(new Date(2017, 0, 1));

// Setup en-US data
Globalize.load( require( "cldr-data" ).entireMainFor('en') );
Globalize.locale('en-US');

// Create an english formatter
const f2 = Globalize.dateFormatter(formatterOptions);
const englishDate = f2(new Date(2017, 0, 1));

console.log(localDate);
// 1 gennaio 2017 00:00

console.log(englishDate);
// January 1, 2017 at 12:00 AM

Per my reading of the TR-35 spec on matching skeletons and of what the CLDR data shows, Globalize is doing the right thing: the month is wide (MMMM) and there is no weekday in the above skeleton. However, the Italian output does not match that of iOS and ICU4j, which is:

1 gennaio 2017, 00:00

I can do a workaround in a manner similar to what was done for #764 but I wanted to highlight this as another potential cause for trouble.

rxaviers commented 7 years ago

Thanks Chetan!

cahuja commented 7 years ago

A similar issue happens for pt-PT as well. The spec (TR-35) directs us to use the dateTimeFormatLength long when using the skeleton "yMMMMdjm":

Combine the patterns for the two dateFormatItems using the appropriate dateTimeFormat pattern, determined as follows from the requested date fields:
If the requested date fields include wide month (MMMM, LLLL) and weekday name of any length (e.g. E, EEEE, c, cccc), use <dateTimeFormatLength type="full">
Otherwise, if the requested date fields include wide month, use <dateTimeFormatLength type="long">
Otherwise, if the requested date fields include abbreviated month (MMM, LLL), use <dateTimeFormatLength type="medium">
Otherwise use <dateTimeFormatLength type="short">

As per the spec, this should end up using the dateTimeFormatLength type long, but both ICU4j and iOS end up using the type short or medium. (Those two are identical for this locale, so I am not sure which one is being picked.)
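To make the rule quoted above concrete, here is a minimal sketch of the selection logic as I read it. The function name `dateTimeFormatLengthFor` is hypothetical (it is not part of Globalize or ICU4j), and the weekday check is simplified to the letters E and c used in the spec's examples.

```javascript
// Hypothetical helper, not a Globalize or ICU API: picks the dateTimeFormat
// length for a requested skeleton, following the TR-35 rule quoted above.
function dateTimeFormatLengthFor(skeleton) {
  // First run of month letters (format "M" or stand-alone "L"), if any.
  const monthRun = (skeleton.match(/M+|L+/) || [""])[0];
  // Simplification: treat any E or c field as a weekday name.
  const hasWeekday = /[Ec]/.test(skeleton);

  if (monthRun.length === 4 && hasWeekday) return "full";
  if (monthRun.length === 4) return "long";
  if (monthRun.length === 3) return "medium";
  return "short";
}

console.log(dateTimeFormatLengthFor("yMMMMdjmm")); // long
console.log(dateTimeFormatLengthFor("yMMMMEdjm")); // full
console.log(dateTimeFormatLengthFor("yMMMdjm"));   // medium
console.log(dateTimeFormatLengthFor("yyMdjm"));    // short
```

Under this reading, both "yMMMMdjmm" and "yMMMMdjm" select the long pattern, which is what Globalize uses.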

cahuja commented 7 years ago

Also, all my tests are on CLDR 28

cahuja commented 7 years ago

This also happens for es-CL. That is enough information to suppose it is an issue for all locales.

cahuja commented 7 years ago

Here is another example that helps narrow this down as an ICU4j bug. In the locale "fr-BE", for the skeleton "yyMdjm", we combine the date and time patterns since no single pattern is available for the whole skeleton. In this case, per my reading of TR-35, we should fall through to using the dateTimeFormatLength = "short" pattern.

Globalize output -> 'd/M/yy HH:mm'
ICU4j output -> 'd/M/yy à HH:mm'

This is after I overrode Globalize to pick up d/M/yy as the pattern for yyMd, so please bear that in mind. The area to focus on here is the separator between the date and time values.

Per my reading of TR-35, Globalize should use the short pattern, which is what it is doing.

The medium, long and full patterns all use the same separator: "{1} 'à' {0}". ICU4j is using one of these. Given the observations in the previous comments, I would hypothesize that ICU4j always uses the medium dateTimeFormatLength value, making this an ICU4j issue. Thoughts?

(I am using CLDR 28, ICU4j 56 and Globalize 1.3)
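To illustrate the difference in separators, here is a rough sketch of how the date and time patterns get combined. The helper `combineDateTime` is hypothetical. The medium glue pattern is the one quoted above, while the short glue pattern "{1} {0}" is an assumption inferred from the Globalize output and should be checked against the CLDR data.

```javascript
// Hypothetical helper: substitutes the time pattern for {0} and the date
// pattern for {1} in a CLDR dateTimeFormat "glue" pattern.
function combineDateTime(glue, datePattern, timePattern) {
  return glue.replace(/\{([01])\}/g, (match, index) =>
    index === "0" ? timePattern : datePattern
  );
}

const datePattern = "d/M/yy"; // overridden pattern for yyMd (see above)
const timePattern = "HH:mm";

// Assumed short glue pattern for fr-BE, inferred from the Globalize output.
console.log(combineDateTime("{1} {0}", datePattern, timePattern));
// d/M/yy HH:mm

// Medium/long/full glue pattern for fr-BE, as quoted above.
console.log(combineDateTime("{1} 'à' {0}", datePattern, timePattern));
// d/M/yy 'à' HH:mm
```

The quoted 'à' is a pattern literal, so the second pattern renders with a plain "à" between date and time, which matches the ICU4j output.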