Mapping CLDR data to .NET's languagecode2-country/regioncode2

notself commented 10 years ago

Hi, I'm fairly new to the CLDR world and to this new implementation of Globalize, so apologies if this doesn't belong here or if it's too obvious.

I'm looking to build (merge) packs of JSON CLDR data for each culture as represented in .NET CultureInfo.Name. For example, I'd like to build files like:

cldr.en.json
cldr.en-GB.json
cldr.en-US.json
cldr.it-IT.json

The reason is that we already store each user's culture code in their profile in this format, and I'd now like to conditionally include one CLDR file per culture.

Is this the right approach? Looking at the JSON structure, I don't see en-US or any of the other regions. I'm still at the point where the documentation is too cryptic, so any pointers would be really appreciated.

Thanks

rxaviers commented 10 years ago

Hi @notself, thanks for your interest.

Which CLDR parts do you want to merge? I guess this isn't clear to you, right? The fact is we owe you better documentation in that regard. We should tell you which CLDR parts you need depending on which Globalize modules you use. This has also been reported in #206, and should be fixed by #224.

Also note that some CLDR content is locale independent, e.g. the supplemental files. So you may want to merge only the locale-dependent parts (e.g. main/en/numbers.json + main/en/ca-gregorian.json if you use date and number formatting), while keeping the locale-independent parts merged into a separate file (e.g. supplemental/likelySubtags.json + supplemental/weekData.json, ...), unless you don't mind sending duplicate bytes.
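
As a rough sketch of that split: the helper name `planBundles` and the output file names below are made up for illustration, and the file lists assume only date and number formatting are in use (the supplemental list is abbreviated, as in the comment above).

```javascript
// Hypothetical helper: decide which CLDR JSON files go into which bundle.
// Locale-independent (supplemental) data is bundled once; locale-dependent
// (main/<locale>) data is bundled once per locale.
const SUPPLEMENTAL_FILES = [
  "supplemental/likelySubtags.json",
  "supplemental/weekData.json",
];
const MAIN_FILES = ["ca-gregorian.json", "numbers.json"];

function planBundles(locales) {
  // Output file names are illustrative, not a Globalize convention.
  const plan = { "cldr.supplemental.json": SUPPLEMENTAL_FILES.slice() };
  for (const locale of locales) {
    plan[`cldr.${locale}.json`] = MAIN_FILES.map(
      (file) => `main/${locale}/${file}`
    );
  }
  return plan;
}

console.log(planBundles(["en", "en-GB"]));
```

Each page would then load the shared supplemental bundle plus exactly one locale bundle.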

About the missing locales, where are you looking for them? At http://www.unicode.org/Public/cldr/latest/json.zip? Another piece of info worth knowing: you should use en for en-US (en = en-US = en-Latn-US), and it for it-IT (it = it-IT = it-Latn-IT).

I will leave this issue open until #224 is fixed. I will also let you know when the fix is about to land, so you can review it. If you still have any questions, please just let me know.
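
To illustrate the en = en-US equivalence, here is a minimal sketch of mapping a .NET culture name to the CLDR locale to load. The helper `cldrLocaleFor` is hypothetical, the two-entry table is a hand-copied excerpt of supplemental/likelySubtags.json, and the logic assumes names of the form "language" or "language-REGION" only (no scripts or variants).

```javascript
// Tiny hand-copied excerpt of supplemental/likelySubtags.json; a real
// application would load the full file instead.
const likelySubtags = { en: "en-Latn-US", it: "it-Latn-IT" };

// Hypothetical helper: map a .NET culture name such as "en-US" to the CLDR
// locale whose data should be loaded. Simplification: only handles
// "language" or "language-REGION" names and ignores scripts and variants.
function cldrLocaleFor(cultureName, subtags) {
  const [language, region] = cultureName.split("-");
  if (!region) return language;
  const likely = subtags[language]; // e.g. "en-Latn-US"
  const likelyRegion = likely ? likely.split("-")[2] : undefined;
  // The region is redundant when it is the language's likely region.
  return region === likelyRegion ? language : cultureName;
}

console.log(cldrLocaleFor("en-US", likelySubtags)); // en
console.log(cldrLocaleFor("en-GB", likelySubtags)); // en-GB
console.log(cldrLocaleFor("it-IT", likelySubtags)); // it
```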

notself commented 10 years ago

Thanks @rxaviers, after a few hours of poking around I think I have a better understanding, and your reply helps confirm things.

For now I'm trying to get the bare minimum for date and number formatting, and my first issue was how exactly this should be merged. I ended up hacking the grunt-merge-json task to use lodash's _.merge (for a deep merge) on the CLDR files I'm interested in. This allows me to easily build language bundles.

> Another piece of info worth knowing is that: you should use en for en-US (en = en-US = en-Latn-US), and it for it-IT (it = it-IT = it-Latn-IT).

This is where I'm a bit confused. I'm creating an en-US file by merging (in this order):

supplemental/likelySubtags.json
supplemental/timeData.json
supplemental/weekData.json
main/en/ca-gregorian.json
main/en/numbers.json

How should I create an en-GB equivalent? Take all the en-US files above and add the following files?

main/en-GB/ca-gregorian.json
main/en-GB/numbers.json

Or should I replace the en files with their en-GB equivalents?

And slightly on the same topic, how should we deal with fall-back languages? I see that this library throws when some format is not available. Is it left to the developer to ensure all formats are always present (perhaps by always including en?), or is there a standard way to deal with this?

Thanks for your effort on these libraries, it's much appreciated!

rxaviers commented 10 years ago

> How should I create an en-GB equivalent?

Short answer: you should replace the en with the en-GB equivalents.

Long answer: There are resolved and unresolved CLDR data. When using resolved CLDR, which is the case when you download this, you never need to worry about locale fallback. When using unresolved CLDR, which you can generate yourself using the converter tool (which converts from LDML to JSON), you would need to load en-GB + en + root. For more info see http://www.unicode.org/reports/tr35/#Locale_Inheritance.
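
For the unresolved case, the en-GB + en + root chain can be sketched as below. This is a simplified illustration, not Globalize code: the function name `inheritanceChain` is made up, and it only truncates subtags, ignoring any explicit parent overrides in the CLDR supplemental data.

```javascript
// Simplified sketch of the lookup chain for unresolved data: drop the last
// subtag until nothing is left, then end at "root". Explicit parent
// overrides defined by CLDR are deliberately not handled here.
function inheritanceChain(locale) {
  const chain = [];
  const parts = locale.split("-");
  while (parts.length > 0) {
    chain.push(parts.join("-"));
    parts.pop();
  }
  chain.push("root");
  return chain;
}

console.log(inheritanceChain("en-GB")); // [ 'en-GB', 'en', 'root' ]
```

With resolved data none of this is needed, because each locale's files already contain the inherited values.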

> how should we deal with fall-back languages?

Are you talking about http://www.unicode.org/reports/tr35/#LanguageMatching? This is not yet implemented in Globalize, but we plan to implement it.

> I see that this library throws when some format is not available. Is this something left to the developer to always ensure all formats are always present (perhaps by always including en?) or is there any standard way to deal with this?

Sorry, I'm not sure whether you are talking about format patterns that are not yet implemented (e.g. timezone patterns in date formats) or about missing CLDR data.

We throw errors for unsupported (or not yet implemented) format patterns.

If you are talking about how the library deals with missing CLDR content, it's left to the developer (or, as I like to call it, the "end application") to always load the necessary CLDR content. Note, though, that we plan to throw better error messages to inform developers when content is missing.
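
Until those error messages exist, an end application could run its own check before formatting. The sketch below is one possible approach, not part of Globalize: `missingCldrPaths` is a hypothetical helper, and the required paths assume the usual layout of the CLDR JSON files (main.<locale>.numbers, main.<locale>.dates.calendars.gregorian, supplemental.likelySubtags).

```javascript
// Hypothetical pre-flight check: report which required sections are absent
// from the merged CLDR object before any formatter is called.
function missingCldrPaths(cldr, paths) {
  return paths.filter((path) => {
    let node = cldr;
    for (const key of path.split("/")) {
      if (node === null || typeof node !== "object" || !(key in node)) {
        return true; // path is missing
      }
      node = node[key];
    }
    return false;
  });
}

// Stand-in for merged CLDR data; the empty objects replace real content.
const loaded = {
  main: { en: { numbers: {} } },
  supplemental: { likelySubtags: {} },
};
const required = [
  "main/en/numbers",
  "main/en/dates/calendars/gregorian",
  "supplemental/likelySubtags",
];
console.log(missingCldrPaths(loaded, required));
// [ 'main/en/dates/calendars/gregorian' ]
```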

If you are talking about something else, please just let me know.

PS:

> This is where I'm a bit confused. I'm creating an en-US file by merging (in this order):

The order shouldn't matter.

> hacking grunt-merge-json task to use lodash's _.merge (for a deep merge).

Test if merging ( { a: { b: 1, c: 2 } }, { a: { b: 3, d: 4 } } ) gives you { a: { b: 3, c: 2, d: 4 } }, or check this out https://github.com/rxaviers/cldr/blob/master/src/util/json/merge.js.
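
The test above can be reproduced with a small standalone deep merge. This is a minimal sketch for plain JSON objects only; it is neither lodash's _.merge nor the merge.js linked above, and `deepMerge` is just an illustrative name.

```javascript
// Minimal recursive deep merge for plain JSON objects. Later sources win on
// conflicting leaf values; nested objects are merged key by key. Arrays and
// other non-plain values are replaced rather than merged.
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function deepMerge(...sources) {
  const result = {};
  for (const source of sources) {
    for (const [key, value] of Object.entries(source)) {
      result[key] =
        isPlainObject(value) && isPlainObject(result[key])
          ? deepMerge(result[key], value)
          : isPlainObject(value)
          ? deepMerge(value) // copy so input objects are never mutated
          : value;
    }
  }
  return result;
}

const merged = deepMerge({ a: { b: 1, c: 2 } }, { a: { b: 3, d: 4 } });
console.log(JSON.stringify(merged)); // {"a":{"b":3,"c":2,"d":4}}
```

A shallow merge would instead give { a: { b: 3, d: 4 } } and silently drop c, which is why the check matters when combining CLDR files that share the same top-level keys.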

notself commented 10 years ago

Thanks for your excellent reply, things are starting to make sense now!

It wasn't clear to me what unresolved and resolved meant in this context, but now I see how it relates to the fallback mechanism I was referring to. I was wondering what the root was and whether we should include it by default, but since I'm using resolved data, it will already be there, and all my "end application" needs to do is ensure that there is always one language loaded.

> Test if merging ( { a: { b: 1, c: 2 } }, { a: { b: 3, d: 4 } } ) gives you { a: { b: 3, c: 2, d: 4 } }

Yep, that's what I'm doing.

Thanks for your help.

rxaviers commented 10 years ago

@notself, it's a pleasure to know my comments could be of some help.

Resolved vs. unresolved is a concept we had trouble introducing in our README, because it raises more questions than it answers. But yes, using the JSON files, you're good to go.

I have updated the README (under PR #224). Hopefully, it improves Getting Started compared to the previous README.

You can see the new README here.

Please let me know what you think of it, and what else we could do to improve it, by adding your comments to PR #224.

Thanks!

globalizejs / globalize

Mapping CLDR data to .NET's languagecode2-country/regioncode2 #225