elixir-cldr / cldr_collation


Data provider for collation data with icu-collator #12

Open kipcole9 opened 1 year ago

kipcole9 commented 1 year ago

CLDR collations are configured per-locale (in practice, typically per-language) in a set of configuration files. These files need to be available to icu-collator through its data provider interface.

Including the data files in ex_cldr_collation seems reasonable. They are not large files since they represent only tailorings of the standard DUCET collation.
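
As a rough illustration of why tailoring data stays small, here is a toy sketch in Rust. It is not how CLDR or icu-collator represent collation data: the weights, the `root_weight` function and the `HashMap`-based tailoring are all hypothetical, made up only to show a root order plus a handful of per-locale overrides.

```rust
use std::collections::HashMap;

/// Hypothetical root order: every character gets a weight, and accented
/// letters sort immediately after their base letter.
fn root_weight(c: char) -> u32 {
    match c {
        'ä' => 'a' as u32 * 10 + 1,
        'ö' => 'o' as u32 * 10 + 1,
        _ => c as u32 * 10,
    }
}

/// Build a sort key: use the tailored weight when the locale overrides a
/// character, otherwise fall back to the root weight.
fn sort_key(word: &str, tailoring: &HashMap<char, u32>) -> Vec<u32> {
    word.chars()
        .map(|c| tailoring.get(&c).copied().unwrap_or_else(|| root_weight(c)))
        .collect()
}

/// Sort words under a given tailoring (an empty map means "root order").
fn sort_words(words: &[&str], tailoring: &HashMap<char, u32>) -> Vec<String> {
    let mut sorted: Vec<String> = words.iter().map(|w| w.to_string()).collect();
    sorted.sort_by_key(|w| sort_key(w, tailoring));
    sorted
}

fn main() {
    let words = ["zebra", "äpple", "apple"];

    // Root order: no overrides at all.
    let root: HashMap<char, u32> = HashMap::new();

    // Swedish-style tailoring: only two entries, moving 'ä' and 'ö' after 'z'.
    let swedish = HashMap::from([
        ('ä', 'z' as u32 * 10 + 2),
        ('ö', 'z' as u32 * 10 + 3),
    ]);

    println!("{:?}", sort_words(&words, &root)); // ["apple", "äpple", "zebra"]
    println!("{:?}", sort_words(&words, &swedish)); // ["apple", "zebra", "äpple"]
}
```

The point of the sketch is only that each locale carries its differences from the root order, not a full table, which is why shipping the per-locale files looks cheap.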

Questions

  1. Does icu-collator depend on other CLDR data than these collation files?
  2. Do any of the existing data provider mechanisms in icu-collator support loading these files? If so, how is that configured?

I'll see what I can learn from reading more of the Rust docs, but I'm in deep water when it comes to that, so any suggestions you have would be warmly welcomed!

kipcole9 commented 1 year ago

Hmm, it seems there must be some mechanism to process the collator data (and other CLDR data) into a Rust format? This is based upon https://github.com/unicode-org/icu4x/tree/main/provider/testdata/data/baked/collator/data_v1

That would mean that, from a packaging point of view, processing all the collator locales and including them in the ex_cldr_collation lib would make this workable. I can build a mix task to generate the data for each CLDR release so it's somewhat automated.

foxbenjaminfox commented 1 year ago

I don't know if you saw the comment I left on #10; perhaps it wasn't a good idea to leave it as a comment on a closed PR. Either way: the core thing to look at I think is this guide, which describes how to use icu_datagen to generate the data for use with Rust's icu family of crates.