Open kipcole9 opened 1 year ago
Hmm, it seems there must be some mechanism to process the collator data (and other CLDR data) into a Rust format? This is based upon https://github.com/unicode-org/icu4x/tree/main/provider/testdata/data/baked/collator/data_v1
From a packaging point of view, that would mean processing all the collator locales and including them in the `ex_cldr_collation` lib, which would seem to make this workable. I can build a mix task to generate the data for each CLDR release so it's somewhat automated.
I don't know if you saw the comment I left on #10; perhaps it wasn't a good idea to leave it as a comment on a closed PR. Either way: the core thing to look at, I think, is this guide, which describes how to use `icu_datagen` to generate the data for use with Rust's `icu` family of crates.
CLDR collations are configured per-locale (typically per-language in reality) in a set of configuration files. These files need to be available to `icu-collator` through its data provider interface.

Including the data files in `ex_cldr_collation` seems reasonable. They are not large files since they represent only tailorings of the standard DUCET collation.

Questions

- Does `icu-collator` depend on other CLDR data than these collation files?
- Does `icu-collator` support loading these files? And if so, how is that configured?

I'll see what I can learn from reading more of the Rust docs, but I'm in deep water when it comes to that, so any suggestions you have would be warmly welcomed!
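
To illustrate why the per-locale files stay small, here is a rough, std-only sketch of the "tailoring on top of a root order" idea. It is a toy model, not the `icu-collator` API: `ToyCollator`, `root_weight`, `swedish` and `sorted` are hypothetical names invented for this example, the weights are made up, and it ignores normalization, case levels and everything else a real collator handles. The only point is that a locale needs to ship just its overrides, while everything else falls back to the shared root data.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// (primary, secondary) weight for one character. Hypothetical toy weights.
type Weight = (u32, u32);

/// Toy stand-in for the shared root order: accented letters sort with
/// their base letter and differ only at the secondary level.
fn root_weight(c: char) -> Weight {
    match c {
        'ä' => ('a' as u32, 1),
        'å' => ('a' as u32, 2),
        'ö' => ('o' as u32, 1),
        _ => (c as u32, 0),
    }
}

/// A collator is the root order plus a (usually tiny) map of overrides.
struct ToyCollator {
    tailoring: HashMap<char, Weight>,
}

impl ToyCollator {
    /// No overrides: pure root order.
    fn root() -> Self {
        Self { tailoring: HashMap::new() }
    }

    fn with_tailoring(pairs: &[(char, Weight)]) -> Self {
        Self { tailoring: pairs.iter().copied().collect() }
    }

    /// Look in the tailoring first, then fall back to the root order.
    fn weight(&self, c: char) -> Weight {
        self.tailoring.get(&c).copied().unwrap_or_else(|| root_weight(c))
    }

    /// Compare all primary weights first; use secondary weights as a tie-break.
    fn compare(&self, a: &str, b: &str) -> Ordering {
        let ka: Vec<Weight> = a.chars().map(|c| self.weight(c)).collect();
        let kb: Vec<Weight> = b.chars().map(|c| self.weight(c)).collect();
        ka.iter()
            .map(|w| w.0)
            .cmp(kb.iter().map(|w| w.0))
            .then_with(|| ka.iter().map(|w| w.1).cmp(kb.iter().map(|w| w.1)))
    }
}

/// Toy Swedish tailoring: three overrides placing å, ä, ö after z.
fn swedish() -> ToyCollator {
    let z = 'z' as u32;
    ToyCollator::with_tailoring(&[('å', (z + 1, 0)), ('ä', (z + 2, 0)), ('ö', (z + 3, 0))])
}

/// Return a sorted copy of `words` using the given collator.
fn sorted(collator: &ToyCollator, words: &[&str]) -> Vec<String> {
    let mut v: Vec<String> = words.iter().map(|s| s.to_string()).collect();
    v.sort_by(|a, b| collator.compare(a, b));
    v
}

fn main() {
    let words = ["zebra", "äpple", "apple"];
    println!("root:    {:?}", sorted(&ToyCollator::root(), &words));
    println!("swedish: {:?}", sorted(&swedish(), &words));
}
```

In this toy, the Swedish "file" is three entries, which is the shape of the argument for bundling all locale tailorings in the package. Whether the real data generated for `icu-collator` is comparably small per locale is something I'd still want to confirm against the actual output.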