ajoslin / nanotranslate

Translate with pluralization and variables in under 600b.
MIT License

Complex plurals #1

Open gertsonderby opened 7 years ago

gertsonderby commented 7 years ago

A number of languages have more complex plural forms than English. An example is Czech:

1 horse => 1 kůň
2 horses => 2 koně
5 horses => 5 koní

See also http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html

Adding support for this would enable nanotranslate to support many more languages.

ajoslin commented 7 years ago

Alright. I think we can support it. It might be as simple as changing plural definitions from expecting an array to expecting an object.

Then the only rule becomes that data.count is used to index the object given at a key, and '*' is used as a fallback.

SOME_KEY: {
  1: '1 kůň',
  2: '2 koně',
  5: '5 koní',
  '5-10': 'Represent the range of 5-10',
  '*': 'everything else'
}

That way it would be a lot easier to define a translation for arbitrary counts.
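A minimal sketch of that lookup, assuming range keys are written as 'min-max' and are inclusive. `pickPlural` is a hypothetical helper for illustration, not nanotranslate's actual code:

```javascript
// Hypothetical helper, not nanotranslate's actual implementation.
// Picks the entry of a plural-definition object that matches `count`.
function pickPlural(forms, count) {
  // 1. An exact key such as 1, 2, or 5 wins.
  if (Object.prototype.hasOwnProperty.call(forms, count)) {
    return forms[count];
  }
  // 2. Otherwise try inclusive range keys written as 'min-max' (an assumption).
  for (const key of Object.keys(forms)) {
    const match = /^(\d+)-(\d+)$/.exec(key);
    if (match && count >= Number(match[1]) && count <= Number(match[2])) {
      return forms[key];
    }
  }
  // 3. Fall back to the '*' entry.
  return forms['*'];
}

const SOME_KEY = {
  1: '1 kůň',
  2: '2 koně',
  5: '5 koní',
  '5-10': 'Represent the range of 5-10',
  '*': 'everything else'
};

pickPlural(SOME_KEY, 1);     // '1 kůň'
pickPlural(SOME_KEY, 7);     // 'Represent the range of 5-10'
pickPlural(SOME_KEY, 91291); // 'everything else'
```

Exact keys are checked before ranges, so a count of 5 resolves to '5 koní' rather than the '5-10' entry.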

Does that sound like it would work?

examples:

translate('SOME_KEY', {count: 1})
translate('SOME_KEY', {count: '5-10'})
translate('SOME_KEY', {count: 91291})

gertsonderby commented 7 years ago

If you look at CLDR (which is really a fantastic resource for localization), they've structured it around keywords: in the Czech case, one, few, many, and other. For English, one and other are the only ones used, since there's only a 'more than one' plural. But in Czech, you have a plural form for 2-4 items, and another for thousands, so here they add the few and many keywords.
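To make the keyword idea concrete, here is a rough sketch for integer counts only. The `pluralRules` table, the `pluralize` function, and the `{count}` placeholder are all hypothetical names chosen for illustration, and the sketch leaves out the many keyword entirely:

```javascript
// Hypothetical per-language rules: map an integer count to a CLDR-style keyword.
// Only the keywords needed for whole numbers are modelled here.
const pluralRules = {
  en: (n) => (n === 1 ? 'one' : 'other'),
  cs: (n) => {
    if (n === 1) return 'one';
    if (n >= 2 && n <= 4) return 'few';
    return 'other';
  }
};

// Hypothetical lookup: use the form for the keyword, else fall back to 'other'.
function pluralize(lang, forms, count) {
  const keyword = pluralRules[lang](count);
  const template = forms[keyword] !== undefined ? forms[keyword] : forms.other;
  // '{count}' is an assumed placeholder syntax for this sketch.
  return template.replace('{count}', String(count));
}

const HORSES_CS = {
  one: '{count} kůň',
  few: '{count} koně',
  other: '{count} koní'
};

pluralize('cs', HORSES_CS, 1); // '1 kůň'
pluralize('cs', HORSES_CS, 2); // '2 koně'
pluralize('cs', HORSES_CS, 5); // '5 koní'
```

The translation table only names keywords, so the same table shape works for any language once its rule function exists.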

Take a look at Arabic, BTW, in the link I gave, if you want your head to pop. Not only do they have a special form for 0 items, they have forms for 2, 3-10, and for large numbers. And then they also have a distinct grammatical form for every possible combination of numbers in a range.

Natural languages, man.

ajoslin commented 7 years ago

Ah, I see. Interesting... So we have to support defining a translation for any arbitrary range.

I think that could be done, but it would put us closer to 1kb (well worth it of course!)

ajoslin commented 7 years ago

Opened a PR, check it out. Still below 800b.

Will clean up / merge tomorrow.

gertsonderby commented 7 years ago

That looks pretty awesome, and flexible. Sure, you'd need to set up your language yourself, but... I'm kinda wondering if you couldn't automate that with a build time utility of some kind and the CLDR data set.
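This is not the build-time utility suggested above, but a related runtime shortcut: the standard `Intl.PluralRules` API returns the CLDR keyword for a count, so the per-language rules would not have to be written by hand. A sketch, assuming the runtime ships locale data for the language in question; `makePluralizer` is a hypothetical wrapper:

```javascript
// Hypothetical wrapper: the keyword comes from the runtime's CLDR data
// instead of a hand-written rule.
function makePluralizer(locale) {
  const rules = new Intl.PluralRules(locale);
  return (forms, count) => {
    // select() returns one of 'zero', 'one', 'two', 'few', 'many', 'other'.
    const keyword = rules.select(count);
    return forms[keyword] !== undefined ? forms[keyword] : forms.other;
  };
}

const pluralizeEn = makePluralizer('en');
const HORSES_EN = { one: '1 horse', other: 'several horses' };

pluralizeEn(HORSES_EN, 1); // '1 horse'
pluralizeEn(HORSES_EN, 5); // 'several horses'
```

The trade-off is that the rules no longer count toward the library's size, at the cost of depending on what the runtime supports.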