WeblateOrg / weblate

Web based localization tool with tight version control integration.
https://weblate.org/
GNU General Public License v3.0

Plural support in JSON files [$50 awarded] #1572

Closed by wichert 7 years ago

wichert commented 7 years ago

I find myself needing plural support in JSON files. I am pretty flexible about which toolkit to use, so I don't much care whether that uses the i18next format or the ICU MessageFormat format.

The i18next syntax is the simplest of the two and looks like this:

{
   "key": "You have {{count}} item.",
   "key_plural": "You have {{count}} items"
}

For languages with multiple plural forms the key format is key_# for fixed-number strings, and key_plural for other numbers.

The ICU MessageFormat format is more powerful, and thus more complex. While it is interesting, I expect it would require a massive UI change to support properly.

@nijel I'm happy to set a bounty for this. Can you give an indication of what would be needed?

wichert commented 7 years ago

Thinking about this a bit more, I don't think a UI to unpack all ICU MessageFormat options makes sense. It is probably better to show the message as-is and require translators to understand the format. Combined with a verifier to detect structural differences between source and translation (perhaps based on pyseeyou?), that could be workable.

That suggests two possible paths:

  1. Support i18next plurals
  2. Add an ICU MessageFormat verifier

nijel commented 7 years ago

Having support for i18next seems easily doable, but I have one remark: this is not a correct way to handle plurals, as many languages have more complex rules than just singular and plural. So if you are starting with plurals, I'd suggest using something that can handle this.

ICU MessageFormat looks quite close to what l20n is doing, and it turns translators into programmers, because with all the features the translations can get quite complex. On the other hand, I think most of the features will not find much use...

wichert commented 7 years ago

The i18next format does support more than just singular and plural. Here is the example for Arabic with six plural forms:

{
  "key_0": "zero",
  "key_1": "singular",
  "key_2": "two",
  "key_3": "few",
  "key_4": "many",
  "key_5": "plural"
}

There is a test tool on the i18next plural page that will dump the keys for each language for you.

From a purity point of view I do like the ICU MessageFormat for its accuracy, but you are completely correct that it forces translators to become programmers, which is likely to lead to problems, especially when you have to use translation agencies that use their own tooling and people you can't train properly. From a more practical point of view the i18next format seems a lot easier to manage, and I highly doubt I'll ever need the extra flexibility the ICU MessageFormat offers.

nijel commented 7 years ago

Ah, I didn't notice this. So in the source there will be key and key_plural, while in translations key_[0-5]? That looks pretty confusing, but can be handled...
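
To make the naming described above concrete, here is a minimal Python sketch that lists the keys for a given number of plural forms. The helper name `i18next_plural_keys` is hypothetical, and the rule is only what this thread describes: `key` and `key_plural` for two forms, numbered suffixes for more. Languages with a single form are not covered in the thread, so the sketch rejects them rather than guess.

```python
def i18next_plural_keys(key, nplurals):
    """Return the JSON keys for one message, per the scheme described in this thread.

    Hypothetical helper: not part of i18next, translate-toolkit or Weblate.
    """
    if nplurals < 2:
        # The single-form case is not discussed in the thread.
        raise ValueError("this sketch only handles two or more plural forms")
    if nplurals == 2:
        # Singular/plural languages use the bare key plus a "_plural" suffix.
        return [key, f"{key}_plural"]
    # Languages with more forms use a numeric suffix per form.
    return [f"{key}_{index}" for index in range(nplurals)]


print(i18next_plural_keys("key", 2))  # ['key', 'key_plural']
print(i18next_plural_keys("key", 6))  # the six Arabic keys, key_0 to key_5
```

The mismatch between source keys and translation keys is visible here: a two-form source language and a six-form target language share no plural key names, so a converter has to group them by the base key.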

wichert commented 7 years ago

I think that is correct.

Would setting a bounty on this help get it into the 2.16 release?

nijel commented 7 years ago

The implementation would (probably) be in the translate-toolkit only, so it depends more on their release schedule. As for Weblate 2.16, it will be pretty much feature frozen today...

wichert commented 7 years ago

I'm still waiting for them to make a release with the nested JSON support as well :). Still, if a bounty would result in this being implemented for translate-toolkit soon we can temporarily use a private release as well.

nijel commented 7 years ago

I can't promise much from my side in the next three weeks (next week I am on vacation, and then I'm at DebConf starting on 4th August).

wichert commented 7 years ago

I've added a bounty to this ticket.

nijel commented 7 years ago

The pull request on translate-toolkit is here:

https://github.com/translate/translate/pull/3678

As usual, Weblate will need just a small amount of glue code, which I will add soon.

ghost commented 6 years ago

@nijel I've noticed that exporting into i18next JSON files is not working properly. The same happens when changes are pushed to the repository.

Instead of getting a separate key, each plural form is put into an array under the original key, so the forms are identified only by their position.

So my initial English

{
  "hello": "Hello",
  "apple": "I have an apple",
  "apple_plural": "I have {{count}} apples",
  "apple_negative": "I have no apples"
}

in Polish exports to

{
  "hello": "Witaj",
  "apple": [
    "Mam jabłko",
    "Mam {{count}} jabłka",
    "Mam {{count}} jabłek"
  ],
  "apple_negative": "Nie mam jabłek"
}

instead of

{
  "hello": "Witaj",
  "apple": "Mam jabłko",
  "apple_1": "Mam {{count}} jabłka",
  "apple_2": "Mam {{count}} jabłek",
  "apple_negative": "Nie mam jabłek"
}

Is it possible to fix this? Or could you point me to the relevant source so I could try working on it myself, although I'm not familiar with Python?
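
As a stopgap for the export shown above, the array layout could be flattened after the fact. The following is a rough sketch only, not the translate-toolkit fix. The helper name `flatten_plural_arrays` is hypothetical, and the key naming simply mirrors the expected output quoted in this comment (bare key for the first form, `key_<index>` for the rest), which is an assumption rather than a confirmed i18next convention.

```python
import json


def flatten_plural_arrays(messages):
    """Expand list-valued entries into separate suffixed keys.

    Hypothetical post-processing helper, not part of translate-toolkit or Weblate.
    """
    flat = {}
    for key, value in messages.items():
        if isinstance(value, list):
            for index, form in enumerate(value):
                # Assumption: naming follows the expected output quoted above,
                # i.e. the first form keeps the bare key, later forms get _<index>.
                name = key if index == 0 else f"{key}_{index}"
                flat[name] = form
        else:
            # Non-plural strings are copied through unchanged.
            flat[key] = value
    return flat


exported = {
    "hello": "Witaj",
    "apple": [
        "Mam jabłko",
        "Mam {{count}} jabłka",
        "Mam {{count}} jabłek",
    ],
    "apple_negative": "Nie mam jabłek",
}

print(json.dumps(flatten_plural_arrays(exported), ensure_ascii=False, indent=2))
```

A script like this would have to run on every export, so it only makes sense until the exporter itself writes separate keys.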

nijel commented 6 years ago

I can't reproduce this. I've just added a test to translate-toolkit to cover this scenario and it works as expected: https://github.com/translate/translate/pull/3748

Anyway, can you please open a new issue for that so that we don't mess up the existing one?

ghost commented 6 years ago

Thanks for the quick answer. Sorry, yes, I've created a new issue: #1701.