getkirby / ideas

This is the backlog of ideas and feature requests from the last two years. Use our new feedback platform to post your new ideas or vote on existing ideas.
https://feedback.getkirby.com

i18n: improve translation classes #386

Open hdodov opened 5 years ago

hdodov commented 5 years ago

If you open kirby/i18n/translations/en.json you can see:

  "error.section.files.max.plural":
    "You must not add more than {max} files to the \"{section}\" section",
  "error.section.files.max.singular":
    "You must not add more than one file to the \"{section}\" section",

So the internal Kirby translations use two separate keys for singular and plural. However, I18n::translateCount(), which is used by the tc() helper, is implemented differently:

Then, the I18n::template() just above it again uses single curly braces.


My proposal is to translate strings according to the Unicode CLDR plural rules, using associative arrays keyed by the six plural categories: zero, one, two, few, many, and other. This would also ensure that languages with more than two plural forms are translated correctly. I've made a php-pluralization package that does exactly that. It doesn't support a lot of languages yet, but I can add whatever languages you need. On top of that, it can handle the differences between cardinal and ordinal plurals, since they use different rules. I have a plugin where you can see how this package can help.

So the core translations can be represented as:

"error.section.files.max": {
  "one": "You must not add more than one file to the \"{section}\" section",
  "other": "You must not add more than {max} files to the \"{section}\" section"
}

And in other languages, you can easily add forms for few or many, for example. The translateCount() function can also benefit from this and give that functionality to developers as well, essentially making tc() function like the tp() function my plugin provides.
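To illustrate the lookup described above, here is a minimal sketch in Python (not Kirby's PHP and not the php-pluralization package). The names `translate_count`, `RULES`, and the `{count}` placeholder are hypothetical; the English and Polish rules follow the CLDR cardinal rules for whole numbers only.

```python
# Hypothetical sketch: none of these names belong to Kirby or to any package.

def plural_category_en(n: int) -> str:
    """CLDR cardinal category for English (whole numbers only)."""
    return "one" if abs(n) == 1 else "other"


def plural_category_pl(n: int) -> str:
    """CLDR cardinal category for Polish (whole numbers only)."""
    n = abs(n)
    if n == 1:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


# Assumed registry of per-locale rules.
RULES = {"en": plural_category_en, "pl": plural_category_pl}


def translate_count(forms: dict, count: int, locale: str = "en") -> str:
    """Pick the form for the count's category, falling back to 'other'."""
    category = RULES[locale](count)
    template = forms.get(category, forms["other"])
    return template.replace("{count}", str(count))


EN = {"one": "{count} file", "other": "{count} files"}
PL = {
    "one": "{count} plik",
    "few": "{count} pliki",
    "many": "{count} plików",
    "other": "{count} pliku",
}

print(translate_count(EN, 1))         # uses the "one" form
print(translate_count(PL, 22, "pl"))  # uses the "few" form
print(translate_count(PL, 12, "pl"))  # uses the "many" form
```

Falling back to "other" means a translation only has to define the categories its language actually uses.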


As for the curly braces, I think we should use two because that's the official mustache syntax and many people are going to be familiar with it.
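For illustration, a template helper could also accept both styles during a transition. This is a small Python sketch under that assumption; `render` and `PLACEHOLDER` are hypothetical names and do not describe how I18n::template() works.

```python
import re

# Matches either "{{ key }}" (mustache style) or "{key}" (single braces).
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}|\{([\w.]+)\}")


def render(template: str, data: dict) -> str:
    """Replace known placeholders; leave unknown ones untouched."""
    def substitute(match: re.Match) -> str:
        key = match.group(1) or match.group(2)
        return str(data[key]) if key in data else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


print(render('{{ count }} files in the "{section}" section',
             {"count": 3, "section": "gallery"}))
```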

lukasbestle commented 5 years ago

That makes a lot of sense semantically, and I also think that we should have proper support for all possible pluralization forms in case any language needs it. Since we would only have to switch once, the effort for us is effectively the same.

The issue is: I'm not sure if Transifex supports nested objects for translation variables. We import and export the JSON files into and from Transifex, so it needs to deal with the structure correctly.

bastianallgeier commented 5 years ago

Unfortunately Transifex doesn't support nested values, but we could still use dots for this.

  error.section.files.max.one
  error.section.files.max.other

I think we even do this somewhere.

hdodov commented 5 years ago

@bastianallgeier this relates to #387, where I propose the values to be deflated for this exact reason - it won't matter if you specify the translations like this:

{
  "error.section.files.max": {
    "one": "You must not add more than one file to the \"{section}\" section",
    "other": "You must not add more than {max} files to the \"{section}\" section"
  }
}

...or like this:

{
  "error.section.files.max.one": "You must not add more than one file to the \"{section}\" section",
  "error.section.files.max.other": "You must not add more than {max} files to the \"{section}\" section"
}

...because the first one is always converted to the second one. This would allow you to define translations however you want, or in the case with Transifex - however you can. It also helps plugins because they know what the final form of those translations will be.
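The conversion from the first form to the second can be sketched like this in Python. The `flatten` helper is a hypothetical name for illustration and is not the implementation proposed in #387.

```python
def flatten(translations: dict, prefix: str = "") -> dict:
    """Turn nested translation arrays into flat dot-separated keys."""
    flat = {}
    for key, value in translations.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested groups such as the plural forms.
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


nested = {
    "error.section.files.max": {
        "one": "You must not add more than one file",
        "other": "You must not add more than {max} files",
    }
}
already_flat = {
    "error.section.files.max.one": "You must not add more than one file",
    "error.section.files.max.other": "You must not add more than {max} files",
}

# Both input styles end up in the same final form.
print(flatten(nested) == flatten(already_flat))
```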


As for the current issue, do you think that using the Unicode plural rules would be good? I'm not sure if you're willing to add another dependency just for that, but the package I've made turned out pretty great, it's lightweight, and it's well tested. I could add support for any language you need.

On one hand, that's another dependency, on the other hand:

I think the use cases for this functionality are pretty common - paginations, shopping carts, etc.

Right now, translateCount() can only help with languages that use English-like rules, and only for cardinal numbers. My package can also accept strings and handle tricky edge cases. For example, in Russian, there's a difference between 100 and 100.0.
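The 100 versus 100.0 difference comes from the CLDR operands: `i` is the integer part and `v` is the number of visible fraction digits, which is why the input has to be a string. A minimal Python sketch of the Russian cardinal rule (the function names are hypothetical, and this is not the package's code):

```python
from typing import Tuple


def operands(number: str) -> Tuple[int, int]:
    """Return (i, v): integer part and count of visible fraction digits."""
    text = number.strip().lstrip("+-")
    integer_part, _, fraction = text.partition(".")
    return int(integer_part or "0"), len(fraction)


def plural_category_ru(number: str) -> str:
    """CLDR cardinal category for Russian."""
    i, v = operands(number)
    if v != 0:
        return "other"  # any visible fraction digits, even "100.0"
    if i % 10 == 1 and i % 100 != 11:
        return "one"
    if i % 10 in (2, 3, 4) and i % 100 not in (12, 13, 14):
        return "few"
    return "many"


print(plural_category_ru("100"))    # many
print(plural_category_ru("100.0"))  # other
```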

lukasbestle commented 5 years ago

> Unfortunately Transifex doesn't support nested values, but we could still use dots for this.

But then all of those strings would separately pop up in the Transifex editor and the translators would need to copy over the text all the time. I have seen translation editors that have a special UI for this that displays and edits all variants together.

> As for the current issue, do you think that using the Unicode plural rules would be good?

We have experimented with the ICU MessageFormat syntax (which is supported natively in PHP if the Intl extension is installed) in the past. It's pretty advanced and supports a lot of this natively. Back then we decided not to use it for site translations because of its complexity, but maybe we should use it for core translations after all.

hdodov commented 5 years ago

I didn't know about the ICU Message Format. It seems it solves all problems, but it would be very tough for editors to use, I think. There's a whole syntax you need to learn.

> But then all of those strings would separately pop up in the Transifex editor and the translators would need to copy over the text all the time.

I've never used Transifex, but I'm working with Memsource for one project and it's pretty solid. It has fancy stuff like translation memories that allow you to freely change the structure of your content without having to translate it or manually copy it - it's automatically loaded. You can maybe check it out, it has a free plan.

lukasbestle commented 5 years ago

> but it would be very tough for editors to use, I think. There's a whole syntax you need to learn.

The most common parts are actually pretty simple. It only gets tough once you actually need all those formatting details in your specific language. If we do decide to switch to it, we could write a short 101 tutorial with the most important information.
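As a rough idea of what the common case looks like, the file-limit error from above could be written as a single ICU MessageFormat plural pattern. This is only a sketch of the syntax, not an actual Kirby translation string; `#` stands for the formatted number and `{section}` is a regular named argument.

```text
{max, plural,
  one   {You must not add more than one file to the "{section}" section}
  other {You must not add more than # files to the "{section}" section}
}
```

Languages with more plural categories would add `few {...}` or `many {...}` branches to the same string.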

Only issue with it: It needs the Intl extension. I'm not sure if we already use other functions of it that are required for using Kirby (we do use the IDN functions, but those are only needed for installations on IDN domains). If not, the question is if it's worth it.

> It has fancy stuff like translation memories that allow you to freely change the structure of your content without having to translate it or manually copy it - it's automatically loaded.

Transifex has that as well, but I imagine it could still be a bit tedious if we suddenly have six versions of each translation string.

hdodov commented 5 years ago

I see. Is it an option to have Kirby rely on the Intl extension and, when it's not present, display some discreet warning and use a simple fallback logic (e.g. the current translateCount())? The worst-case scenario would be some incorrect panel translations (since the core would rely on Intl too) in languages that don't use English-like plural forms (since translateCount() handles only those). But for sites like that, the language itself is already an incentive to use the Intl extension, so...

I think Intl can be worth it because it would be useful in cases other than I18n, i.e. even for single-language sites.

lukasbestle commented 5 years ago

No, I think if we do it, Intl should be a required extension. Otherwise there will be all sorts of inconsistencies in the behavior.