godotengine / godot-proposals

Godot Improvement Proposals (GIPs)

Add support for Fluent translation system #8705

Open Vovkiv opened 9 months ago

Vovkiv commented 9 months ago

Describe the project you are working on

Any game/software that might rely on language/linguistic features, such as pluralization, gender, grammar, etc.

Currently, Godot supports adding translations via gettext or data storage formats such as csv. gettext provides some linguistic features such as pluralization (you can specify a plural formula and store plural variants of strings) and documentation (such as comment extraction, which Godot does not yet support). gettext is also standardized, which means many tools exist to work with it (offline and online translation tools such as Weblate or POEditor). Data formats like csv don't provide anything like that on their own, but they are usually easier to work with than gettext.
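As a rough illustration of what a gettext plural formula does, here is a minimal Python sketch. The formula in the comment is the one commonly used for Russian in .po headers; the function and list names are made up for this example and are not part of gettext or Godot.

```python
# Plural-Forms header commonly used for Russian .po files:
#   nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 :
#       n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);

def russian_plural_index(n: int) -> int:
    """Return the index of the plural form to use for n (hypothetical helper)."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not (10 <= n % 100 < 20):
        return 1
    return 2

# The stored plural variants of one string, in the order the formula expects.
PHOTO_FORMS = ["{n} новая фотография", "{n} новые фотографии", "{n} новых фотографий"]

def photos(n: int) -> str:
    """Pick the variant selected by the formula and insert the count."""
    return PHOTO_FORMS[russian_plural_index(n)].format(n=n)
```

The formula only ever looks at one number, which is why gettext handles a single pluralized value per string well and nothing beyond that.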

Describe the problem or limitation you are having in your project

None of these systems provide other linguistic features, such as gender (he, she, it, etc.), or QoL features such as placeholder words in strings (terms, https://projectfluent.org/fluent/guide/terms.html, which won't be visible at runtime but are useful for translators). They might also "leak" language-specific logic into other languages. For example, when you want to implement gender logic in your game/software, you will most likely do it with some ifs and elses, but different languages have different numbers of genders (some may even have none), so you need fallback logic in case your source language conflicts with the translated one, which can quickly become problematic. Finally, they don't provide any functions that might be useful for translators (for example, Fluent lets developers expose functions that translators might want to use: https://projectfluent.org/fluent/guide/builtins.html).

Describe the feature / enhancement and how it helps to overcome the problem or limitation

On the other hand, Fluent (https://projectfluent.org/, developed by Mozilla, Apache 2.0 license) provides many of the features gettext has and adds many others. For example, I was surprised at how EASY it is to implement gender cases in Fluent: the developer passes the gender as a variable, and the translation uses that variable to define gender-specific cases as needed. (You can even ignore gender if your language doesn't need it.) From the main page of the Fluent website:

# Simple things are simple.
hello-user = Hello, {$userName}!

# Complex things are possible.
shared-photos =
    {$userName} {$photoCount ->
        [one] added a new photo
       *[other] added {$photoCount} new photos
    } to {$userGender ->
        [male] his stream
        [female] her stream
       *[other] their stream
    }.

In Russian, that becomes something like this:

# Simple things are simple.
hello-user = Привет, {$userName}!

# Complex things are possible.
shared-photos =
    {$userName} {$userGender ->
        [male] добавил
        [female] добавила
       *[other] добавили
    } {$photoCount ->
        [one] {$photoCount} новую фотографию
        [few] {$photoCount} новые фотографии
       *[other] {$photoCount} новых фотографий
    }.

So adding gendered strings was as simple as adding a single variable that is used in the language file. And every language file has full freedom to implement its own cases as needed, to the point where a translation can drop entire parts of a sentence without breaking anything and without asking the developer to support such cases in game/program code. For example, dropping the photo count:

# Simple things are simple.
hello-user = Привет, {$userName}!

# Complex things are possible.
shared-photos =
    {$userName} {$userGender ->
        [male] добавил
        [female] добавила
       *[other] добавили
    }.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

The proposal would work almost the same way csv/gettext currently do: you use the same functions to deal with translations (for example, tr and tr_n work the same way from the user's perspective whether the translations come from gettext or csv files). There might be some Fluent-specific differences, though. For example, the developer can pass variables to a translation (https://projectfluent.org/fluent/guide/variables.html) for the translator to use (as in the example above, where a variable determined the gender cases in strings), so new functions or new syntax might be required to implement Fluent.
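To make the "developer passes variables, translation decides what to do with them" idea concrete, here is a toy Python model of selector-based messages. This is not Fluent's real API or file format: the tuple-based pattern layout, `resolve`, and the plural helpers are assumptions invented for this sketch, and real Fluent takes its plural categories from CLDR.

```python
def en_plural(n: int) -> str:
    return "one" if n == 1 else "other"

def ru_plural(n: int) -> str:
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not (10 <= n % 100 < 20):
        return "few"
    return "many"

def resolve(pattern, args, plural):
    """Resolve a pattern made of literals, ("var", name) and
    ("select", name, variants, default_key) parts."""
    out = []
    for part in pattern:
        if isinstance(part, str):
            out.append(part)
        elif part[0] == "var":
            out.append(str(args[part[1]]))
        else:
            _, name, variants, default = part
            value = args[name]
            # Numbers select by plural category, everything else by value.
            key = plural(value) if isinstance(value, int) else value
            out.append(resolve(variants.get(key, variants[default]), args, plural))
    return "".join(out)

SHARED_PHOTOS_EN = [
    ("var", "userName"), " ",
    ("select", "photoCount", {
        "one": ["added a new photo"],
        "other": ["added ", ("var", "photoCount"), " new photos"],
    }, "other"),
    " to ",
    ("select", "userGender", {
        "male": ["his stream"],
        "female": ["her stream"],
        "other": ["their stream"],
    }, "other"),
    ".",
]

SHARED_PHOTOS_RU = [
    ("var", "userName"), " ",
    ("select", "userGender", {
        "male": ["добавил"],
        "female": ["добавила"],
        "other": ["добавили"],
    }, "other"),
    " ",
    ("select", "photoCount", {
        "one": [("var", "photoCount"), " новую фотографию"],
        "few": [("var", "photoCount"), " новые фотографии"],
        "other": [("var", "photoCount"), " новых фотографий"],
    }, "other"),
    ".",
]

ARGS = {"userName": "Anne", "photoCount": 3, "userGender": "female"}
english = resolve(SHARED_PHOTOS_EN, ARGS, en_plural)
russian = resolve(SHARED_PHOTOS_RU, ARGS, ru_plural)
```

The calling code passes the same three arguments for both languages. Which arguments are used, and in what order, is decided entirely by each translation.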

If this enhancement will not be used often, can it be worked around with a few lines of script?

It might be possible to implement Fluent as an extension, but currently there are only 3 official implementations of Fluent (https://github.com/projectfluent): in Rust, Python and JavaScript. There is no C or C++ implementation, which means that instead of simply writing bindings, an implementation would have to be written from scratch.

Is there a reason why this should be core and not an add-on in the asset library?

Fluent provides far superior localization features compared to csv files and gettext, which might mean that the gettext and csv translation systems could even be replaced by Fluent in the future, if it becomes widely used and well supported.

But let's talk more about possible downsides and caveats.

What is everyone's opinion on Fluent? Does it look promising? Is it worth it?

Vovkiv commented 9 months ago

Also, slightly off-topic: I didn't dig that deep into Fluent, but does its standard/reference include gettext-ish features? Like being able to extract translatable text from source files, with all relevant information, into a language file, or other features gettext has, such as comments or "fuzzy" strings? Or merging a pot file with a po file (to update a translation to the latest source text)?

Kehom commented 9 months ago

So a few days ago I started to research into localization/translation systems. I had no idea I would fall into such a rabbit hole!

During the research I found some solutions that are newer than gettext, such as Fluent (as mentioned in the original post), Flutter's ARB (https://docs.flutter.dev/ui/accessibility-and-internationalization/internationalization), i18next (https://www.i18next.com/) and so on. What I have noticed is that those newer solutions are "argument based", which shifts some of the translation logic from the application's code into the translation itself. This is actually a good thing since some languages don't need the additional logic, especially when dealing with pronouns.

The gettext solution offers "just" pluralization. That's absolutely enough for UI elements such as buttons, menus and so on. It might be enough for tooltips. However, it does show limitations when dealing with more text, such as character dialogues. And even for simpler texts it can be limiting. Let's consider a rather contrived example. Suppose a multiplayer game in which, after the match, we want to display a message similar to:

collected [count] gem(s) and contributed with [other_value] point(s) to <his/her/their> team.

While it would be possible to deal with something like this with gettext, this means the game's code requires additional logic to allow it. This also means the programmer needs prior knowledge of all the requirements to allow proper translations. Some of the text would probably even result in multiple "keys".
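A minimal Python sketch of what that extra logic tends to look like with gettext, using the standard library's `gettext.NullTranslations` (which just returns the source strings with English plural rules, so no catalog is needed). The `match_summary` function and the leading `player` name are assumptions made for this example.

```python
import gettext

# No catalog loaded: source strings come back unchanged,
# and ngettext picks the singular only when n == 1.
t = gettext.NullTranslations()

def match_summary(player: str, gender: str, gems: int, points: int) -> str:
    # ngettext takes a single count, so each pluralized fragment needs its own key.
    gems_part = t.ngettext("collected %d gem", "collected %d gems", gems) % gems
    points_part = t.ngettext(
        "contributed with %d point", "contributed with %d points", points
    ) % points

    # Pronoun selection lives in application code, shaped by the source language.
    if gender == "male":
        team_part = t.gettext("to his team")
    elif gender == "female":
        team_part = t.gettext("to her team")
    else:
        team_part = t.gettext("to their team")

    # The sentence structure is fixed here too, so translators cannot reorder it.
    return f"{player} {gems_part} and {points_part} {team_part}."
```

One sentence ends up as five keys plus an if/else chain, and a language that needs the gender to affect the verb rather than the pronoun has no way to express that without code changes.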

In a solution like Fluent (or any other "argument based") the base text would end up with a single key and the translation system would take care of the formatting based on incoming arguments. This is not perfect because the programmer still needs to know which arguments must be sent to the translation function. Yet I believe it's more robust than gettext's way.

All that said, I did consider working on an extension that would offer translations based on arguments; however, the TranslationServer doesn't support that type of system. I believe it would be possible to expand the current system a little without breaking compatibility, while also opening the door to any kind of solution based on providing arguments to obtain a translated message.

The first part of the idea is to add a new virtual function into the Translation resource. Maybe get_formatted_message(StringName src_message, Dictionary args) or format_message(StringName translated, Dictionary args).

In either case the returned value would be the translated message with the arguments applied into it. The TranslationServer would then need a new function to retrieve the translated and formatted message from the appropriate Translation instance. Maybe the function can be named translate_and_format(StringName message, Dictionary args).

Finally, to make things easier, Object could receive a new function trf(StringName message, Dictionary args), just so it is in line with the current tr() and tr_n() functions.
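The three layers described above could be modeled roughly like this. It is a Python stand-in, not Godot code: `get_formatted_message`, `translate_and_format` and `trf` are the names suggested in this comment and do not exist in the engine, and `SelectorTranslation` and its `gender` argument are invented purely to show what an override might do.

```python
class Translation:
    """Stand-in for the Translation resource."""

    def __init__(self, locale, messages):
        self.locale = locale
        self.messages = dict(messages)

    def get_message(self, src_message):
        return self.messages.get(src_message, src_message)

    def get_formatted_message(self, src_message, args):
        # Proposed virtual function; the default does plain placeholder substitution.
        return self.get_message(src_message).format(**args)


class SelectorTranslation(Translation):
    """Hypothetical override: a message may be a dict of variants keyed by args["gender"]."""

    def get_formatted_message(self, src_message, args):
        entry = self.messages.get(src_message, src_message)
        if isinstance(entry, dict):
            entry = entry.get(args.get("gender"), entry["other"])
        return entry.format(**args)


class TranslationServer:
    """Stand-in for the TranslationServer singleton."""

    def __init__(self):
        self.translations = {}
        self.locale = "en"

    def add_translation(self, translation):
        self.translations[translation.locale] = translation

    def set_locale(self, locale):
        self.locale = locale

    def translate_and_format(self, message, args):
        translation = self.translations.get(self.locale)
        if translation is None:
            return message  # nothing loaded for this locale
        return translation.get_formatted_message(message, args)


SERVER = TranslationServer()


class Object:
    def trf(self, message, args):
        # Convenience wrapper, in line with tr() and tr_n().
        return SERVER.translate_and_format(message, args)


SERVER.add_translation(SelectorTranslation("en", {
    "GREETING": "Hello, {name}!",
    "TEAM": {
        "male": "{name} helped his team.",
        "female": "{name} helped her team.",
        "other": "{name} helped their team.",
    },
}))
node = Object()
```

The point of the virtual function is that the engine side only forwards the arguments. How they are interpreted (Fluent, ICU-style messages, or something custom) would be up to whichever Translation subclass an extension provides.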

Now notice that this is just to allow Godot's translation system to support more modern solutions, which would include Fluent. Unfortunately none of the solutions mentioned here have C or C++ implementations, all of them are web focused. But then, with the current TranslationServer we couldn't consider even a custom system that relies on arguments.

By the way, if the idea is approved I could submit a PR with this additional functionality.

RedMser commented 5 months ago

Hey, I've created a GDExtension using the Rust bindings of Fluent: https://github.com/RedMser/godot-fluent-translation

A custom build of Godot is recommended, since it allows for breaking API changes (unlike the approach in the comment above, I added a new parameter to atr and tr for the args). But you can also use default engine builds, at the cost of a clunky API. See the readme for more details!

It also still lacks some helpful features, so any help is definitely welcome.

Also see this WIP implementation of Fluent in C++: https://gitlab.com/bmwinger/fluent-cpp

HauntedBees commented 4 months ago

Just recently I was writing some dynamic strings in my project and wrote something along the lines of "{partyMember} attacked {enemy} for {damage} damage, and recovered {health}HP from the attack." Both damage and health should handle pluralization, and partyMember and enemy need to handle gender. gettext works fine for a single pluralizable variable in a given string, and you can basically fake gender support for a single variable using msgctxt, but it'd be a bit of a chore to split that whole string into multiple parts to handle gender and pluralization for each case using the gettext format.

I just discovered Fluent while I was looking into resolutions for this and was pleased to find this proposal already exists. Additionally I've recently become familiar with messageformat, which would also solve these problems, but I will not claim to know which would be easier to get working with Godot either as a GDExtension or engine addition, nor which is more widely supported in general.

I do think at the very least, changes to the engine should be made to support a GDExtension implementation of either format (as the GDExtension mentioned in the previous comment still requires a custom Godot build). As convenient as it'd be to have support for either format built into Godot, gettext is definitely the biggest name in this field and the oldest, so I'd understand not having either built into the engine outright.