elixir-gettext / gettext

Internationalization and localization support for Elixir.
https://hexdocs.pm/gettext

Missing support for translation context #114

Closed dannote closed 5 years ago

dannote commented 8 years ago

The original gettext library has pgettext, npgettext, dpgettext and dnpgettext macros that provide different translations of the same phrase in different contexts. They are very useful for short strings, and their implementation is simple.

josevalim commented 8 years ago

@dannote We decided, by design, not to include contexts because, even in the original implementation, they are simply shortcut macros that developers can implement themselves. Also, if you want to separate different translations, domains may be a better way to solve such problems.

dannote commented 8 years ago

@josevalim AFAIK domains were designed to split translation files into modules, while contexts were designed to store different translations of the same phrase even within a single domain. For example, a Phoenix application has at least two separate domains: one for Ecto messages and one for the UI. Imagine that we need two different translations of the same phrase for different models (this often happens in Russian). Should we then put each model in its own domain?

josevalim commented 8 years ago

@dannote yes, I would use two different domains in this case. From what I understand about contexts, they are used to resolve ambiguity arising from linguistic situations. For example, in English you can use "file" both as a verb and as a noun, but a single translation wouldn't work in other languages. It is a linguistic context rather than an application-domain one. It could also be used when gender, adverbs and adjectives are involved. There is not a lot of information about this in gettext, though, so I am not 100% sure.

whatyouhide commented 8 years ago

The idea is that you can have domains like ecto.verbs and ecto.nouns, or whatnot. When we architected this library, @josevalim and I discussed this and concluded it was OK not to have contexts, because they introduce substantial complexity for maybe not that much gain; we decided to wait and see whether people wanted this (or something like it). I think this is the first time this has come up, right José?

dannote commented 8 years ago

I suppose this feature won't bloat the library much. I can prepare a PR for those four functions. Either way, it would be nice to provide instructions for other developers who might need contexts.

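@spec pgettext(module(), binary(), binary(), map() | Keyword.t()) :: binary()
def pgettext(backend, context, msgid, bindings \\ %{}) do
  # Join context and msgid with "\u0004" (EOT), as GNU gettext does in .mo files
  Gettext.dgettext(backend, "default", "#{context}\u0004#{msgid}", bindings)
end
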
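
For background: GNU gettext's compiled .mo files store a context-qualified entry under the key msgctxt, followed by the EOT byte (0x04), followed by msgid, which is why the proposal joins the two with "\u0004". A minimal, dependency-free sketch of that lookup in plain Elixir follows; the `ContextKey` module is hypothetical, a plain map stands in for a compiled catalog, and none of this is Gettext's own API.

```elixir
defmodule ContextKey do
  # GNU gettext's .mo format stores a context-qualified entry under
  # msgctxt <> <<4>> <> msgid; "\u0004" is that single EOT byte.
  @separator "\u0004"

  # Hypothetical helper: builds the lookup key; nil means "no context".
  def key(nil, msgid), do: msgid
  def key(context, msgid), do: context <> @separator <> msgid

  # Looks up a translation, falling back to the untranslated msgid.
  # `catalog` is a plain map standing in for a compiled catalog (assumption).
  def pgettext(catalog, context, msgid) do
    Map.get(catalog, key(context, msgid), msgid)
  end
end

catalog = %{
  ContextKey.key("noun", "file") => "archivo",
  ContextKey.key("verb", "file") => "archivar"
}

ContextKey.pgettext(catalog, "noun", "file")       # => "archivo"
ContextKey.pgettext(catalog, "verb", "file")       # => "archivar"
ContextKey.pgettext(catalog, "adjective", "file")  # => "file"
```

The catch is that Gettext reads .po files directly rather than compiled .mo files, so with this scheme translators would have to type the joined key by hand instead of writing a separate msgctxt line.
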
whatyouhide commented 8 years ago

The function you proposed requires people to write translations in PO files like that:

msgid "my_contextUNICODESTUFFmsgid"

which would not be optimal, I think. What would then be the difference from doing dgettext(MyApp.Gettext, "default", "my_context.Hello world")?
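
One practical wrinkle with this prefix approach: when no translation exists, gettext returns the msgid itself, so users would see "my_context.Hello world" in the UI. Here is a minimal sketch of a wrapper that strips the prefix on fallback. It assumes a hypothetical `PrefixContext` module, a "." separator, and a plain map in place of a real Gettext backend.

```elixir
defmodule PrefixContext do
  # Hypothetical helper: simulates a context by prefixing the msgid,
  # e.g. "my_context.Hello world". The "." separator is an assumption.
  @separator "."

  # `catalog` is a plain map standing in for a Gettext backend (assumption).
  def translate(catalog, context, msgid) do
    prefixed = context <> @separator <> msgid

    case Map.fetch(catalog, prefixed) do
      {:ok, translation} ->
        translation

      # Untranslated: return the bare msgid rather than the prefixed key,
      # so the context string never leaks into the UI.
      :error ->
        msgid
    end
  end
end

catalog = %{"my_context.Hello world" => "Hola mundo"}

PrefixContext.translate(catalog, "my_context", "Hello world")     # => "Hola mundo"
PrefixContext.translate(catalog, "other_context", "Hello world")  # => "Hello world"
```
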

josevalim commented 8 years ago

I think it is best, for now, to document that this is possible instead of including it in gettext. As Andrea said, this is the first time this feature has been requested, so I would wait a bit longer before making a commitment.

dannote commented 8 years ago

@whatyouhide Sorry, that was the wrong suggestion. I thought .po files were compiled to .mo files, as in the original library. Instead, the parser should be modified to understand msgctxt.

whatyouhide commented 8 years ago

@dannote exactly, and that would have to be handled everywhere in Gettext. It would not be a small change, which is why José and I are trying not to do it until absolutely necessary (read: requested by several users). As someone once said, in open source saying no is temporary, but saying yes is forever :) So let's wait to see if someone else is interested; in the meantime, you can try the "rustic" strategy of simulating contexts as I showed in the example. As for the docs, I'll take care of those :)

whatyouhide commented 8 years ago

I just pushed https://github.com/elixir-lang/gettext/commit/5e65d6643749e387f155704e18b1eef4b8d9f1dc to master, which talks about "contexts" in the Gettext documentation. @josevalim do you think it's enough? Also, do you think we should mention that people who want this can talk to us, and that if enough people want it we can give it a shot? I'm afraid the current wording could make people feel bad about asking for this feature, but at the same time, saying "just ask if you want them" makes it seem like we didn't do it out of laziness and were just waiting for complaints :P Wdyt?

(PS: closing this issue for now, if you wander here because you want contexts in Gettext feel free to comment down here to keep the discussion going 😃)

stephenmoloney commented 8 years ago

@josevalim , @whatyouhide I've just started using gettext for the first time today... It's been great so far and I'd like to say thanks for doing a great job. :+1:

I've never used the .pot and gettext system before today. I also speak Spanish reasonably well, so I started doing some translations and very quickly noticed an issue with context: in translation, context is important, and I'd like to express a need for two separate features in this respect, the first probably being much easier to implement than the second.

  ## translation_comment: break the noun, A rest period during one's working day. 
  ## Not the verb to break. 
  msgid "break"
  msgstr ""
whatyouhide commented 8 years ago

Hey @stephenmoloney, thanks for pitching in (and for the kind words!). As of now, if you write

# This is a pot file and this is a comment
msgid "foo"
msgstr ""

and run mix gettext.merge, you should get PO files with the comment in them:

# This is a pot file and this is a comment
msgid "foo"
msgstr ""

so this should already work?

The alternative would be something like https://github.com/elixir-lang/gettext/issues/83, where we attach comments from the developer to the translator, in the source code, alongside a translation.

stephenmoloney commented 8 years ago

@whatyouhide Thanks for the reply. Sorry, yes, the comments work. The only thing I did notice, and can reproduce, is that if I change a comment, the updated comment doesn't get merged back in. I'm not sure why the comments didn't work on my first attempt, but I can't reproduce that; they do work. Updating a comment is the only issue I can reproduce.

In .pot files:

# This is a pot file and this is a comment
msgid "foo"
msgstr ""

changed to

# Sorry wrong comment first time
msgid "foo"
msgstr ""

doesn't seem to be updated in the .po files.

whatyouhide commented 8 years ago

@stephenmoloney can you open an issue for that? I'll look into it. :)

tmbb commented 7 years ago

I'd like to second @dannote's request to support disambiguating contexts, as explained here. It's true that anything you can do with contexts you can also do with domains, by creating domains with names like "navbar.archive", "body_text.archive" or something like that, as @josevalim suggested.

The problem with this is that it creates multiple files per module and restricts the context to strings you'd choose as filenames. It doesn't allow you to write a free-form sentence like "Here 'S' is an abbreviation of Scope" (unless you want to deal with spaces in your filenames), as in the link above, or "Choose a sentence that contains all characters from the script used by your language" (also in the link above).

Personally, I don't actually need the full power of contexts, and I can get by just fine with domains. My motivation for wanting contexts is to avoid the clutter of several domain files.

dgorshkov commented 5 years ago

I would like to resurface this: the lack of this feature recently effed us on an internal project at the company I currently work for. We assumed this was a full port of the original gettext and were surprised, to say the least.

josevalim commented 5 years ago

Given the requests, we will gladly accept a PR that implements this. Thanks!

Wijnand commented 5 years ago

This would definitely help us as well. Due to external dependencies we cannot use more than one file per language, so being able to give context to the translators we work with, using the standard mechanism provided by gettext, sounds like the way to go.

Do you have any guesstimate about the complexity of adding this?

whatyouhide commented 5 years ago

@Wijnand we already parse the msgctxt but I'm not sure we store it. If we do, it's a matter of creating the new macros and properly extracting the context into the extracted translations. Shouldn't be a huge chunk of work.

Wijnand commented 5 years ago

OK, thank you. I will take a quick look to see if I can do it. It basically sounds like the same thing needs to be done as for gettext_comment.

whatyouhide commented 5 years ago

@Wijnand I don't think it would be the same as gettext_comment because, IIRC, GNU Gettext has a separate set of functions that take the context as an additional argument. We might have to look into that. I'd appreciate some quick research into this if you get the chance :)
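
The function shape described here, with the context as an extra leading argument (GNU's pgettext(msgctxt, msgid) and npgettext(msgctxt, msgid, msgid_plural, n)), can be sketched without any dependency by keying entries on {msgctxt, msgid}, so that identical msgids in different contexts never collide. This is a hypothetical in-memory model, not Gettext's internal representation: `ContextCatalog` is an illustrative name, nil marks entries without a msgctxt, interpolation is omitted, and plural selection hard-codes the two-form rule n != 1 (used by both English and Spanish).

```elixir
defmodule ContextCatalog do
  # Hypothetical in-memory catalog keyed by {msgctxt, msgid}; msgctxt is nil
  # for entries without a context. Values are {:singular, text} or
  # {:plural, [singular_form, plural_form]}.

  # Like GNU pgettext: context first, falls back to the untranslated msgid.
  def pgettext(catalog, context, msgid) do
    case Map.fetch(catalog, {context, msgid}) do
      {:ok, {:singular, translation}} -> translation
      _ -> msgid
    end
  end

  # Like GNU npgettext: picks a form with the two-form rule n != 1,
  # falling back to msgid / msgid_plural when no entry exists.
  def npgettext(catalog, context, msgid, msgid_plural, n) do
    index = if n == 1, do: 0, else: 1

    case Map.fetch(catalog, {context, msgid}) do
      {:ok, {:plural, forms}} -> Enum.at(forms, index)
      _ -> Enum.at([msgid, msgid_plural], index)
    end
  end
end

catalog = %{
  {"noun", "file"} => {:singular, "archivo"},
  {"verb", "file"} => {:singular, "archivar"},
  {"inbox", "message"} => {:plural, ["mensaje", "mensajes"]}
}

ContextCatalog.pgettext(catalog, "noun", "file")  # => "archivo"
ContextCatalog.pgettext(catalog, nil, "file")     # => "file" (no context-free entry)
ContextCatalog.npgettext(catalog, "inbox", "message", "messages", 3)  # => "mensajes"
```

Using a tuple key rather than a joined string keeps the context a separate field, which matches how a .po file lists msgctxt on its own line before the msgid.
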

Zurga commented 5 years ago

This pull request should implement the needed functionality: #228. @Wijnand, thank you for a pleasant day at Tell Charlie.

whatyouhide commented 5 years ago

Closed by #228.