elixir-gettext / gettext

Internationalization and localization support for Elixir.
https://hexdocs.pm/gettext
469 stars 86 forks source link

Duplicate msgid with singular and plural form #366

Closed saveman71 closed 11 months ago

saveman71 commented 1 year ago

Not sure if it's a bug / how other implementations tackle that, but msgfmt will fail for duplicate msgids:

❯ msgfmt -c priv/gettext/default.pot
priv/gettext/default.pot:10: duplicate message definition...
priv/gettext/default.pot:5: ...this is the location of the first definition
msgfmt: priv/gettext/default.pot: warning: PO file header missing or invalid
                                  warning: charset conversion will not work
msgfmt: found 1 fatal error

Which is expected for cases like:

#, elixir-autogen, elixir-format
msgid "Singular."
msgstr ""

#, elixir-autogen, elixir-format
msgid "Singular."
msgstr ""

However, it also fails for cases like:

#, elixir-autogen, elixir-format
msgid "Singular."
msgid_plural "Plural."
msgstr[0] ""
msgstr[1] ""

#, elixir-autogen, elixir-format
msgid "Singular."
msgstr ""

Generated by this code:

defmodule GettextMwe do
  import GettextMwe.Gettext

  @moduledoc false

  def singular do
    gettext("Singular.")
  end

  def plural(count) do
    ngettext("Singular.", "Plural.", count)
  end
end

I believe things should work fine if the singular form was dropped from the po/pot file.

Here's the repro repo: https://github.com/saveman71/elixir-gettext-duplicate-keys

Is this a bug, or an accepted behaviour and msgfmt is just being too noisy/plain wrong here?

The issue I can see is that usually both translations aren't always used in the same context, so "merging" them could be a breaking change.

msgfmt allowed me to remove actual duplicates, but now its output is only this case, so I can't move forward adding it as a new CI step.

whatyouhide commented 1 year ago

We can't merge the translations yeah. I mean, the Elixir Gettext behavior seems fine to me because the singular and plural translations are treated completely differently, so they can coexist with the same msgid.

Is msgfmt the only thing that fails on this in the GNU Gettext toolset? Also, do you know what GNU Gettext does when extracting in situations when this would happen? For example, I don't know how PHP's ngettext or other implementations deal with this.

maennchen commented 1 year ago

@whatyouhide PHP reads the msgid of a plural translation as well:

https://github.com/php/php-src/blob/9e15910889cfa1f14169ed9babf3bf54426cca1c/ext/gettext/tests/gettext_dgettext.phpt#L23

var_dump(gettext('item'));
[...]
string(7) "Produkt"

https://github.com/php/php-src/blob/afffcaeb4d994be50c83ac0204b4d4be8af560d9/ext/gettext/tests/locale/en_US.UTF-8/LC_MESSAGES/dngettextTest.po#L1

msgid "item"
msgid_plural "items"
msgstr[0] "Produkt"
msgstr[1] "Produkte"
maennchen commented 1 year ago

Judging by this paragraph, I think that is the expected way to do things:

https://www.gnu.org/software/gettext/manual/gettext.html#Invoking-the-msguniq-Program

The msguniq program unifies duplicate translations in a translation catalog. It finds duplicate translations of the same message ID. Such duplicates are invalid input for other programs like msgfmt, msgmerge or msgcat. By default, duplicates are merged together. [...]

So I guess we should merge singular and plural into one plural message and use the msgid of plural messages for singular calls.

whatyouhide commented 1 year ago

Yeah I think that's a good idea. I will be out all of next week so it will be a while before I'll be able to do anything about this 🙃

saveman71 commented 1 year ago

Sorry I was out of office and not great at using GitHub's notification center :sweat_smile: Thanks for looking into this!