elixir-gettext / gettext

Internationalization and localization support for Elixir.
https://hexdocs.pm/gettext
461 stars 87 forks source link

Supply plurals forms header to Gettext.Plural callbacks #343

Closed maennchen closed 1 year ago

maennchen commented 1 year ago

Closes #318

This makes it possible to react precisely to the header instead of using predefined formats.

coveralls commented 1 year ago

Pull Request Test Coverage Report for Build 14a44fbc30c747d29915c4405a3b909ad4c00227-PR-343

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details


Files with Coverage Reduction New Missed Lines %
lib/gettext/merger.ex 1 89.36%
lib/mix/tasks/gettext.merge.ex 1 98.51%
lib/gettext/plural.ex 16 83.5%
<!-- Total: 18 -->
Totals Coverage Status
Change from base Build 2ebe8066d6309ea2d810beb93e38433fef6f5071: 0.2%
Covered Lines: 566
Relevant Lines: 631

💛 - Coveralls
whatyouhide commented 1 year ago

shift work into compile time to possibly improve the performance

If we don't know if this improves performance, I don't think we should do it for now. Let's solve it if it ever becomes a problem?

whatyouhide commented 1 year ago

@maennchen also, can you elaborate on the purpose of this PR in more details? I'm not really following 😄

maennchen commented 1 year ago

@whatyouhide

If we don't know if this improves performance, I don't think we should do it for now. Let's solve it if it ever becomes a problem?

That highly depends on the implementation of that behaviour. I know that I can make it faster in ExpoPluralForms.

That description should probably be worded differently.

@maennchen also, can you elaborate on the purpose of this PR in more details? I'm not really following smile

Sure :)

The purpose of this PR is that the plural form header is passed into Gettext.Plural implementations.

Therefore implementations can make decisions based on that header.

This will allow it to implement an ExpoPluralForms implementation of the behaviour.

Current behaviour

Gettext has a list of known locales for which plural is implemented. Since that list identifies rule by the locale name, the index is looked up by locale & count.

ExpoPluralForms behaviour

For various languages, there's not a single way to represent plural rules.

For example for chinese there's two different rules depending on the context. (nplurals=1; plural=0; & nplurals=2; plural=(n > 1);)

Also in my own language (swiss german) there's some debate of how to handle 0 counts and depending on the dialect, either new singular case (0) or many (1) applies.

Those rules & differences can be represented using the plural forms header.

By supplying the plural forms header to the callbacks, the implementation can make decisions as they were intended in gettext.

maennchen commented 1 year ago

@whatyouhide Performance comparison (extended from https://github.com/elixir-gettext/expo_plural_forms/pull/33#issuecomment-1383212655) to illustrate what would happen if the header was parsed on every index call:

Mix.install([{:benchee, "~> 1.1"}, {:expo_plural_forms, path: "."}])

{:ok, arabic} = ExpoPluralForms.Known.plural_form("ar")

defmodule Arabic do
  def index(n), do: unquote(ExpoPluralForms.compile_index(arabic))
end

Benchee.run(
  %{
    "precompiled" => fn ->
      Arabic.index(17)
    end,
    "without" => fn ->
      ExpoPluralForms.index(arabic, 17)
    end,
    "including parse" => fn ->
      {:ok, arabic} = ExpoPluralForms.parse(" nplurals=6; plural=(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);")
      ExpoPluralForms.index(arabic, 17)
    end
  },
  time: 10,
  memory_time: 2
)

Result:

Operating System: Linux
CPU Information: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Number of Available Cores: 8
Available memory: 46.77 GB
Elixir 1.14.2
Erlang 25.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 42 s

Benchmarking including parse ...
Benchmarking precompiled ...
Benchmarking without ...

Name                      ips        average  deviation         median         99th %
precompiled           12.75 M       78.44 ns ±11373.62%          72 ns         100 ns
without                4.89 M      204.65 ns ±17541.84%         173 ns         348 ns
including parse        0.35 M     2833.62 ns   ±846.85%        2546 ns        4091 ns

Comparison: 
precompiled           12.75 M
without                4.89 M - 2.61x slower +126.21 ns
including parse        0.35 M - 36.12x slower +2755.17 ns

Memory usage statistics:

Name               Memory usage
precompiled                 0 B
without                     0 B - 1.00x memory usage +0 B
including parse          9928 B - ∞ x memory usage +9928 B

**All measurements for memory usage were the same**
josevalim commented 1 year ago

By supplying the plural forms header to the callbacks, the implementation can make decisions as they were intended in gettext.

I think this is approaching the problem based on the current solution but, given the limitations you added, I would say the problem needs to be reworked altogether. Likely we want to compile the rules encoded in the header as part of the backend compilation?

When you say the PluralForms header, is that a header for the .po file or specific to a given translation/msgid?

maennchen commented 1 year ago

@josevalim The plural forms header is in the .po header and not per message.

The idea with compile_plural was to compile the rule once per backend and call it from there.

Based on the discussion over in ExpoPluralForms (performance), I think it would make more sense to go a plug like approach. There could be some kind of init callback to do preparation works like header parsing and then a normal function call with the init result to calculate the index.

I'll put the PR back into draft state and implement a cleaner solution.

josevalim commented 1 year ago

For example for chinese there's two different rules depending on the context. (nplurals=1; plural=0; & nplurals=2; plural=(n > 1);)

and how do you determine which one to use? :)

maennchen commented 1 year ago

@josevalim The user of the library can decide which rule to use when he sets the header in the .po file.

Many tools like Poedit help a user setting up the headers as well.

We have an example of that here in the source: https://github.com/elixir-gettext/gettext/blob/2ebe8066d6309ea2d810beb93e38433fef6f5071/test/fixtures/po_editors/poedit.po#L13

whatyouhide commented 1 year ago

Yeah, I think the Plural-Form header in each PO file is pretty neat. I think compiling the Plural-Forms header once is a no-brainer, because we can parse it into an Elixir data structure really easily. I think "precompiled" vs "without" in your benchmarks show that "without" is probably a good compromise between simplicity and speed.

maennchen commented 1 year ago

@josevalim / @whatyouhide I pushed a new version that is a lot simpler and does not cause any changes in the behaviour. Only a new callback is added.

maennchen commented 1 year ago

Example implementation: https://github.com/elixir-gettext/expo_plural_forms/pull/36

EDIT: Moved to https://github.com/elixir-gettext/gettext/compare/jm/extend_plural_interface...jm/expo-example

maennchen commented 1 year ago

@whatyouhide I've adressed all the comments except https://github.com/elixir-gettext/gettext/pull/343#discussion_r1081221654

Are we sure that we handle the deprecation question correctly? I have removed it but find it a bit strange to keep code (and not even discourage using it) when the behaviour it causes is potentially flawed.

maennchen commented 1 year ago

@whatyouhide / @josevalim Quick note: I will have some surgery on my wrist next week. I will probably not have time to finish the Gettext.Plural implementation for expo until then. I however plan to pick it back up 2/3 weeks later when I can hopefully get rid of the cast and properly type again.

whatyouhide commented 1 year ago

@maennchen no worries! I have some free time, so I can give it a shot if you want. I'm working on polishing docs as well. Thanks for all the work!! 💟

maennchen commented 1 year ago

@whatyouhide If you’d like to, I don’t mind if you pick it up from here. But otherwise I’m also fine doing myself in a few weeks. However you prefer 😊 I might be able to do some code reviews, so just let me know if I can help.