Closed maennchen closed 1 year ago
This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Files with Coverage Reduction | New Missed Lines | % | ||
---|---|---|---|---|
lib/gettext/merger.ex | 1 | 89.36% | ||
lib/mix/tasks/gettext.merge.ex | 1 | 98.51% | ||
lib/gettext/plural.ex | 16 | 83.5% | ||
<!-- | Total: | 18 | --> |
Totals | |
---|---|
Change from base Build 2ebe8066d6309ea2d810beb93e38433fef6f5071: | 0.2% |
Covered Lines: | 566 |
Relevant Lines: | 631 |
shift work into compile time to possibly improve the performance
If we don't know if this improves performance, I don't think we should do it for now. Let's solve it if it ever becomes a problem?
@maennchen also, can you elaborate on the purpose of this PR in more details? I'm not really following 😄
@whatyouhide
If we don't know if this improves performance, I don't think we should do it for now. Let's solve it if it ever becomes a problem?
That highly depends on the implementation of that behaviour. I know that I can make it faster in ExpoPluralForms.
That description should probably be worded differently.
@maennchen also, can you elaborate on the purpose of this PR in more details? I'm not really following smile
Sure :)
The purpose of this PR is that the plural form header is passed into Gettext.Plural
implementations.
Therefore implementations can make decisions based on that header.
This will allow it to implement an ExpoPluralForms implementation of the behaviour.
Gettext has a list of known locales for which plural
is implemented. Since that list identifies rule by the locale name, the index is looked up by locale
& count
.
For various languages, there's not a single way to represent plural rules.
For example for chinese there's two different rules depending on the context. (nplurals=1; plural=0;
& nplurals=2; plural=(n > 1);
)
Also in my own language (swiss german) there's some debate of how to handle 0
counts and depending on the dialect, either new singular case (0) or many (1) applies.
Those rules & differences can be represented using the plural forms header.
By supplying the plural forms header to the callbacks, the implementation can make decisions as they were intended in gettext.
@whatyouhide Performance comparison (extended from https://github.com/elixir-gettext/expo_plural_forms/pull/33#issuecomment-1383212655) to illustrate what would happen if the header was parsed on every index call:
Mix.install([{:benchee, "~> 1.1"}, {:expo_plural_forms, path: "."}])
{:ok, arabic} = ExpoPluralForms.Known.plural_form("ar")
defmodule Arabic do
def index(n), do: unquote(ExpoPluralForms.compile_index(arabic))
end
Benchee.run(
%{
"precompiled" => fn ->
Arabic.index(17)
end,
"without" => fn ->
ExpoPluralForms.index(arabic, 17)
end,
"including parse" => fn ->
{:ok, arabic} = ExpoPluralForms.parse(" nplurals=6; plural=(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);")
ExpoPluralForms.index(arabic, 17)
end
},
time: 10,
memory_time: 2
)
Result:
Operating System: Linux
CPU Information: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Number of Available Cores: 8
Available memory: 46.77 GB
Elixir 1.14.2
Erlang 25.2
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 42 s
Benchmarking including parse ...
Benchmarking precompiled ...
Benchmarking without ...
Name ips average deviation median 99th %
precompiled 12.75 M 78.44 ns ±11373.62% 72 ns 100 ns
without 4.89 M 204.65 ns ±17541.84% 173 ns 348 ns
including parse 0.35 M 2833.62 ns ±846.85% 2546 ns 4091 ns
Comparison:
precompiled 12.75 M
without 4.89 M - 2.61x slower +126.21 ns
including parse 0.35 M - 36.12x slower +2755.17 ns
Memory usage statistics:
Name Memory usage
precompiled 0 B
without 0 B - 1.00x memory usage +0 B
including parse 9928 B - ∞ x memory usage +9928 B
**All measurements for memory usage were the same**
By supplying the plural forms header to the callbacks, the implementation can make decisions as they were intended in gettext.
I think this is approaching the problem based on the current solution but, given the limitations you added, I would say the problem needs to be reworked altogether. Likely we want to compile the rules encoded in the header as part of the backend compilation?
When you say the PluralForms header, is that a header for the .po file or specific to a given translation/msgid?
@josevalim The plural forms header is in the .po
header and not per message.
The idea with compile_plural
was to compile the rule once per backend and call it from there.
Based on the discussion over in ExpoPluralForms (performance), I think it would make more sense to go a plug like approach. There could be some kind of init
callback to do preparation works like header parsing and then a normal function call with the init result to calculate the index.
I'll put the PR back into draft state and implement a cleaner solution.
For example for chinese there's two different rules depending on the context. (nplurals=1; plural=0; & nplurals=2; plural=(n > 1);)
and how do you determine which one to use? :)
@josevalim The user of the library can decide which rule to use when he sets the header in the .po
file.
Many tools like Poedit help a user setting up the headers as well.
We have an example of that here in the source: https://github.com/elixir-gettext/gettext/blob/2ebe8066d6309ea2d810beb93e38433fef6f5071/test/fixtures/po_editors/poedit.po#L13
Yeah, I think the Plural-Form
header in each PO file is pretty neat. I think compiling the Plural-Forms
header once is a no-brainer, because we can parse it into an Elixir data structure really easily. I think "precompiled" vs "without" in your benchmarks show that "without" is probably a good compromise between simplicity and speed.
@josevalim / @whatyouhide I pushed a new version that is a lot simpler and does not cause any changes in the behaviour. Only a new callback is added.
Example implementation: https://github.com/elixir-gettext/expo_plural_forms/pull/36
EDIT: Moved to https://github.com/elixir-gettext/gettext/compare/jm/extend_plural_interface...jm/expo-example
@whatyouhide I've adressed all the comments except https://github.com/elixir-gettext/gettext/pull/343#discussion_r1081221654
Are we sure that we handle the deprecation question correctly? I have removed it but find it a bit strange to keep code (and not even discourage using it) when the behaviour it causes is potentially flawed.
@whatyouhide / @josevalim Quick note: I will have some surgery on my wrist next week. I will probably not have time to finish the Gettext.Plural implementation for expo until then. I however plan to pick it back up 2/3 weeks later when I can hopefully get rid of the cast and properly type again.
@maennchen no worries! I have some free time, so I can give it a shot if you want. I'm working on polishing docs as well. Thanks for all the work!! 💟
@whatyouhide If you’d like to, I don’t mind if you pick it up from here. But otherwise I’m also fine doing myself in a few weeks. However you prefer 😊 I might be able to do some code reviews, so just let me know if I can help.
Closes #318
This makes it possible to react precisely to the header instead of using predefined formats.