elixir-gettext / gettext

Internationalization and localization support for Elixir.
https://hexdocs.pm/gettext

Support `.mo` files for compilation #317

Closed maennchen closed 1 year ago

maennchen commented 2 years ago

Expo supports parsing and writing .mo files, which are a lot faster to read than .po files because .mo is a simple binary format.
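To illustrate why a .mo file is cheap to read, here is a minimal sketch of the GNU MO layout: a fixed header followed by two tables of (length, offset) pairs pointing at NUL-terminated strings. It is written in Python purely for illustration and does not use Expo; `build_mo` and `parse_mo` are hypothetical helper names, and the hash table, plural forms and message contexts are left out.

```python
import struct
from typing import Dict

MO_MAGIC = 0x950412DE          # magic number of the GNU MO format
MO_MAGIC_SWAPPED = 0xDE120495  # same magic, seen when byte order is reversed


def build_mo(messages: Dict[str, str]) -> bytes:
    """Serialize msgid -> msgstr pairs into a minimal little-endian MO blob.

    Illustrative helper (not part of Expo or Gettext); writes no hash table.
    """
    # The format expects the original strings to be sorted.
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in messages.items())
    n = len(items)
    orig_table = 7 * 4                 # tables start right after the 7-word header
    trans_table = orig_table + n * 8   # each table entry is two uint32 values
    data_start = trans_table + n * 8

    entries, blob = [], b""
    for text in [msgid for msgid, _ in items] + [msgstr for _, msgstr in items]:
        entries.append((len(text), data_start + len(blob)))
        blob += text + b"\x00"         # strings are NUL-terminated

    # Header: magic, revision, count, originals offset, translations offset,
    # hash table size, hash table offset.
    out = struct.pack("<7I", MO_MAGIC, 0, n, orig_table, trans_table, 0, 0)
    for length, offset in entries:
        out += struct.pack("<2I", length, offset)
    return out + blob


def parse_mo(data: bytes) -> Dict[str, str]:
    """Read msgid -> msgstr pairs using only the header and the two tables."""
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic == MO_MAGIC:
        endian = "<"
    elif magic == MO_MAGIC_SWAPPED:
        endian = ">"
    else:
        raise ValueError("not an MO file")

    _revision, n, orig_table, trans_table = struct.unpack_from(endian + "4I", data, 4)
    result = {}
    for i in range(n):
        o_len, o_off = struct.unpack_from(endian + "2I", data, orig_table + 8 * i)
        t_len, t_off = struct.unpack_from(endian + "2I", data, trans_table + 8 * i)
        msgid = data[o_off:o_off + o_len].decode("utf-8")
        result[msgid] = data[t_off:t_off + t_len].decode("utf-8")
    return result
```

Reading needs no tokenizing or unescaping, only fixed-size integer reads and slices, which is where the speed advantage over .po parsing would come from.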

I would like to support it here as well.

Proposed changes:

whatyouhide commented 2 years ago

@maennchen how do other Gettext implementations (for other languages) tackle MO files?

maennchen commented 2 years ago

@whatyouhide Most (C-based) implementations I’ve used so far only read the .mo file for runtime translations. The .po / .pot files are only used for extraction and to help with merge problems.

whatyouhide commented 1 year ago

@maennchen can we close this now that Expo supports MO files?

maennchen commented 1 year ago

@whatyouhide That was the next issue I wanted to tackle:

Support .mo in Gettext itself. I think (we have to benchmark it) it makes compilation faster.

whatyouhide commented 1 year ago

@maennchen got it, makes sense. This would require a slightly different workflow for Gettext entirely, right? We'd have to dump POs and MOs, and read MOs if present, falling back to PO? Do you have an exact workflow in mind? I ask because I have some cycles I can dedicate to Gettext 😉

maennchen commented 1 year ago

@whatyouhide I wanted to make the file handling strategy configurable, at least at the start, to prevent breaking changes.

whatyouhide commented 1 year ago

Do you envision MOs being committed in version control? Is this the flow used by GNU Gettext, if you know?

maennchen commented 1 year ago

@whatyouhide I intend to commit them.

Gettext itself has no opinion about .mo files in VCS, as far as I’m aware.

I know from the PHP ecosystem that in most cases .mo files are committed. I have also encountered the opinion that they should not be committed and should only be generated on demand or for releases.

Speaking for myself: I would commit them and would not be concerned about conflicts in .mo files since you can always regenerate them from merged .po files.

Because there seem to be different opinions about this, I wanted to implement it as a configurable strategy so that people can decide how they want to handle it.

whatyouhide commented 1 year ago

(I closed this by accident, sorry about that!)

My guess would be that these files should not be committed, as essentially they're a duplicated "cache" of PO files anyways. I’m ok with configuration, but I'd like to keep simplicity as much as possible. For example, before diving into this, I'd ask: does Gettext compilation take significant time today? Are we sure introducing MO files, which increases complexity, is worth it?

maennchen commented 1 year ago

@whatyouhide

Performance Impact

In a bigger application like https://github.com/jshmrtn/hygeia, parsing the .po files takes around 0.2s per language on my machine. If the performance comparison in https://github.com/elixir-gettext/expo/issues/21 is still more or less accurate, around 75% of that time could potentially be saved (~ 0.8s).

I think generating the functions inside the backend takes longer than the actual parsing, though. So looking at that performance might make a bigger difference.

An even bigger impact is the compile time dependency of all the modules using the gettext backend. Changing one translation currently means recompiling most of the applications it is used in.

Committing

I think committing .mo files is ok. Most people also commit .pot files even though they're technically just cached extractions. Where to draw the line on how much to "cache" can differ from project to project. In a bigger application I might want to make the trade-off, and in a quick demo project not.

I also don't seem to be alone in this opinion. There are currently over 132 million checked-in .mo files on GitHub: https://github.com/search?l=&q=extension%3Amo&type=code

josevalim commented 1 year ago

> An even bigger impact is the compile time dependency of all the modules using the gettext backend. Changing one translation currently means recompiling most of the applications it is used in.

Yeah, especially because changing a translation does not change the code generated at compile-time.

whatyouhide commented 1 year ago

Considering the added complexity of supporting MO files, I'd definitely shift our focus to the compile-time dependencies and function generation, yeah.

maennchen commented 1 year ago

Discussion moved to #330

whatyouhide commented 1 year ago

Great, thanks @maennchen. I will close this for now then, and we can reopen in case this comes up again. Thanks! 💟