Translation Framework - Githubissues

KJ7LNW commented 1 year ago

[ moving translation discussion from #16 to here ]

@ndim,

Regarding gettextize and intltoolize: I have been involved with libgphoto2's build system for about the last 20 years, so I am kind of familiar with the gettextize based buildsystem aspects.

I just found this article, perhaps modern gettext would be better than intltoolize:

https://wiki.gnome.org/MigratingFromIntltoolToGettext

See commit 5c4c07cb for the minor intltoolize fixes I had to do at some point (for some reason I do not recall).

I could get that to work properly easily by basically mostly copying what libgphoto2 does, perhaps minus some special things I did for libgphoto2 specifically.

I think that would be great.

I just checked my branch for translations and I've not changed much in po/ that would matter in terms of merging. I'm new to translation considerations, so please add suggestions from your experience. Based on my current understanding these are the requirements that could be ideal for the po/ stack in xnec2c:

1. xnec2c.pot is generated automagically just prior to version release and committed for any textual changes. What is the best practice for maintaining the .pot files in git over time?
2. The .po files get automatically updated/merged with any .pot changes:
   - New strings should be blank
   - Missing/deleted strings: Do whatever is "normal" here, I'm not trying to reinvent the wheel, just curious about edge cases.
     - Should deleted strings in .po files be kept around in case the language comes back? I mean, someone put the work into translating...but that could end up being stale over time.
     - For example, what if a very long chunk of text was translated, and one or two words in the source material were changed? That could lose the translations that we might wish to keep.
3. Translation strings must support interaction with Glade XML from resources/xnec2c.glade so that GTK3 UI widgets use the translated text from the appropriate .po file.
4. Auto-generate .gmo files: At which stage should this happen?
   - ./autogen.sh ?
   - ./configure ?
5. Install the .gmo files in the right place
   - Would be nice if the .gmo files are usable from the xnec2c source tree so ./src/xnec2c gets any new translations.
6. Support gettext from CentOS 7 (gettext version 0.19.8.1)
7. Other considerations?

KJ7LNW commented 1 year ago

I just pushed the intl branch that I've done so far, mostly just cleanup, _("foo") around things, and some textual changes for better clarity of meaning:

https://github.com/KJ7LNW/xnec2c/compare/master...intl

ndim commented 1 year ago

Regarding gettextize and intltoolize: I have been involved with libgphoto2's build system for about the last 20 years, so I am kind of familiar with the gettextize based buildsystem aspects.

I just found this article, perhaps modern gettext would be better than intltoolize:
* https://wiki.gnome.org/MigratingFromIntltoolToGettext

Interesting.

See commit 5c4c07c for the minor intltoolize fixes I had to do at some point (for some reason I do not recall).

I could get that to work properly easily by basically mostly copying what libgphoto2 does, perhaps minus some special things I did for libgphoto2 specifically.

I think that would be great.

I would try to proceed without intltool first, and do a proof of concept for translated C strings and translated XML stuff.

I just checked my branch for translations and I've not changed much in po/ that would matter in terms of merging. I'm new to translation considerations, so please add suggestions

Ah. Let me check that branch. The information from po/Makevars actually answers what I wanted to ask you :-)

from your experience. Based on my current understanding these are the requirements that could be ideal for the po/ stack in xnec2c:
1. `xnec2c.pot` is generated automagically just prior to version release and committed for any textual changes. What is the best practice for maintaining the .pot files in git over time?

You do not have the .pot file in git, as that file is 100% generated from the files containing translations (*.c and possibly others). What you do have in git is po/*.po, which are the actual translations.

When release time approaches, you impose a string freeze, build a prerelease dist tarball containing updated po and pot files (make dist), and submit that tarball to the translators. Then the translators have some time to translate and send you back the updated translations. Then you commit and push those translation updates, and then you can cut the release.

How exactly the translators and the translation project manage partially translated po files and pot files I do not remember. https://translationproject.org/html/maintainers.html should contain the interesting information for you.

2. The `.po` files get automatically updated/merged with any `.pot` changes:

   * New strings should be blank
   * Missing/deleted strings: Do whatever is "normal" here, I'm not trying to reinvent the wheel, just curious about edge cases.

     * Should deleted strings in `.po` files be kept around in case the language comes back?  I mean, someone put the work into translating...but that could end up being stale over time.
     * For example, what if a very long chunk of text was translated, and one or two words in the source material were changed?  That could lose the translations that we might wish to keep.

You do not deal with that at all. The tools will generate *.pot and update *.po at the appropriate time. Whether it makes sense to have every make dist update the *.po files, or whether you should explicitly run make -C po update-po once before committing the changed po/*.po to git and submitting the prerelease tarball to the translation project, is a political decision (I would begin writing a release check list, e.g. in RELEASE.md or HACKING.md or similar).

What you might need to know is that if you change as much as a period or a single typo in a translated string, the string will be marked as fuzzy in po/*.po and will not be used at all when the program is run in translated mode.

3. Translation strings must support interaction with Glade XML from `resources/xnec2c.glade` so that GTK3 UI widgets use the translated text from the appropriate `.po` file.

I will check that. We will need a test case for that.

4. Auto-generate `.gmo` files: At which stage should this happen?

   * ./autogen.sh ?
   * ./configure ?

Neither. This happens at adequate times when you have make build the respective targets.

5. Install the `.gmo` files in the right place

   * Would be nice if the `.gmo` files are usable from the xnec2c source tree so `./src/xnec2c` gets any new translations.

Hmm. That would be a bit of extra C code (possibly specific to the C library which actually implements the locale and translation handling), and possibly some extra build rules to arrange the translation files in a directory structure the C library can use.

Usually the translations are somehow made from po/de.po into po/de.gmo and finally installed as /usr/local/share/locale/de/LC_MESSAGES/xnec2c.mo aka $(localedir)/de/LC_MESSAGES/xnec2c.mo.

6. Support `gettext` from CentOS 7 (gettext version 0.19.8.1)

The article you linked to says gettext 0.19.7 or later, xnec2c has been requiring that for a long time, and CentOS 7 has gettext 0.19.8. So no issues there, we can continue to use the code from gettext-0.19.7 (autopoint copies the Makefile.in.in etc. to po/ from whatever gettext release AM_GNU_GETTEXT_VERSION([...]) mentions, as long as autopoint is from at least that gettext version)

7. Other considerations?

I would only register xnec2c at the translation project after the translation framework is in place. Then we can use a dummy proof of concept po/de.po translation for testing the translation framework and verifying that both C strings and glade strings are translated first, and later have the translation project take care of that translation.

I personally do not want to actually translate the software or use translated software. As long as the original language is English, I prefer to use the original language English over my native language German as it is usually shorter and more accurate. Also, if I want to communicate with upstream developers (a very important part of Open Source software), it is very helpful if I can tell them where the problem occurred in a language they actually understand, and that is usually English. And the issue only gets worse if you get into a different type of script like what e.g. Japanese uses. It is bad enough I cannot read, pronounce, or even just type the letters from two of our neighbouring countries' languages which use latin scripts (Polish and Czech). I really would not want to debug a problem with program output someone reported with the software running in Japanese.

So I would like to hand over the proof of concept po/de.po to the Translation Project or whoever else you want to hand the actual translations to.

KJ7LNW commented 1 year ago

Awesome, thanks for the many clarifications. Good idea about testing po/de.po first!

5. Install the `.gmo` files in the right place

   * Would be nice if the `.gmo` files are usable from the xnec2c source tree so `./src/xnec2c` gets any new translations.
Hmm. That would be a bit of extra C code (possibly specific to the C library which actually implements the locale and translation handling), and possibly some extra build rules to arrange the translation files in a directory structure the C library can use.

Usually the translations are somehow made from po/de.po into po/de.gmo and finally installed as /usr/local/share/locale/de/LC_MESSAGES/xnec2c.mo aka $(localedir)/de/LC_MESSAGES/xnec2c.mo.

Ok. There might be an LC_PATH var or something I can use at dev time, so it's not an important goal unless it presents itself easily.

ndim commented 1 year ago

Heads up about your requirement to run translated without installing.

The gettext(3) man page says it looks for translated data in dirname/locale/category/domainname.mo. This consists of four parts:

* dirname, which we can set by calling bindtextdomain(3) (dirname is usually $(localedir), which is something like /usr/local/share/locale)
* locale, which is the locale, i.e. something like de_DE.UTF-8 or de_DE or de
* category, which is LC_MESSAGES for messages
* domainname, which in our case is xnec2c
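
To make the four-part path concrete, here is a minimal C sketch that assembles it for message catalogs. The helper name build_mo_path is hypothetical and exists only for illustration; the real lookup happens inside the C library, which may also try several locale spellings (de_DE.UTF-8, de_DE, de) in turn.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of gettext or xnec2c): writes
 * dirname/locale/LC_MESSAGES/domainname.mo into buf.
 * Returns 0 on success, -1 on empty components or truncation. */
static int build_mo_path(char *buf, size_t bufsize,
                         const char *dirname, const char *locale,
                         const char *domainname)
{
    assert(buf != NULL && dirname != NULL);
    assert(locale != NULL && domainname != NULL);

    /* An empty locale or domain cannot name a catalog file. */
    if (strlen(locale) == 0 || strlen(domainname) == 0)
        return -1;

    /* The category is fixed to LC_MESSAGES for translated messages. */
    int n = snprintf(buf, bufsize, "%s/%s/LC_MESSAGES/%s.mo",
                     dirname, locale, domainname);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;
}
```

Called with "/usr/local/share/locale", "de" and "xnec2c", the helper produces the same installed path mentioned earlier in this thread.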

Running make install in po/ (generated by the po/Makefile.in.in from gettext, placed there by autopoint) will install the appropriate *.mo files in the appropriate directory structure.

However, there is no such directory structure inside the build tree to which we could point bindtextdomain(3) for uninstalled operation.

While it might be possible to create such a directory structure inside the build tree without too much brittle make rule hacking, I will have to see that to believe it, and it might be ugly.

The workaround would be to configure --localedir=$PWD/locale && make && make -C po install once and then have working translations while running src/xnec2c from the build tree. However, if you update any translations, running make will not be enough to have them show up in xnec2c: You will have to run make -C po install as well.

On the other hand, you could also just install xnec2c to run it:

configure --prefix=$PWD/_i && make && make install && $PWD/_i/bin/xnec2c

The xnec2c sources are not many and are in C (not C++, which takes WAYYY longer to compile and link), so that whole thing goes relatively quickly even if you need to rebuild from scratch.
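
As an alternative sketch for development use, the directory handed to bindtextdomain(3) could be overridden at run time. This is an assumption for illustration, not what xnec2c does: the function name init_translations and the environment variable XNEC2C_LOCALEDIR are invented here, and the fallback LOCALEDIR value would normally come from the build system. The override only helps if the chosen directory already has the locale/LC_MESSAGES/xnec2c.mo layout, for example after make -C po install.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

#ifndef LOCALEDIR
/* Assumed fallback; a real build would pass -DLOCALEDIR=... instead. */
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Hypothetical startup helper: binds the message domain to a locale
 * directory and returns the directory the C library reports as bound. */
static const char *init_translations(const char *domain)
{
    assert(domain != NULL);

    /* XNEC2C_LOCALEDIR is an invented name used only in this sketch. */
    const char *dir = getenv("XNEC2C_LOCALEDIR");
    if (dir == NULL || strlen(dir) == 0)
        dir = LOCALEDIR;

    setlocale(LC_ALL, "");            /* pick up the user's locale */
    const char *bound = bindtextdomain(domain, dir);
    textdomain(domain);               /* make it the default domain */
    return bound;
}
```

With such a hook, something like XNEC2C_LOCALEDIR=$PWD/locale src/xnec2c could pick up freshly installed catalogs without reconfiguring, at the cost of a small amount of extra startup code.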

KJ7LNW commented 1 year ago

Thanks for the info, good ideas.

I'm fine with all of that since po without install was only a nice to have if convenient, and by no means a requirement.

ndim commented 1 year ago

Just making sure, even though I am 99% certain of the answer: xnec2c is not "GNU xnec2c", is it? I am asking because of po/Makevars line PACKAGE_GNU = yes, which I think should be PACKAGE_GNU = no.

KJ7LNW commented 1 year ago

Good catch. To my knowledge, xnec2c is just "xnec2c" and has never been "GNU xnec2c"

ndim commented 1 year ago

https://wiki.gnome.org/MigratingFromIntltoolToGettext proposes a way to translate .desktop files which requires msgfmt to be present on all builds - whether those builds are from a git clone or from a dist tarball.

I am not sure whether xnec2c should use that rule and require msgfmt for all builds, or whether it would be better to ship a translated copy of the .desktop file (in addition to the untranslated .desktop.in file) in both the git source tree and in a dist tarball.

ndim commented 1 year ago

@KJ7LNW Is COPYRIGHT_HOLDER in po/Makevars correct?

# This is the copyright holder that gets inserted into the header of the
# $(DOMAIN).pot file.  Set this to the copyright holder of the surrounding
# package.  (Note that the msgstr strings, extracted from the package's
# sources, belong to the copyright holder of the package.)  Translators are
# expected to transfer the copyright for their translations to this person
# or entity, or to disclaim their copyright.  The empty string stands for
# the public domain; in this case the translators are expected to disclaim
# their copyright.
COPYRIGHT_HOLDER = Neoklis Kyriazis nkcyham@yahoo.com

ndim commented 1 year ago

FWIW, I have a wip-intl work in progress branch (expect rebases and force pushes) for the translations.

It builds on your intl branch, but skips the intltool related things.

Also, it translates messages and UI and .desktop on Linux, but still fails to translate messages on macOS. Not tested on BSD yet.

KJ7LNW commented 1 year ago

I am not sure whether xnec2c should use that rule and require msgfmt for all builds, or whether it would be better to ship a translated copy of the .desktop file (in addition to the untranslated .desktop.in file) in both the git source tree and in a dist tarball.

If gettext is required at build time, and if it is easier to require msgfmt for all builds, then require msgfmt for all builds because msgfmt comes with gettext so it's not like we are requiring a package they don't already have. OTOH if msgfmt isn't usually required on users' systems then creating .desktop during make dist is fine too. I'm open to your best judgement here.

Either way (if I understand correctly) we won't want xnec2c.desktop in git because the xnec2c.desktop.in file will be...right?

@KJ7LNW Is COPYRIGHT_HOLDER in po/Makevars correct?

Yep. Neoklis owns the copyright.

FWIW, I have a wip-intl work in progress branch (expect rebases and force pushes) for the translations.

ok.

ndim commented 1 year ago

Continuing the discussion from https://github.com/KJ7LNW/xnec2c/commit/114413b13a8ce2cc81f413c1d92ed40665f15c5b:

For example, I think some of those macros aren't even used (_N, etc).

N_ is an essential part of the message translation infrastructure:

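#include <libintl.h>
#include <stdio.h>

#define _(s)  gettext(s)  /* translate at run time */
#define N_(s) s           /* only mark the string for extraction by xgettext */

/* Static initializers cannot call gettext(), so N_ merely marks the string. */
const char *const world = N_("world");

int main(void)
{
   printf(_("Hello, %s!\n"), gettext(world)); /* translate the marked string here */
   return 0;
}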

Anyway, i18n.h contains a minimum API which should be available.

Unless you can think of a reason not to, i18n support should just be required for builds, which would clean up all these #ifdefs. I would be surprised if anyone was building on a system so old that they couldn't support i18n and even if they are, maybe they should upgrade. Ultimately CentOS 7 is the oldest version I want to support at this point, and it does translations just fine so far as I can tell.

It is a good idea to support both --disable-nls and --enable-nls builds. The xnec2c translation framework might not work on a system, or people do not have the translation libraries, or they do not care about the code size, execution time, and storage size overhead.
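
A minimal sketch of how a header can serve both kinds of build follows. It assumes the usual ENABLE_NLS macro that the gettext autoconf macros define in config.h for --enable-nls builds. The status_names table and status_label function are hypothetical, and this is not the actual content of xnec2c's i18n.h.

```c
#include <assert.h>

#ifdef ENABLE_NLS
# include <libintl.h>
# define _(msgid)  gettext(msgid)   /* look up the translation at run time */
#else
# define _(msgid)  (msgid)          /* --disable-nls: pass strings through */
#endif
#define N_(msgid) msgid             /* mark for extraction, never translate */

/* Hypothetical table: N_ marks entries so xgettext can extract them,
 * because a static initializer cannot call gettext(). */
static const char *const status_names[] = { N_("ready"), N_("running") };

/* Translate the marked string only at the point of use. */
static const char *status_label(int status)
{
    assert(status >= 0 && status < 2);
    return _(status_names[status]);
}
```

With this shape the calling code stays free of #ifdefs: only the header knows whether NLS is compiled in, and a --disable-nls build carries no libintl dependency.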

ndim commented 1 year ago

I am not sure whether xnec2c should use that rule and require msgfmt for all builds, or whether it would be better to ship a translated copy of the .desktop file (in addition to the untranslated .desktop.in file) in both the git source tree and in a dist tarball.

If gettext is required at build time, and if it is easier to require msgfmt for all builds, then require msgfmt for all builds because msgfmt comes with gettext so it's not like we are requiring a package they don't already have. OTOH if msgfmt isn't usually required on users' systems then creating .desktop during make dist is fine too. I'm open to your best judgement here.

Either way (if I understand correctly) we won't want xnec2c.desktop in git because the xnec2c.desktop.in file will be...right?

As neither xnec2c.desktop.in nor the .po files will change often, I would argue it would be OK to have xnec2c.desktop in git as well.

However, the only time when msgfmt is not available at configure time is when people are building without NLS support anyway. And for that case we can just copy xnec2c.desktop.in to xnec2c.desktop without having msgfmt add translations: Installing an untranslated xnec2c.desktop file for an untranslated xnec2c build will not make the user experience significantly worse.

KJ7LNW commented 1 year ago

True, that makes sense to me.

ndim commented 1 year ago

BTW, regarding the proof of concept test translation into German: While I have studied Electrical Engineering and I still remember the basics of field theory, I have never dug into RF and antennas nor into amateur radio, so I am not very familiar with the German terminology regarding antennas and RF. Therefore I only plan to translate a few strings, covering the few different types of translation cases that are needed for testing.

KJ7LNW commented 1 year ago

Sounds good, if it works then that is a fine starting point.

KJ7LNW / xnec2c

Translation Framework #20