WeblateOrg / weblate

Web based localization tool with tight version control integration.
https://weblate.org/
GNU General Public License v3.0

Translation-time context management #1507

Closed burner1024 closed 3 years ago

burner1024 commented 7 years ago

Edit: see comment for clarification.

Hi. Is it possible to add an option for translators to manage msgctxts, including adding new ones?

I'm not sure if that's something needed or even viable, so let me describe the situation first:

So... that looks like a long shot, but I was thinking of a feature that would allow the translator to "split" the source string into several contexted ones and translate them individually? That will require adding newly created entries into po and pot, though.

Or maybe I'm missing an obvious way to handle such a situation? Please advise.

nijel commented 7 years ago

AFAIK this will break later with gettext - e.g. msgmerge will remove contexts not present in the pot. Also, how would the application choose which string to use when translators can add their own contexts? I think it is really best to handle this in the code and have gender-specific messages there...

burner1024 commented 7 years ago

That's why I mentioned that that will require fixing po/pot simultaneously.

In this particular use case, the translation will result in two separate packages (and it's not PO files, it's getting converted to and from PO just for the purposes of translation).

Handling it in the code requires going through all the entries and deciding which ones should be contexted and which shouldn't, beforehand. Which programmers can't do. And having translators mess with code is exactly what Weblate is built to avoid, isn't it?

nijel commented 7 years ago

This really sounds like quite a specific use case. I'm not sure having this in Weblate would be useful, as in most cases people don't want to edit context.

When doing this myself (with intention to provide 2 separate packages), I'd probably just define separate locales for each (eg. es@male and es@female).
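As a rough illustration of the separate-locales idea (not something posted in the thread), the converter could look a string up in the gendered locale first and fall back to the plain one. The function names, the catalog layout, and the sample Spanish strings below are all made up for this sketch.

```python
# Hypothetical sketch: per-gender locales such as "es@female" that fall back
# to the base locale "es" when a string has no gender-specific translation.

def locale_chain(locale):
    """Return the lookup order, e.g. 'es@female' -> ['es@female', 'es']."""
    base, sep, _modifier = locale.partition("@")
    return [locale, base] if sep else [locale]


def translate(catalogs, locale, msgid):
    """Look msgid up along the locale chain; fall back to the source text."""
    for name in locale_chain(locale):
        text = catalogs.get(name, {}).get(msgid)
        if text:
            return text
    return msgid


# Illustrative data only: each catalog maps msgid -> msgstr.
CATALOGS = {
    "es": {"Focus": "Concentración", "You are tired.": "Estás cansado."},
    "es@female": {"You are tired.": "Estás cansada."},
}

female_text = translate(CATALOGS, "es@female", "You are tired.")
shared_text = translate(CATALOGS, "es@female", "Focus")
```

With this layout, only the strings that actually differ by gender need an entry in the gendered catalog; everything else comes from the base locale.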

burner1024 commented 7 years ago

Doesn't seem so specific to me. Any translation from a gender-neutral to a non gender-neutral language must face it. Maybe people don't want to edit context because they never had that option? Anyway, I see that this idea doesn't seem particularly attractive to you, so I'm closing the issue. Thank you for your answer.

nijel commented 7 years ago

The problem is that if you use gettext, it makes no sense for a translator to edit the context - it's used as an identifier in the code, so if you add a context, you have to change the code as well to actually use it.

burner1024 commented 7 years ago

If gettext is used directly, yes. But if it's just an intermediate format, that is not the case. And seeing the number of "po2xx" converters, I believe it's not the case often enough.

nijel commented 7 years ago

BTW: What file format do you actually use and convert to po?

burner1024 commented 7 years ago

In this particular use case, it's MSG.

nijel commented 7 years ago

So you still would end up generating separate translation files for each gender?

burner1024 commented 7 years ago

Correct.

burner1024 commented 7 years ago

Come to think of it again, it's probably not a very good idea to allow translators to add arbitrary contexts. That might lead to confusion.

Instead, if a set (or sets) of possible contexts were predefined by the project admin, and translators could then either "translate pristine" or "translate with context", that would allow for flexible translation while keeping it formalized.

nijel commented 7 years ago

That's what I meant in https://github.com/WeblateOrg/weblate/issues/1507#issuecomment-306259507.

Having this configurable is probably an option, but it is still quite a big feature which IMHO will not find many users...

burner1024 commented 7 years ago

Yes, I agree that it may not find many users. If Weblate had a pluggable architecture, it might have been easier to add... When I have time, I'll try to see for myself if that's possible, but I'm not sure if my skills are good enough yet.

wichert commented 7 years ago

For what it's worth ICU MessageFormat supports this nicely. Here is an example message:

{gender, select,
    male {He}
    female {She}
    other {They}
} will respond shortly.

I agree with @nijel that it makes little sense to support context creation in gettext formats; all gettext tools will just discard any newly added contexts.
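To illustrate how the `select` message in this comment resolves, here is a minimal sketch in Python. It is not an ICU implementation: it only handles plain `{name}` arguments and a single, non-nested `select`, and the function names are made up for the example.

```python
import re

# The message from the comment, kept as-is.
MESSAGE = """{gender, select,
    male {He}
    female {She}
    other {They}
} will respond shortly."""


def _expand(inner, values):
    """Expand the text found between one pair of top-level braces."""
    parts = inner.split(",", 2)
    name = parts[0].strip()
    if len(parts) == 1:
        return str(values[name])  # plain {name} argument
    if len(parts) != 3 or parts[1].strip() != "select":
        raise ValueError("only 'select' is supported in this sketch")
    # Collect "key {text}" pairs; nested braces are not supported here.
    options = dict(re.findall(r"(\w+)\s*\{([^{}]*)\}", parts[2]))
    key = str(values[name])
    return options[key] if key in options else options["other"]


def format_select(message, values):
    """Replace every top-level {...} argument in message."""
    out = []
    i = 0
    while i < len(message):
        if message[i] != "{":
            out.append(message[i])
            i += 1
            continue
        depth = 0
        j = i
        while j < len(message):  # find the matching closing brace
            if message[j] == "{":
                depth += 1
            elif message[j] == "}":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        if depth != 0:
            raise ValueError("unbalanced braces")
        out.append(_expand(message[i + 1:j], values))
        i = j + 1
    return "".join(out)


female = format_select(MESSAGE, {"gender": "female"})   # "She will respond shortly."
unknown = format_select(MESSAGE, {"gender": "unknown"})  # falls back to "other"
```

The point of the format is that both gendered variants live inside one translatable string, so no extra contexts or extra files are needed.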

burner1024 commented 7 years ago

But Weblate doesn't support that fancy format, does it? So it's irrelevant. I'm not sure what you mean by discarding context. I don't think gettext will discard it.

ALIENQuake commented 6 years ago

@nijel I disagree that such a function won't find many use cases. More importantly, the lack of such gender-specific distinction is a blocker for translating more than 1000 Infinity Engine mods via weblate. But it depends how you define 'many'.

What sort of support could convince you to implement such feature in the next 3 months?

burner1024 commented 6 years ago

After reflecting on this for some time, I've come to the conclusion that 1) The original description is overly generic. Really, it's about allowing gender-specific translations, not arbitrary contexts. 2) There's no good way to implement this as long as Weblate only supports translate-toolkit formats, none of which allow gender distinction. The reason is that there's no place to store female-specific strings: the POT is generated automatically, and POs derive from the POT.

The best thing Weblate could do (maybe with the new plugin system?) is to make it easy to hook into the PO file save function and the web UI translate form. Then the plugin, or hook, could save the extra strings in an extra file (say, french.po_female). Strings from that file would then be picked up automatically by the po2xx converter. I'll try to implement something like this, and report the results.

nijel commented 6 years ago

Sorry for not following up for some time. Yes, complex formats like ICU message or L20n do not have any special support in Weblate. On the other hand, it has no problem showing such strings to translators for editing, so that should work without problems (but without at least a syntax checker, translators will produce many non-working expressions).

Generally I don't like these as they turn translations into a programming language. This is something that translators usually do not handle well. The ICU one seems at least a bit limited in the expressions, but L20n has IMHO gone too far (see their complex example).

I'm planning to have several kinds of addons possible for the next release, so if you're able to agree on a way of handling this, it might be a good way to implement it.

@ALIENQuake How do you currently store these translations?

Also, if somebody wants to financially motivate me (or somebody else) to solve this issue, you can use Bountysource. It used to be integrated in GitHub, but it has been broken for several months (see https://github.com/bountysource/core/issues/1096).

burner1024 commented 6 years ago

Here's a test implementation.

For reference, Infinity Engine mod translations are stored in TRA files (example). In our system, they are converted to and from PO by hooks, using helper tools.

nijel commented 3 years ago

New strings in bilingual formats can be added starting with the 4.5 release: https://docs.weblate.org/en/latest/admin/projects.html#manage-strings

burner1024 commented 3 years ago

Sorry, I don't see how that is supposed to help. (But anyway, I've pretty much given up hope on getting it implemented, and added a hack, so it's just as well to me.)

nijel commented 3 years ago

You can now add variants of a string in Weblate, including custom context. I thought that would work here as well.

ALIENQuake commented 3 years ago

@nijel Hi, can you be more specific? Can we separate the male and female versions of strings? How does it look in the GUI? How is the alternative variant stored?

nijel commented 3 years ago

There is no specific feature for male/female version of the strings. Weblate 4.5 comes with features that you can probably utilize to achieve this though:

It is a generic solution aimed at other use cases as well.

ALIENQuake commented 3 years ago

@nijel Thank you for your work! Donation sent.

burner1024 commented 3 years ago

OK, not to be a downer, but just fyi - I revisited the docs, and I think that unfortunately it still does not cover the initial case, which is:

  1. Strings are not managed in weblate.
  2. Need to have an option to provide and somehow store 2 different translations for the same string.

nijel commented 3 years ago

Sorry, but I don't see a solution for that besides managing strings in Weblate or using rich localization formats such as Fluent, which allow you to define arbitrary conditions inside the translation.

burner1024 commented 3 years ago

@nijel when using variants, though, where do they go in the PO file? Does Weblate "Context" from Tools menu get saved in msgctxt? Is it possible to link them to the original string by looking at the resulting PO file? (For a given string in PO, find its variants in the PO). And in Weblate, do they get associated to the same source file/string as the original string (source context file:string)?

nijel commented 3 years ago

The variants are for grouping existing strings, see https://docs.weblate.org/en/latest/devel/translations.html#variants

burner1024 commented 3 years ago

I mean

The additional variant for a string can also be added using the Tools while translating (when Manage strings is turned on):

nijel commented 3 years ago

Yes, that adds a string with given msgctxt.

burner1024 commented 3 years ago

And do they get associated with the same source context file:string? ("Occurrence" in PO terms.) If not, what do they get for the occurrence?

nijel commented 3 years ago

The strings will be associated via variant:ORIGINAL STRING flag, see https://docs.weblate.org/en/latest/devel/translations.html#manual-variants

burner1024 commented 3 years ago

Yes, I got that, but what does it mean in PO terms? I mean, suppose that original PO string is this:

#: ascension.tra:1014
msgid "Focus"
msgstr "Concentración"

When a manual variant is added, what will the resulting PO look like?

nijel commented 3 years ago

The variant information is not stored in the PO file.

burner1024 commented 3 years ago

But you said that it adds a string with the given Context/msgctxt.

nijel commented 3 years ago

Yes, whatever you enter in "Context" will end up in msgctxt:

[image]

But this has nothing to do with linking these two strings together, that is currently done purely via flags in Weblate.

burner1024 commented 3 years ago

Sorry, I'm trying to make myself clear, but apparently not succeeding. So let's say I have the original string translated in PO:

# original string
#: ascension.tra:1014
msgid "Focus"
msgstr "Concentración"

Its PO occurrence is ascension.tra:1014.

Now I go and add a manual variant in Weblate: [screenshot: Captura de pantalla de 2021-06-01 19-36-33]

The question is, after the variant is added, how does it look in the PO?

This is what I expect, but some things are not clear, see the comments:

# original string
#: ascension.tra:1014
msgid "Focus"
msgstr "Concentración"

# ADDED VARIANT. IS THIS HOW IT WILL LOOK?
#: WHAT IS HERE? (occurence)
msgid "Focus"
msgstr "Concentración-alt"
msgctxt "alt-context"

nijel commented 3 years ago

It will have no additional information:

# original string
#: ascension.tra:1014
msgid "Focus"
msgstr "Concentración"

msgctxt "alt-context"
msgid "Focus"
msgstr "Concentración-alt"

burner1024 commented 3 years ago

OK, I got it, thank you for explanation. I might be able to work with that.
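Since the variant link itself is not stored in the PO, a converter would have to rediscover variants by matching msgid. A minimal sketch of that, assuming simple single-line entries like the ones above (no multi-line strings, no unescaping); the function names are hypothetical:

```python
import re
from collections import defaultdict

# Matches single-line fields such as: msgid "Focus"
FIELD = re.compile(r'^(msgctxt|msgid|msgstr)\s+"(.*)"\s*$')


def parse_po(text):
    """Parse blank-line-separated PO entries into dicts (simplified)."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = {"msgctxt": None, "occurrences": []}
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#:"):
                entry["occurrences"].extend(line[2:].split())
            else:
                match = FIELD.match(line)
                if match:
                    entry[match.group(1)] = match.group(2)
        if "msgid" in entry:
            entries.append(entry)
    return entries


def group_variants(entries):
    """Return {msgid: [entries]} for msgids that occur more than once."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["msgid"]].append(entry)
    return {msgid: group for msgid, group in groups.items() if len(group) > 1}


PO_TEXT = '''# original string
#: ascension.tra:1014
msgid "Focus"
msgstr "Concentración"

msgctxt "alt-context"
msgid "Focus"
msgstr "Concentración-alt"
'''

variants = group_variants(parse_po(PO_TEXT))
```

Here the entry without msgctxt carries the occurrence, and the entry with msgctxt "alt-context" is its variant, so the occurrence has to be inherited from the original entry.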

burner1024 commented 3 years ago

@nijel one more question, if I may. Supposing that I have strings and their variants already in PO as described above, but variants are not marked as such, how can I bulk mark the variants so that Weblate knows that they are in fact variants?

I assume some kind of script as for string1 in (select * from strings); if count(select string2 from strings where string1.msgid == string2.msgid)>1; then insert into variants (string.id, string2.id); done. Should I use SQL? API? CLI? Obviously not asking for a complete solution, just some pointers. Can't find docs on SQL schema.

nijel commented 3 years ago

Using the API should work for this:

  1. List all strings using https://docs.weblate.org/en/latest/api.html#get--api-translations-(string-project)-(string-component)-(string-language)-units-
  2. Add flag variant:"string" to all matching units using https://docs.weblate.org/en/latest/api.html#patch--api-units-(int-id)-
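The two steps above could be scripted roughly as follows. This is only a sketch under assumptions that should be checked against the API documentation for your Weblate version: that the unit list is paginated with `results` and `next`, that each unit has `id`, `source` (a list) and `extra_flags` fields, that flags are changed by sending `extra_flags` in the PATCH request, and that token authentication is used. The base URL, token, and function names are placeholders.

```python
import json
import urllib.request
from collections import defaultdict


def plan_variant_flags(units):
    """Return {unit_id: new_extra_flags} for units that share a source text."""
    by_source = defaultdict(list)
    for unit in units:
        source = unit["source"]
        # Assumption: the API returns "source" as a list of plural forms.
        if isinstance(source, list):
            text = source[0] if source else ""
        else:
            text = source
        by_source[text].append(unit)

    plan = {}
    for text, group in by_source.items():
        # Skip unique strings; quoting rules for sources that themselves
        # contain quotes or backslashes are not handled in this sketch.
        if len(group) < 2 or '"' in text or "\\" in text:
            continue
        flag = f'variant:"{text}"'
        for unit in group:
            existing = unit.get("extra_flags") or ""
            if "variant:" in existing:
                continue  # already marked
            plan[unit["id"]] = f"{existing}, {flag}" if existing else flag
    return plan


def api_request(url, token, method="GET", payload=None):
    """Send one JSON request with token authentication."""
    data = json.dumps(payload).encode() if payload is not None else None
    request = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_units(base_url, token, project, component, language):
    """Step 1: yield all units of one translation, following pagination."""
    url = f"{base_url}/api/translations/{project}/{component}/{language}/units/"
    while url:
        page = api_request(url, token)
        yield from page["results"]
        url = page.get("next")


def apply_plan(base_url, token, plan):
    """Step 2: PATCH the planned flags onto each unit."""
    for unit_id, flags in plan.items():
        api_request(f"{base_url}/api/units/{unit_id}/", token, "PATCH",
                    {"extra_flags": flags})
```

Keeping the planning step separate from the network calls makes it possible to print and review the planned changes before sending any PATCH request.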

The SQL schema is generated by Django ORM, and there is currently no documentation on it (and there probably won't ever be any, as using SQL directly is not recommended: there might be logic constraints that are not applied at the SQL level).