flarum / framework

Simple forum software for building great communities.
http://flarum.org/

i18n: Fix handling of articles in validation.yml #609

Closed dcsjapan closed 8 years ago

dcsjapan commented 9 years ago

The fact that every string in this file begins with "The :attribute" poses a problem for some translators (we've heard from the French and Dutch translators so far).

In many languages the definite article "the" must be rendered in different ways to agree with the noun for gender. The obvious solution is to include the article in the attribute sub-strings; and in most cases this would work. But in a few cases, the attribute sub-strings won't necessarily appear at the start of the translated sentence, so the first letter of the sub-strings needs to be capitalized in some instances and not capitalized in others. And in other cases, the definite article is replaced by a possessive, making it impossible to put the article in the sub-string.

For example, note the difference between this pair of strings in English...

  numeric: "The :attribute must be a number."
  regex: "The :attribute format is invalid."

... and in French:

  numeric: "Le :attribute doit être un nombre."
  regex: "Le format de :attribute est invalide."

The problem here is that :attribute is being used as a modifier rather than the subject of the sentence.
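To make the failure concrete, here is a minimal Python sketch of naive placeholder substitution. The `render` helper and the sample nouns are assumptions for illustration only; Laravel's actual replacement logic is more involved.

```python
# Hypothetical stand-in for a validation translator: plain string substitution.
MESSAGES_FR = {
    "numeric": "Le :attribute doit être un nombre.",
    "regex": "Le format de :attribute est invalide.",
}

def render(rule: str, attribute: str) -> str:
    """Replace the :attribute placeholder with the translated field name."""
    return MESSAGES_FR[rule].replace(":attribute", attribute)

# A masculine noun happens to agree with the hard-coded "Le".
print(render("numeric", "courriel"))
# A feminine noun starting with a vowel does not agree: French wants "L'adresse ...".
print(render("numeric", "adresse de courriel"))
# Used as a modifier, the noun loses its article altogether.
print(render("regex", "courriel"))
```

Because the article lives in the template and the noun lives in the variable, neither side can be adjusted to agree with the other.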

Maël has been able to work around this by using only masculine nouns in the sub-strings, but it is not an optimal solution for French, and we should not count on translators being able to find such solutions in other languages.

tobyzerner commented 9 years ago

This is out of our direct control since Laravel handles the validation message translation. Doesn't look like it's been reported over there either.

If we can come up with a good solution we could send that upstream (PR to laravel/framework) and hope they'll accept it?

dcsjapan commented 9 years ago

It's odd that the subject hasn't come up yet. I'm all for feeding it back to them, if we can come up with something that works. Let me think about it for a bit and see if I can't come up with something.

dcsjapan commented 9 years ago

Okay. My original explanation of the problem was a bit muddled because I had just gotten up from a nap and was a bit woozy. Here's a better breakdown of the situation. We've got two issues happening here:

Replacement of the article by a determiner

When the :attribute is being used as a modifier, or the first half of a compound noun, it can end up being translated as a noun that is possessed by another noun, which is the true subject. When that happens it takes a possessive determiner (such as the French de) instead of an article (such as le).

At present, this only happens in a few cases. One refers to the "format" of the :attribute; the others all refer to the "field" of the :attribute. The easiest way to get rid of this issue would be to rewrite these sentences so we're not talking about the "format" or the "field", but about the :attribute. This should not be too hard to accomplish.

Capitalization of the article

The other problem occurs when the position of the ":attribute" in the sentence is not definite. Since translators need to stick the article in the :attribute string, we end up with a situation where we're using a capitalized article in the middle of a sentence, or a lower-case article at the beginning.

The easiest way around this problem would be to make sure that :attribute strings never appear at the start of a sentence. The article can then be left uncapitalized. This is of course something that translators can do for themselves, but it might be a good idea to rewrite the English sentences along these lines, to point translators in the right direction as it were.
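The capitalization problem can be reproduced with a small Python sketch. The `ATTRIBUTES` table and `render` helper are hypothetical; they only imitate what happens once a translator folds the article into the attribute string.

```python
# Hypothetical attribute strings with the article folded in.
ATTRIBUTES = {"email": "the email address"}

def render(template: str, field: str) -> str:
    """Naively substitute the :attribute placeholder."""
    return template.replace(":attribute", ATTRIBUTES[field])

# Placeholder at the start of the sentence: the article comes out lower-case.
print(render(":attribute is not formatted correctly.", "email"))
# Placeholder in mid-sentence: the same attribute string reads correctly.
print(render("You need to format :attribute correctly.", "email"))
```

Keeping the placeholder away from the start of the sentence means one lower-case attribute string serves every message.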

One approach we could take is to make the user the subject of the messages. For example, instead of

  url: "The :attribute is not formatted correctly."

we could say

  url: "You need to format the :attribute correctly."

This would be a more user-centric style that would be appropriate for messages meant to be read by forum users. I'm not sure whether it would be appropriate for these validation messages, however.

ghost commented 8 years ago

In French, the issue is the determiner (more specifically the definite article) in strings using email, because it should be localized as adresse de courriel (email address), which is feminine. The feminine form of "the" is la in French, and because adresse begins with a vowel, la adresse must be contracted to l'adresse. This is called elision; see these articles on Wikipedia:

I temporarily fixed it by localizing email as courriel, which is masculine and uses le (the masculine form of "the") like the other nouns, but it's not a perfect localization.

I hope it's understandable. :)
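As a rough illustration of the elision rule described above, here is a deliberately crude Python sketch. The function name and the vowel list are assumptions; it ignores plurals, words beginning with h, and other exceptions that a real implementation would need to handle.

```python
# Crude sketch of French definite-article selection with elision.
# Assumption: only the first letter of the noun and its gender matter.
VOWELS = "aeiouàâéèêëîïôûù"

def with_definite_article(noun: str, gender: str) -> str:
    """Return the noun with le/la, or the elided l' before a vowel."""
    if noun[0].lower() in VOWELS:
        return "l'" + noun
    return ("la " if gender == "f" else "le ") + noun

print(with_definite_article("adresse de courriel", "f"))
print(with_definite_article("courriel", "m"))
```

Even this small rule needs gender information that the message templates do not carry, which is why a hard-coded article in the template cannot work for every noun.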

dcsjapan commented 8 years ago

I'd like to discuss possible solutions in more abstract terms, because other localizers may run into similar issues, and if we're going to take measures to improve the localizability of validation.yml, we should try to make sure it works for as many languages as possible. So ... just thinking out loud here ...

To recap:

We've got two issues to deal with. First, there's the fact that the article preceding the variable noun may take a different form, or even be replaced by a determiner, depending on factors such as the following:

Plurality would also pose an issue, if any of the possible values for :attribute were plural. Now, if we only had the first two factors to contend with, the solution would be simple: just stick the article in the variable! The fact that the article may be replaced by a determiner makes this impossible.

Second, there's the fact that the article may need to be capitalized in some instances and not in others. This is another strike against the "stick the article in the variable" approach. At first glance you might think that approach would work, since the writers of validation.yml have taken care to put "The :attribute" at the start of every sentence. But the variable noun can also appear in the middle of the sentence when

... so if we stick the article in the variable, we'd need some way to condition its capitalization.

A possible solution:

I suggested in my last comment that it should be possible to work around both of these issues by simply rewriting the messages. We just have to make sure that the rewritten messages meet two criteria:

As an example of how this can be done, let's try rewriting a message that's affected by both issues:

  required: "The :attribute field is required."

First, the variable is being used to modify the noun "field"; we can fix that by getting rid of that noun. This doesn't really change the meaning of the message, because it's not the field that's required: we are really requiring that the user enter something in that field. Making this meaning more explicit gives us a way to move the variable out of the initial position:

  required: "You need to enter something for the :attribute."

This translation meets both our criteria. But since we're doing all this to make it possible for localizers to include the article in the variable, we can drop a hint by doing the same in English. We should probably remove the article from the above sentence ...

  required: "You need to enter something for :attribute."

... and add it to every possible value of :attribute, like so:

  attributes:
    username: the username
    password: the password
    email: the email address

You'll note that I'm also changing the English translation of email: to account for the fact that this term will never be used to modify the noun "field". We have to make sure all the translations under attributes: refer to the values that go in the fields, rather than the fields themselves.
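If the messages were rewritten along these lines, a small lint could flag templates that still have either problem. The Python sketch below is hypothetical: the two checks are my reading of the worked example above (placeholder at the start, placeholder modifying "field" or "format"), not an official list of criteria.

```python
import re

# Nouns that the placeholder modifies in the original English messages.
MODIFIED_NOUNS = ("field", "format")

def template_problems(template: str) -> list:
    """Return a list of localizability problems found in one message template."""
    problems = []
    # Check 1: the placeholder (with or without a leading "The") opens the sentence.
    if re.match(r"^(The\s+)?:attribute\b", template):
        problems.append("placeholder at start of sentence")
    # Check 2: the placeholder directly modifies a following noun.
    if re.search(r":attribute\s+(%s)\b" % "|".join(MODIFIED_NOUNS), template):
        problems.append("placeholder used as a modifier")
    return problems

print(template_problems("The :attribute field is required."))
print(template_problems("You need to enter something for :attribute."))
```

The original required: message trips both checks, while the rewritten one passes cleanly.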

Please also keep in mind that, since the writers of validation.yml have begun every message in the file with "The :attribute", this solution will involve rewriting every message in the file. This isn't necessarily a bad thing, since in effect we'll end up rewriting all the messages in a more user-centric style that will mesh better with how language is employed in the Flarum user interface.

Other options?

An alternative solution might involve modifying the code used to output these messages.

For example, it may be possible to check whether one of the values under attributes: is appearing at the beginning of a sentence or not, and capitalize/lowercase it appropriately. But doing this would mean added complexity to deal with something that may not be applicable, or even necessary, in many languages. And we'd still have to rewrite at least some of the messages, and possibly all of them (if we want to move the articles into the variables as described above).
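A minimal Python sketch of that capitalization check might look like the following. The `render` helper is an assumption for illustration, not Laravel or Flarum code, and it only handles a placeholder at the very start of the message, not one that follows a full stop in mid-message.

```python
PLACEHOLDER = ":attribute"

def render(template: str, attribute: str) -> str:
    """Substitute the placeholder, capitalizing it only when it opens the message."""
    if template.startswith(PLACEHOLDER):
        capitalized = attribute[:1].upper() + attribute[1:]
        template = capitalized + template[len(PLACEHOLDER):]
    # Any remaining placeholders are mid-sentence and keep their original case.
    return template.replace(PLACEHOLDER, attribute)

print(render(":attribute must be a number.", "the email address"))
print(render("You need to enter something for :attribute.", "the email address"))
print(render(":attribute doit être un nombre.", "l'adresse de courriel"))
```

Upper-casing the first character is itself language-dependent, which supports the point that this adds complexity many languages do not need.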

Changing the code to handle cases where the variable is used as a modifier would be quite a bit more complex, and would tend to be even more language-specific. A can of worms best left closed, perhaps.

How far do we take this?

Let's assume that we don't want to change the code at all, since that would be too much trouble. Do we really want to rewrite all the messages? Or should we just leave it up to each localizer to find a wording that works for his/her language?

This is a good question. There's no law saying these messages have to be translated literally. We could just leave the English messages as they are, and leave it to the localizer to find a more liberal translation that works well in his/her language.

... And in fact, some of the translations available here are already pretty liberal. For example, I see that the Japanese translation for the required: phrase (used in the example above) reads something like this:

  required: "Please be sure to specify :attribute."

So if we rewrite these messages, we'll be doing so primarily as a convenience to localizers ... to guide them toward a phrasing that will avoid the problems caused by these article-related issues. (Though we should not completely discount the effect on the English messages: if we think more user-centric error messages would be a better fit for Flarum, then we may want to do it for that reason alone.)

Feedback to Laravel?

If our solution involved modifying the code, then the question of providing that solution as feedback to Laravel would be a no-brainer. Since we're only talking about rewriting the messages, I guess it really comes down to this: Do we think these changes would benefit Laravel users as a whole?

I'll leave that one up to you folks, since you know that community better. :wink:

franzliedke commented 8 years ago

(Though we should not completely discount the effect on the English messages: if we think more user-centric error messages would be a better fit for Flarum, then we may want to do it for that reason alone.)

That's all I care about, really. If you think that the new version would sound better to users (and it sounds like you do), then by all means let's do this. If it also serves as a hint to translators: even better (even though it seems like they're already pretty good at finding a way around this).

And if it works, we can, of course, submit this back to Laravel. Just note that Laravel chooses this route primarily because they automatically convert the field name (e.g. "email_address") to a capitalized "Email Address" for the :attribute placeholder, if I remember correctly. That would not work well with a solution that would require an article as part of :attribute (even though they allow custom translations for all of these, too).

luceos commented 8 years ago

@franzliedke you are correct that the attribute name is automatically changed, but you can also manually declare them in Laravel, which of course would not be something the Laravel team would want to do for every single possible field name.

I think we should modify this behavior within the Flarum project, but we can notify @taylorotwell about this discussion. I'm sure he has taken the topic of localization seriously and has given it the needed attention. Perhaps seeing the expert explanations by @dcsjapan will give him another perspective on the matter too; you never know, it might have an effect on a future major release of Laravel.

dcsjapan commented 8 years ago

If you think that the new version would sound better to users (and it sounds like you do), then by all means let's do this.

Well, the standard messages are serviceable, so I didn't see improving the English as a priority. (If it was, Toby probably would have improved them by now.) But there is a pretty big gap between the style used for those messages and how Flarum uses language.

And ... generally speaking I prefer to see error messages with a more user friendly style. Impersonally phrased error messages always strike me as adding insult to injury, if that makes sense. :stuck_out_tongue:

Just note that Laravel chooses this route primarily because they automatically convert the field name (e.g. "email_address") to a capitalized "Email Address" for the :attribute placeholder, if I remember correctly.

That's interesting ... and slightly puzzling. Why not have them properly capitalized in the YAML file, and save on the processing? Or are they only capitalized in some instances and not in others?

In any event, if we go with the approach I'm suggesting, then the capitalization won't be necessary, since we'll be talking about the data that goes in the fields, not the fields themselves. That is,

You need to enter something in the Email Address field.

looks fine with the capitalization, but it looks out of place in something like

Please enter your Email Address.

... so we'd want to turn the automatic capitalization off, to avoid that sort of thing.

but we can notify @taylorotwell about this discussion

Please feel free to do so ... his input and opinions would be most welcome.

tobyzerner commented 8 years ago

Just been looking at the Laravel code, and attribute names are not automatically capitalised. That is, "email_address" becomes "email address", not "Email Address". Not sure if that changes anything?
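For clarity, the behaviour described here can be imitated in a couple of lines of Python. This is only a sketch of the observable result ("email_address" becomes "email address"); the `displayable` helper is hypothetical and is not a port of Laravel's code.

```python
def displayable(field_name: str) -> str:
    """Turn a snake_case field name into a lower-case display name."""
    return field_name.replace("_", " ")

print(displayable("email_address"))
# For contrast, this is the capitalized form that was assumed earlier in the thread.
print(displayable("email_address").title())
```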

I think I'm in favour of just leaving the English translations as-is :)

ghost commented 8 years ago

... And it's better that way, because English is an exception when it comes to capitalizing names, and automatic capitalization could be an issue for other languages.

+1 to leave the "default" translations. It will be easier to maintain when new variables are added, and that's not a major language issue.

dcsjapan commented 8 years ago

Just been looking at the Laravel code, and attribute names are not automatically capitalised. That is, "email_address" becomes "email address", not "Email Address". Not sure if that changes anything?

Thanks for checking, Toby, that does get rid of one concern. If you're okay with the fact that the style of language in the error messages doesn't quite match the style used in the Flarum UI ... and that's not a huge concern, because we don't really see these messages all that often ... then I agree, it's best to leave the English phrases as they are.

As for the issues with articles and determiners and such, we can probably leave it up to localizers to find a workaround. Most of the time, a more liberal approach (e.g. "Please enter your email address" instead of "You need to enter something in the email address field") will provide a solution. If a localizer comes up with an issue that can't be resolved by rewriting the message, we can revisit this issue then.

Does that seem doable to you, @maelsoucaze ?

franzliedke commented 8 years ago

In any case, we can always create specific messages for each and every field and validation type.
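One way to picture that fallback is a lookup that prefers a message written for a specific field and rule, and only then falls back to the generic template. The Python sketch below is hypothetical: the table names and the `message_for` helper are assumptions, not the structure of any real language file.

```python
# Generic templates, as in the current English file.
DEFAULT_MESSAGES = {"required": "The :attribute field is required."}
# Hypothetical per-field, per-rule overrides written out in full by the localizer.
CUSTOM_MESSAGES = {("email", "required"): "Please enter your email address."}
# Optional display names for fields.
ATTRIBUTES = {"email": "email address"}

def message_for(field: str, rule: str) -> str:
    """Prefer a field-specific message; otherwise fill in the generic template."""
    custom = CUSTOM_MESSAGES.get((field, rule))
    if custom is not None:
        return custom
    name = ATTRIBUTES.get(field, field.replace("_", " "))
    return DEFAULT_MESSAGES[rule].replace(":attribute", name)

print(message_for("email", "required"))
print(message_for("user_name", "required"))
```

A fully written-out message sidesteps the article and capitalization issues, at the cost of one entry per field and rule.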