Open mpvl opened 9 years ago
Here is one solution: https://github.com/maximilien/i18n4go
@maximilien: i18n4go does not address localized formatting of values like numbers and I think it will be difficult to retrofit it properly. In case of selecting translation variants based on the linguistic features of the arguments, you'll end up with the same struggle one witnesses with localization frameworks for other languages. Also, i18n4go extracts all strings and then uses an exclusion file. This may work well for command line tools or applications where most strings need localization, but this is not the norm. It breaks down when a large number of the strings in code do not need to be localized. For example, internal error messages are often not localized and may actually be the bulk of the text. Addressing both issues will likely result in a different API, for example like the one proposed. The implementation of the proposed API is more complex, but it eliminates the need to generate a parallel version of the code and T wrappers.
This proposal is fairly agnostic about translation pipelines, though. So it may be possible to fit this proposal on top of the i18n4go translation pipeline. Seems like a convenient first target.
Using the Printf of message.Printer has the following consequences:
- ...
- the format string is now a key used for looking up translations
Is the format string by itself sufficient for determining the context? I can imagine a very simple Printf used like m.Printf("%s: %d", m, i) where the format string %s: %d could appear a dozen times throughout a codebase with very different contexts. (You could argue that this is a very poor format string to begin with, but it still demonstrates my concern.)
I must admit I'm not very familiar with localization problems and this may not be an issue in practice.
@infogulch It is indeed not enough. In my provisionally worked out API I do define a Key function that can be used for things like adding meaning and alternatives. I left it out of the design doc to not go into details too much. (I also stripped about 1/3rd of my original draft; maybe I went a bit overboard.)
Note that as the string has no meaning in itself, you could always write the format string as, for example, "Archive (verb)" and "Archive (noun)" and supply a "translation" for these in English ("Archive" for both). But this does not address all concerns. A more general solution:
Printf would have the following signature:
func (p *Printer) Printf(key Reference, args ...interface{}) (n int, err error) {
where Reference is either a string or a result from a func like
func Key(id string, fallback ...string) Reference {
This allows the familiar Printf() usage while addressing the concerns you raised. Many localization frameworks have a solution of a similar nature.
But the example string you provide does raise another good point: there may be format strings one does not want to translate at all while still using the message package to substitute localized values. This is possible as is (e.g. fmt.Printf("%s: %d", m.Print(m), m.Print(i))), but may be a bit clunky. A bit better may be something like m.Printf(message.Raw("%s: %d"), m, i), where the use of Raw makes extraction skip the string. I don't think there are too many cases where this is used, though. Even "%s: %d" will vary per language. But single-value substitutions like "%2.3d" should probably be excluded from translation.
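To make the Key/Reference idea above concrete, here is a minimal, self-contained sketch. Only the names Reference, Key, and the Printf-style signature come from the comment above; the Printer type backed by a map, the lookup order, and the German strings are assumptions made purely for illustration, and this is not the x/text implementation.

```go
package main

import "fmt"

// Reference is either a plain format string or a value returned by Key.
// This mirrors the signatures quoted above; everything else is assumed.
type Reference interface{}

// key carries an ID plus fallback IDs to try when the ID has no translation.
// Treating fallbacks as further lookup IDs is an assumption of this sketch.
type key struct {
	id       string
	fallback []string
}

// Key builds a Reference from an explicit ID and optional fallbacks.
func Key(id string, fallback ...string) Reference {
	return key{id: id, fallback: fallback}
}

// Printer is a toy stand-in for message.Printer, backed by a plain map.
type Printer struct {
	translations map[string]string
}

// Sprintf formats the first candidate ID that has a translation; if none
// has one, the last candidate is used as the format string itself.
func (p *Printer) Sprintf(ref Reference, args ...interface{}) string {
	var ids []string
	switch r := ref.(type) {
	case string:
		ids = []string{r}
	case key:
		ids = append([]string{r.id}, r.fallback...)
	default:
		return ""
	}
	format := ids[len(ids)-1]
	for _, id := range ids {
		if t, ok := p.translations[id]; ok {
			format = t
			break
		}
	}
	return fmt.Sprintf(format, args...)
}

func main() {
	de := &Printer{translations: map[string]string{
		"Archive (verb)": "Archivieren",
		"Archive (noun)": "Archiv",
	}}
	fmt.Println(de.Sprintf(Key("Archive (verb)", "Archive")))
	fmt.Println(de.Sprintf(Key("Archive (noun)", "Archive")))
	// A plain string still works as a Reference.
	fmt.Println(de.Sprintf("%s: %d", "n", 3))
}
```

With this shape, the two meanings of "Archive" get distinct keys, while a printer without translations still falls back to the plain English text.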
As far as plurals are concerned, I've seen some elaborate examples, but from skimming the doc it seems they can use only the "<", ">", and "=" operators; I didn't read it 100% thoroughly, however, so I may be wrong. I'd therefore like to ask for clarification: are the proposed mechanisms enough to cater for the rule for, e.g., the Polish language? In a version I found on the Weblate site, it's described as [1] [2]:
n==1 ? 0 : // "single"
n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : // "few"
2 // "many"
which seems to me quite fine, correctly giving e.g.:
1 orangutan
2-4 orangutany
5-9 orangutanów
10, 11, 12, ..., 21 orangutanów
22 orangutany
101 orangutanów
102 orangutany
etc.
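The rule quoted above translates directly into Go. The following is a minimal sketch; the function name polishPluralIndex and the forms slice are illustrative and not part of any x/text API, and the rule is assumed to apply to non-negative integers only.

```go
package main

import "fmt"

// polishPluralIndex maps n to the index used by the gettext-style rule
// quoted above: 0 = "single", 1 = "few", 2 = "many".
// The name is hypothetical; n is assumed to be non-negative.
func polishPluralIndex(n int) int {
	switch {
	case n == 1:
		return 0
	case n%10 >= 2 && n%10 <= 4 && (n%100 < 10 || n%100 >= 20):
		return 1
	default:
		return 2
	}
}

func main() {
	forms := []string{"orangutan", "orangutany", "orangutanów"}
	for _, n := range []int{1, 2, 5, 12, 21, 22, 101, 102} {
		fmt.Printf("%d %s\n", n, forms[polishPluralIndex(n)])
	}
}
```

Running it reproduces the list above: 12, 21, and 101 take the "many" form, while 22 and 102 take the "few" form.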
@akavel: one should distinguish selectors from the rules you mention. The rules you refer to (which are defined in CLDR) would be used by the plural package to map numbers to a small set of plural categories (in the case of your example: single, few and many). The selectors subsequently pick alternatives based on these simplified categories. The maximum number of such categories, IIRC, is 6 (e.g. for Arabic). Most localization frameworks that support plural, allow selecting on these categories only. ICU adds selecting on the number value (using "="). The matching algorithm defined in this proposal is a bit different from ICU, allowing also for Vars and selecting on "<" and ">". The selectors will often be generated or written by translators (with the help of a GUI) so they should remain simple.
In my foreseen implementation, it is really up to the feature implementation to interpret selectors. This means that there is a lot of flexibility in supporting wild feature value matching. However, if one looks at linguistic grammars like LFG and HPSG, which use many more features, the set of possible feature values is usually small.
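The split between rules (number to category) and selectors (category or value to message variant) can be sketched as follows. This is only an illustration under stated assumptions: the variant type, the selector syntax, and the first-match-wins order are hypothetical and are not the matching algorithm defined in the proposal. The plural category is passed in as a string, standing in for whatever the plural package would compute from the CLDR rules.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// variant pairs a selector with the message to use when it matches.
// Both the type and the selector syntax are hypothetical.
type variant struct {
	selector string // "=N", "<N", ">N", a category name, or "other"
	msg      string
}

// matches reports whether selector sel applies to the number n whose
// plural category (computed elsewhere, e.g. from CLDR rules) is cat.
func matches(sel string, n int, cat string) bool {
	if sel == "other" || sel == cat {
		return true
	}
	if len(sel) < 2 {
		return false
	}
	v, err := strconv.Atoi(sel[1:])
	if err != nil {
		return false
	}
	switch sel[0] {
	case '=':
		return n == v
	case '<':
		return n < v
	case '>':
		return n > v
	}
	return false
}

// selectMsg returns the message of the first variant whose selector
// matches, with every "%d" replaced by n. It returns "" if none matches.
func selectMsg(vs []variant, n int, cat string) string {
	for _, v := range vs {
		if matches(v.selector, n, cat) {
			return strings.ReplaceAll(v.msg, "%d", strconv.Itoa(n))
		}
	}
	return ""
}

func main() {
	vs := []variant{
		{"=0", "no files"},
		{"one", "one file"},
		{"<5", "a few files (%d)"},
		{"other", "%d files"},
	}
	fmt.Println(selectMsg(vs, 0, "other"))
	fmt.Println(selectMsg(vs, 1, "one"))
	fmt.Println(selectMsg(vs, 3, "other"))
	fmt.Println(selectMsg(vs, 12, "other"))
}
```

The selectors stay simple enough for a translator to write, while the language-specific arithmetic lives entirely in whatever computes the category.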
The doc is indeed a bit sparse here (as well as all other topics, really).
@mpvl, sounds good. Happy to try and integrate once you have something ready to try. Best.
Have you guys seen this one? https://github.com/nicksnyder/go-i18n seems pretty solid at first sight.
It uses JSON as its base format, has tooling to help with the translation workflows
By the way, I submitted some formatting fixes for the proposal doc a few weeks ago.
https://go-review.googlesource.com/19753
Not sure what I was supposed to do to get it reviewed.
Any updates on how far the proposal is implemented in x/text/language? I find it a bit hard to figure out if this is anywhere near production readiness.
x/text/language is definitely production ready. But if you mean the specific functionality of this issue, it is still under development. Lately the focus has been more on other parts; my intention for the upcoming months is to focus specifically on segmentation and this.
That said, string substitution is available with limited functionality, so you could play around with it. I recently checked in a tool to extract strings.
Thanks for your reply, I have so far only used x/text/language in production and coded something around it that translates and formats messages for different countries. Just wanted to check if the language API is still up for changes.
No plans to change. Works well enough to the point it is not useful breaking people.
Hi, which package handles localized formatting/display of dates/times, or is this functionality not yet complete?
Hi @mpvl, others, I'm using x/text/collate to test the sorting of some random strings. Below I use a Korean collator.
package main
import (
"fmt"
"golang.org/x/text/collate"
"golang.org/x/text/language"
)
func main() {
strs := []string{"boef", "音声認識", "音声認識1", "aaland", "amsterdam", "월요일", "日付と時刻"}
cl := collate.New(language.Korean) //Korean collator
cl.SortStrings(strs)
fmt.Println(strs)
}
Output: [aaland amsterdam boef 월요일 音声認識 音声認識1 日付と時刻]
If I use ICU to sort these strings (using level 3 strength), then I get the strings back like this:
[월요일 音声認識 音声認識1 日付と時刻 aaland amsterdam boef]
Am I setting up the collator incorrectly? I'm using v1.8beta.
Hello @MickMonaghan,
it looks like there is not much interest in this discussion, so I'll just add my findings so far.
I looked into the collate code and could not really figure out how the sorting is done. There are some byte blocks that are loaded by offsets; I have no idea how they work. I also did not have much time to figure that out. So if someone would like to explain how that actually works, I would be grateful.
I asked a Japanese friend of mine how he would sort a list of German and Japanese cities. This is what he came up with.
So he either converts the Japanese into Latin or the Latin into Japanese alphabet and sorts it then. I think that is also a good way to sort this list, first translate the syllables into the other alphabet and then sort it correspondingly.
Hey @morriswinkler-simplesurance - thanks for the response. I'm not so much concerned with how it works as with whether it works. In some situations the collator clearly does work:
strs := []string{"champion", "humble"}
cl := collate.New(language.Slovak)
cl.SortStrings(strs)
//this correctly sorts 'champion' *after* 'humble' - as expected in Slovak
With a Korean sort, the Latin characters should be sorted after the Korean characters. But that's not happening.
@MickMonaghan: the implementation is based on the CLDR UCA tables. If I look at the collation elements of both the DUCET (Unicode's tables) and CLDR (the tailorings) they both show Hangul to have a higher primary collation value than Latin. So that explains why Korean is sorted later.
What is probably happening in ICU is that the script for the selected language is sorted before other scripts. The Go implementation currently does not support script reordering, though. This is a TODO, but depends on changing the implementation to use fractional weights. This is a huge change and may take a while.
@MickMonaghan: I suggest you file a separate issue for this so it can be tracked individually.
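Until script reordering is supported, a rough workaround is to partition by script before comparing. The sketch below is an assumption-laden illustration using only the standard library: it puts strings that start with a Hangul rune first and orders everything else by plain byte comparison. It is not real collation, it does not reproduce ICU's ordering of the remaining scripts, and the function name sortHangulFirst is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
	"unicode/utf8"
)

// scriptRank returns 0 if s starts with a Hangul rune and 1 otherwise.
func scriptRank(s string) int {
	r, _ := utf8.DecodeRuneInString(s)
	if unicode.Is(unicode.Hangul, r) {
		return 0
	}
	return 1
}

// sortHangulFirst sorts strs in place: Hangul-initial strings first,
// then the rest, each group in plain byte order (not locale-aware).
func sortHangulFirst(strs []string) {
	sort.SliceStable(strs, func(i, j int) bool {
		ri, rj := scriptRank(strs[i]), scriptRank(strs[j])
		if ri != rj {
			return ri < rj
		}
		return strs[i] < strs[j]
	})
}

func main() {
	strs := []string{"boef", "aaland", "월요일", "amsterdam"}
	sortHangulFirst(strs)
	fmt.Println(strs)
}
```

In a real program the within-group comparison would more sensibly be delegated to a collate.Collator; the byte comparison here only keeps the sketch dependency-free.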
@MickMonaghan: dates/times is on the list, but only after number etc. is completed.
Thanks @mpvl , I'll log the collation bug
I started trying out golang seriously today to create a small application just for fun. However when I tried to localize my little application I didn't figure out any good solution. I just got a big headache. This is what I would do normally in TypeScript
export const Exceptions = {
"AuthenticationError": {
"Invalid": {
"en-GB": "Invalid username or password",
"sv-SE": "Fel användarnamn eller lösenord"
},
"Required": {
"en-GB": "You must be authenticated to see this resource",
"sv-SE": "Du måste vara inloggad för att se denna resurs"
}
}
}
export class AuthenticationError extends Error {
constructor(language: "en-GB" | "sv-SE", message: "Invalid" | "Required") {
super(Exceptions.AuthenticationError[message][language]);
}
}
I would get errors if I typed any string wrong and it would simply just work. I tried to do something similar in go but the pain just got unbearable,
package localization
type labels struct {
enGB string
svSE string
}
type authenticationErrorMessages struct {
Invalid labels
Required labels
}
type exceptionMessages struct {
authErrors authenticationErrorMessages
}
// ExceptionMessage damnit, need to write a comment in an odd way.
func ExceptionMessage(language string, category string, exceptionType string, params []string) string {
var exceptionMsg = exceptionMessages{
authErrors: authenticationErrorMessages{
Invalid: labels{
enGB: "Invalid username or password",
svSE: "Fel användarnamn eller lösenord",
},
Required: labels{
enGB: "You must be authenticated to see this resource",
svSE: "Du måste vara inloggad för att se denna resurs",
},
},
}
switch category {
case "AuthenticationError":
switch exceptionType {
case "Invalid":
switch language {
case "enGB":
return exceptionMsg.authErrors.Invalid.enGB
case "svSE":
return exceptionMsg.authErrors.Invalid.svSE
}
case "Required":
switch language {
case "enGB":
return exceptionMsg.authErrors.Required.enGB
case "svSE":
return exceptionMsg.authErrors.Required.svSE
}
}
}
return "Error message not found"
}
// AuthenticationError damnit, need to write a comment in an odd way.
func AuthenticationError(message string) string {
return ExceptionMessage("enGB", "AuthenticationError", message, nil)
}
TL;DR
So far everything has been really smooth writing golang code but this is just painful. I've tried out some localization packages as well but that hasn't worked out well so far. I'm of course not an expert in go after less than a day, maybe I missed something obvious in the language specification when I went through it this morning but regardless I'd really like to see some progress on this issue.
This is generally not how I would approach localization in Go (or any language), but going with your approach, I would do the following in go:
package main
import (
"fmt"
"golang.org/x/text/language"
"golang.org/x/text/message"
)
type AuthenticationError string
const (
ErrInvalid AuthenticationError = "ErrInvalid"
ErrRequired AuthenticationError = "ErrRequired"
)
func (e AuthenticationError) String() string { return string(e) }
func (e AuthenticationError) ErrorMessage(t language.Tag) string {
p := message.NewPrinter(t)
return p.Sprintf(e.String())
}
func init() {
message.SetString(language.English, ErrInvalid.String(), "Invalid username or password")
message.SetString(language.Swedish, ErrInvalid.String(), "Fel användarnamn eller lösenord")
message.SetString(language.English, ErrRequired.String(), "You must be authenticated to see this localized resource")
message.SetString(language.BritishEnglish, ErrRequired.String(), "You must be authenticated to see this localised resource")
message.SetString(language.Swedish, ErrRequired.String(), "Du måste vara inloggad för att se denna resurs")
}
func main() {
fmt.Println(ErrRequired.ErrorMessage(language.Make("en-US")))
fmt.Println(ErrRequired.ErrorMessage(language.Make("en-gb-oed")))
fmt.Println(ErrRequired.ErrorMessage(language.Make("sv-FI")))
// Output:
// You must be authenticated to see this localized resource
// You must be authenticated to see this localised resource
// Du måste vara inloggad för att se denna resurs
}
This solves your concern of magical strings and auto-completion (at least as far as I understand what you mean). It also implements fallbacks allowing for partial dictionaries, for example to only define translations where they differ for British and American English. (Notice the extra "localized" adjective in your message which I took the liberty to add for demonstrative purposes.)
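The fallback behaviour that makes partial dictionaries possible can be approximated in a few lines. The sketch below is a simplification under stated assumptions: it walks up the tag by dropping the last subtag ("en-GB-oed", then "en-GB", then "en"), whereas x/text relies on CLDR parent relationships and canonicalized tags. The catalog type and lookup method are hypothetical names, and tags are compared case-sensitively.

```go
package main

import (
	"fmt"
	"strings"
)

// catalog maps language tag -> message key -> translation.
// This is a toy structure, not the x/text catalog.
type catalog map[string]map[string]string

// lookup tries tag first, then each parent obtained by dropping the
// last "-"-separated subtag, and reports whether a message was found.
func (c catalog) lookup(tag, key string) (string, bool) {
	for t := tag; t != ""; {
		if msg, ok := c[t][key]; ok {
			return msg, true
		}
		i := strings.LastIndex(t, "-")
		if i < 0 {
			break
		}
		t = t[:i]
	}
	return "", false
}

func main() {
	c := catalog{
		"en":    {"ErrRequired": "You must be authenticated to see this localized resource"},
		"en-GB": {"ErrRequired": "You must be authenticated to see this localised resource"},
		"sv":    {"ErrRequired": "Du måste vara inloggad för att se denna resurs"},
	}
	for _, tag := range []string{"en-US", "en-GB-oed", "sv-FI"} {
		msg, _ := c.lookup(tag, "ErrRequired")
		fmt.Println(msg)
	}
}
```

Only the strings that differ need an "en-GB" entry; everything else is picked up from "en".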
But a more proper and scalable approach would be:
package main
import (
"fmt"
"golang.org/x/text/language"
"golang.org/x/text/message"
)
type AuthenticationError int
const (
ErrInvalid AuthenticationError = iota
ErrRequired
)
func (e AuthenticationError) ErrorString(p *message.Printer) string {
switch e {
case ErrInvalid:
return p.Sprintf("Invalid username or password")
case ErrRequired:
return p.Sprintf("You must be authenticated to see this localized resource")
}
return "Unknown"
}
var matcher language.Matcher
func init() {
insertGeneratedTranslations()
matcher = language.NewMatcher(message.DefaultCatalog.Languages())
}
func main() {
// Match does a lot of magic to find the best language for the user.
lang, _, _ := matcher.Match(language.Make("en-GB-oed"))
p := message.NewPrinter(lang)
fmt.Println(ErrInvalid.ErrorString(p))
fmt.Println(ErrRequired.ErrorString(p))
// Output:
// Invalid username or password
// You must be authenticated to see this localised resource
}
The gotext extract command (golang.org/x/text/cmd/gotext) can then extract the strings that need translation and will spit out textdata/gotext_en.out.json:
[
{
"original": {
"msg": "Invalid username or password"
},
"translation": {},
"position": "main/main.go:20:19"
},
{
"original": {
"msg": "You must be authenticated to see this localized resource"
},
"translation": {},
"position": "main/main.go:22:19"
}
]
This can then be used to create translations files for translators, the result of which can be used to generate code like the following.
func insertGeneratedTranslations() {
// hardwired here, but can be read from file or generated tables.
message.SetString(language.Swedish, "Invalid username or password", "Fel användarnamn eller lösenord")
message.SetString(language.BritishEnglish, "You must be authenticated to see this localized resource", "You must be authenticated to see this localised resource")
message.SetString(language.Swedish, "You must be authenticated to see this localized resource", "Du måste vara inloggad för att se denna resurs")
}
In the future I hope to support gettext format and the like. Note that "en" is the default language (by default) and its strings do not need to be translated. You may still want to "translate" English to English when one wishes to have conditional output for plurals and the like (work in progress, but in an advanced state). Of course you can also do this by hand.
And if succinctness is of importance, you could also do:
//go:generate stringer -type=AuthenticationError
package main
import (
"fmt"
"golang.org/x/text/language"
"golang.org/x/text/message"
)
type AuthenticationError int
const (
ErrInvalid AuthenticationError = iota
ErrRequired
)
var matcher language.Matcher
func init() {
insertGeneratedTranslations()
matcher = language.NewMatcher(message.DefaultCatalog.Languages())
}
func main() {
// Match does a lot of magic to find the best language for the user.
lang, _, _ := matcher.Match(language.Make("en-GB-oed"))
p := message.NewPrinter(lang)
fmt.Println(p.Sprintf(ErrInvalid.String()))
fmt.Println(p.Sprintf(ErrRequired.String()))
}
And have go generate create the String method and use its results as keys. This will require adding the English translations as well, though, and, more importantly, will disrupt a translation workflow as the translators will not have the original message and will make it hard to use gotext.
But if you use the SetString calls of the first example you're good to go.
I plan to add explicit support for errors, btw, which will make this a bit easier.
@mpvl this is looking good from small example above. I'd need to try to integrate it for better feedback.
In that vein, is this ready for others to try and kick the tires? In other words, for me to try to support in i18n4go, or shall I wait a bit more? Also, if yes, do I need to build a custom Go version to get this feature or is there a built version I can use?
Not urgent, let me know when you have a chance. Best,
max
@maximilien: I was indeed thinking I should contact you soon about integration, but you beat me to it. :) Up next is pluralization. It may be handy to wait until this is finished (quite close now), but up to you. If you think it is useful to sit together for a bit to crank this out next time I'm in the valley let me know.
FYI: the message package now has rudimentary number formatting support:
package main
import (
"golang.org/x/text/language"
"golang.org/x/text/message"
)
func main() {
p := message.NewPrinter(language.Make("bn"))
p.Printf("Numbers in Bengali: %d\n", 123456789)
// Supports Unicode's 'u' extension of BCP-47 language tags.
p = message.NewPrinter(language.Make("en-u-nu-fullwide"))
p.Printf("Use full-width digits in English: %d\n", 123456789)
// Output:
// Numbers in Bengali: ১২,৩৪,৫৬,৭৮৯
// Use full-width digits in English: １２３,４５６,７８９
}
It doesn't seem to support lojban:
package main
import (
"fmt"
"golang.org/x/text/language"
"golang.org/x/text/message"
)
func main() {
p := message.NewPrinter(language.Make("jbo"))
p.Printf("Actual: Numbers in lojban: %d\n", 123456789)
fmt.Println("Expected: Numbers in lojban: parecivomuxazebiso")
// Output:
// Actual: Numbers in lojban: 123,456,789
// Expected: Numbers in lojban: parecivomuxazebiso
}
Quick reference:
. , 0 1 2 3 4 5 6 7 8 9
pi ki'o no pa re ci vo mu xa ze bi so
A B C D E F
dau fei gai jau xei vai
+ - ∞
ma'u ni'u ci'i
2-3i
re ka'o ni'u ci
NaN
na'a'u
@BenLubar The package currently does not support algorithmic or non-contiguous digits. That said, looking at the rbnf and numberingSystem files in CLDR, neither does CLDR. Once the RBNF methods are in it wouldn't be too hard to add, though.
Except maybe for the infix imaginary indicator. The 'i' is currently fixed and at a fixed position. Feel free to file a bug at http://unicode.org/cldr/trac. :)
@mpvl I'll wait a bit. NP. I know in CloudFoundry there is a push to refactor the CLI code which uses i18n4go, so at a minimum we can wait for that to start and be sorted w.r.t. i18n.
If you think it is useful to sit together for a bit to crank this out next time I'm in the valley let me know.
Yup, sounds good. Just ping me when you have some dates in mind. Got various travels planned in Oct and Nov and vacation in Dec, but should be around before that. Cheers 🍻
Just wanted to know the status of the repo, especially that of the gotext tool. It seems a lot of changes were made that don't match up with the docs, such as instead of a textdata directory I now get a locales folder, etc...
It seems that the gotext tool is broken as well currently preventing me from trying localization.
The gotext tool is under active development and one of the main focuses at the moment. Progress comes in bursts, but it is definitely active. A documentation overhaul is part of that.
@mpvl: I'm looking at this and deciding if I want to use this or something else and manually format numbers/money/dates. My users are a bit peculiar. What they usually want is the same behavior as in the OS. The language is set to e.g. English, but other formatting is based on the country. Or, even better, overridden via some settings page, also just like in the OS.
x/text is very flexible with settings, although the use of it is somewhat hidden. Most settings are communicated through the language tags ("golang.org/x/text/language".Tag). Language tags implement BCP 47 tags, augmented with CLDR -u extensions http://www.unicode.org/reports/tr35/#Locale_Extension_Key_and_Type_Data. For instance:
The x/text packages that accept language tags will extract the options from these tags that are relevant to them. The language.Matcher preserves these settings and also may add the -rg-xxxxxx tag when it notices the dialect is different from the expressed region.
The reason for this indirect approach is that preferences, in practice, are often expressed in language tags, for example through the "Accept-Language" HTTP header. This approach allows these settings to pass through the respective packages without the developer having to piece these out.
Anyway, if you rely on user preferences through language tags, you don't have to worry about these. If you want to explicitly create user preferences, you can use the language.Tag's SetTypeForKey to create new language tags with updated preferences.
Is there a recommended way to localize [text|html]/templates per the proposal? I like the idea proposed, it doesn't seem to be implemented yet. Is that the case?
Not yet. There is a design for it, but it requires added functionality of the core template libraries.
On Tue, 26 Feb 2019 at 13:10 Eric Cox notifications@github.com wrote:
Is there a good way to mark text in go templates for translation?
Is there any ETA for this feature or a suggested work-around?
Hi, just found this issue and cross-posting my recent proposal: https://github.com/golang/go/issues/34989
Are compact number formats something which could potentially fall under the responsibilities of the x/text package, and if so, what would be the process for creating a contribution to add this functionality?
Anything part of Unicode, including CLDR fits in the x/text mandate. You could modify the existing package to include it. The same process as with Go applies. As that is CLDR 35, it would require an upgrade to CLDR 35 of x/text first, which may take some effort.
Great, thank you. I'll look into the difficulty of getting that upgraded. In the meantime I put together a library which serves my purpose well enough for now (for anyone who happens to stumble upon this): https://github.com/nkall/compactnumber
@MickMonaghan: dates/times is on the list, but only after number etc. is completed.
Hi, since that message is from February 2017, I would like to know: is date/time localization getting closer to being implemented, or is it still far off on the Go roadmap?
Thank you!
Hi, the documentation of x/text mentions the gender feature in several places.
Do I understand correctly that this feature is currently not implemented?
Thank you.
@Xpert85 That is correct.
I was hacking on my https://github.com/purpleidea/mgmt/ and it occurred to me that I'd like proper gettext support! Sadly, you can't have an underscore function:
package main
import (
"fmt"
)
// gettext!
func _(format string, a ...interface{}) string {
return "just an example"
}
func main() {
fmt.Println("Hello, ", _("world"))
}
./prog.go:13:26: cannot use _ as value
But you can use two underscores! Sadly, the usefulness of this is not great, because if you stick that in a gettext package and do a dot import:
import (
. "github.com/purpleidea/gettext"
)
it doesn't work because the function is seen as private, not public.
My proposal:
I'd like golang to consider treating the single underscore as a valid, public function. If that's too hard to do in the compiler, then to treat two underscores as a public function. This would go a long way into improving the readability of gettext translations in code =D
Thanks!
@purpleidea please file a separate proposal for that
Just use T instead of _ as a function name.
Hello, I wonder how to get the system language?
lang := ... // how to get the system default language?
p := message.NewPrinter(language.Make(lang))
p.Printf("xxx")
Can someone help me? Thanks a lot!
@youthlin Please don't ask questions on general issues. Please see https://golang.org/wiki/Questions. Thanks.
This issue is intended as an umbrella tracking issue for localization support.
Localization support includes:
Details to be covered in design docs.