Closed tomicakorac closed 8 years ago
Hi,
Yes, it's possible to change the directory name while keeping in sync with Transifex: http://docs.transifex.com/developer/client/config#language-map
Anyway, is it a good idea to rename it? Or is it better to keep the current name and to update the router?
According to this RFC (2.2. Reserved Characters), @ is a reserved character, like ':' and '/', and shouldn't be used for anything other than its specified purpose, which in this case is denoting a user under a domain. Seeing as that's not how we're using it, aliasing sr_Ijekavian seems to be the recommended way to go.
That does sound like the way to go. I would just suggest, while we're at it, that we lose the capitalization, especially since it will be a part of a URL, and make it sr_ijekavian
For now, we were keeping capitalization in locale names (e.g. en_US). So we should keep it for this locale too.
With a regional mark such as _US, _GB, _DE or _FR, in my opinion we're not talking about capitalization but about abbreviations, which is not the same as capitalizing only the initial letter, especially because Ijekavian is a dialect and not a territorial determinant. I would agree to have it in all caps if it were an abbreviation, e.g. _IJE. In this case, though, since the parent language code is in small letters, I think the dialect should be in small letters too, bearing in mind that it will be used in a web address.
I think it's better to stick to the original language code (sr@Ijekavian) as closely as possible. Anyway, that's not really very important.
See this PR: #449
http://beta.elementary.io/sr_Ijekavian/ returns 403 Forbidden now.
Yes, @fabianthoma needs to update the server config now.
Now returns the homepage, but still untranslated.
This appears to be still ongoing, but do we need sr_Ijekavian when we have sr? We've gotten rid of all the other localized languages?
I'm pretty sure that the vast majority of visitors need neither sr nor sr_Ijekavian. However, those few that do need one of those, really do. I see there is an Arabic localization, and then there is Arabic (Sudan). There is Norwegian Nynorsk, and then there is Norwegian Bokmal. There is Portuguese (Portugal), and then there is Portuguese (Brasil). There is Chinese (Traditional), and then there is Chinese (Simplified). While it's true that whoever understands Serbian Ijekavian will also understand Serbian Ekavian (the normalized standard), there are two points that I do not understand:
Why is Serbian (Ijekavian) the only localization out of the several dozen existing ones that we are having difficulty implementing?
Why is Serbian Ijekavian the only localization whose necessity is being questioned?
Most of those localizations don't appear on the actual site, only in transifex. So, for your second question, we've already discussed the need for secondary localization and decided that in most cases it is not necessary. As I said, we've previously gotten rid of such languages. We did say, however, that it would be reviewed on a case by case basis.
For your first question, no standards-compliant locale has characters like @ in its identifier. Locales should be identified by a two-letter code, like en, or by two such sections separated by an underscore, as in en_GB. Because @ is a reserved character, typically used for identifying users in a domain, most browsers do not take kindly to it being used in URLs.
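As a rough illustration of the point about @ in URLs, here is a small Python sketch (not taken from the site's code) showing how the standard library's URL quoting treats the two candidate paths. It only shows one library's conservative default, not a verdict on whether @ is legal in a path.

```python
from urllib.parse import quote, unquote

path = "/sr@Ijekavian/"

# quote() leaves letters, digits, "_.-~" and "/" alone and percent-encodes
# everything else, so the reserved "@" becomes %40.
encoded = quote(path)
print(encoded)

# The underscore alias contains only characters that need no escaping.
alias = quote("/sr_Ijekavian/")
print(alias)

# Decoding restores the original path.
decoded = unquote(encoded)
print(decoded)
```

With the underscore form, the directory name, the URL and the router pattern can all use the same literal string.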
I'm not sure what you mean by "Most of those localizations don't appear on the actual site, only in transifex.", since all the localizations I've mentioned do exist and are available to choose at http://elementary.io.
About the first issue, I've mentioned before that the notation which Transifex chose to use in this case is an obvious cause of the problem, but at the same time there is no particular reason to follow that notation. No one has ever used that exact marking, and it's still unclear to me why they did so. I might bring this up with Transifex, if that would resolve our problem, but I also believe that even if Transifex notation remained, there would still be a way to fix this bug.
I don't see the Norwegian localization on the site, but considering language names are translated, it's likely just me. As for the notation, we are indeed inheriting it from Transifex and every time we update the translations it will be pulled down into that location. I think we've tried to alias it but right now the nginx.conf has a regex designed solely for two letter language codes. Is there an equivalent for sr@Ijekavian in the format en_GB?
The two Norwegian dialects are listed as just Bokmal and Nynorsk, but I can see they're both there. As well as all the other double localizations I've mentioned.
I am not aware of any two-letter code for Ijekavian, unfortunately. But, again in my opinion, there are two facts we should have in mind here:
en_GB notation has never been standardized anywhere to this date. It's just something that once seemed right to someone, and then everyone just kinda went along with it (with slight modifications here and there, as we can see in the example of Transifex). Although for most of the 'big' languages out there the flaws of this kind of notation are not easily noticeable, or even do not exist, there are several strong arguments against it in general.
sr_RS abbreviation would be persistent, and the meaning of both has changed in time, so what was sr_RS 20 years ago is not the same as today, and expecting it to change again in the future is also justified. I really don't want to go into details as to why or how it happened with Serbian, but I know dozens of other languages with the same or similar problems of mostly political nature (which, I hope we can agree, must not be of any concern to simple translators).
en_GB notation presupposes that a single language will be strictly confined within the territory of a single country, and/or vice versa; in reality this is a mere exception reserved for the several most developed and politically stable countries. As for the rest, it just won't do. sr_RS does not cover Serbian language in Serbia, as there are two major dialects of Serbian being spoken in Serbia. On the other hand, there are two major dialects of Serbian being spoken in Bosnia and Herzegovina, but at the same time they're identical to their respective equivalents in Serbia. In short, territorial boundaries do not have any role here, and will just bring in confusion and probably even political controversy. Again, even though I've illustrated my point on an example most visible to me, I've come across a great number of languages with the same problems, and only a handful of those which are lucky enough to not be affected.
en_GB notation. Having all that in mind, I see no reason whatsoever for us to come up with a new non-standard localization code (at least for Serbian) which would suit our needs.
I can suggest the following:
sr_EK - Serbian Ekavian (the normalized standard);
sr_JK - Serbian Ijekavian;
sr_IK - Serbian Ikavian (currently missing from Transifex);
If we want to keep the xx_YY format for languages, we have to choose a free country code from here: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Decoding_table
I would recommend using user-assigned codes, to avoid picking a code that could be assigned to a country in the future. But since this second code refers to a country and we are talking about dialects, this does not really make sense...
@lewisgoddard Right now, Serbian Ijekavian has been renamed to sr_Ijekavian in the Transifex config file. Do you think it's possible to allow [lang]-[country]_[dialect] in language codes in the Nginx config? Or would you prefer not to change the config and select a custom country code for Serbian, preventing page names like aa_bcd from being mistaken for language codes?
BTW, we should update the Nginx config to allow three-letter language codes (e.g. Chinese (Min Bei) (mnp)): https://www.transifex.com/languages/
I had written a piece, but I've been having power-related issues at work. Effectively, I'd prefer to keep the regex simple and modify this one outlier. Very rarely is more complex regex better. sr_JK seems like the best option.
As for three letter codes, I am not overly familiar with regex, but changing it like this might do it.
rewrite "^/([a-z]{2}(?:_[A-Z]{2})?)/(.*)$" /$2?lang=$1 last;
rewrite "^/([a-z]{2}(?:_[A-Z]{2-3})?)/(.*)$" /$2?lang=$1 last;
The correct regex for three-letter codes is:
rewrite "^/([a-z]{2,3}(?:_[A-Z]{2})?)/(.*)$" /$2?lang=$1 last;
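For anyone who wants to check what these patterns accept before touching the server, here is a minimal Python sketch. It assumes the two patterns behave the same in Python's re module as in nginx's PCRE, which holds for the constructs used here; the rewrite helper and the sample paths such as /api/status and /en_GB/docs are made up for illustration.

```python
import re

# Patterns copied from the existing rule and the corrected three-letter rule.
CURRENT = re.compile(r"^/([a-z]{2}(?:_[A-Z]{2})?)/(.*)$")
PROPOSED = re.compile(r"^/([a-z]{2,3}(?:_[A-Z]{2})?)/(.*)$")

def rewrite(pattern, path):
    """Mimic `rewrite ... /$2?lang=$1`: return the new URI, or None if no match."""
    m = pattern.match(path)
    if m is None:
        return None
    return "/{}?lang={}".format(m.group(2), m.group(1))

# Hypothetical sample paths, chosen to cover the cases discussed in this thread.
samples = ["/en_GB/docs", "/sr_JK/", "/mnp/", "/sr_Ijekavian/",
           "/sr@Ijekavian/", "/api/status"]

for path in samples:
    print(path, "->", rewrite(CURRENT, path), "|", rewrite(PROPOSED, path))
```

Neither pattern matches sr_Ijekavian or sr@Ijekavian, while sr_JK passes both. The three-letter variant also treats any three-letter lowercase first path segment as a language code, so a page named like /api/ would be rewritten too.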
Okay, then let's change the language code mapping.
@emersion Currently, /lang/ has sr and sr_Ijekavian. What are we doing moving forward?
We can move /lang/sr_Ijekavian/ to /lang/sr_JK/, change mappings in Transifex config file and this issue should be solved.
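A rough sketch of that move, run against a throwaway directory rather than the real repository. The LANG_MAP dictionary and the local_dir helper are hypothetical stand-ins for the mapping kept in the Transifex config file, not its actual syntax.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping from Transifex language codes to local directory names.
LANG_MAP = {"sr@Ijekavian": "sr_JK"}

def local_dir(transifex_code):
    """Return the local directory name for a Transifex language code."""
    return LANG_MAP.get(transifex_code, transifex_code)

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the current layout: /lang/sr and /lang/sr_Ijekavian.
    lang = Path(tmp) / "lang"
    (lang / "sr").mkdir(parents=True)
    (lang / "sr_Ijekavian").mkdir()

    # Move the existing translations to the new name.
    (lang / "sr_Ijekavian").rename(lang / local_dir("sr@Ijekavian"))

    names = sorted(p.name for p in lang.iterdir())
    print(names)
```

Codes without an entry in the mapping keep their Transifex name, so only the one outlier is affected.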
I'm not sure what is the best way to go about fixing this problem.
a) Transifex's code for Ijekavian dialect of Serbian language is sr@Ijekavian. b) eOS's beta site is configured to have localized URL in this case as http://beta.elementaryos.io/sr@Ijekavian/
I'm guessing that the '@' in the URL is causing the URL to fail. If I am right, these are the questions:
If I am not right (if the '@' in the URL isn't the cause of this problem), does anyone else have an idea why this URL is failing?