Open zurk opened 5 years ago
Does this need another unicode.SpecialCase in https://golang.org/pkg/strings/#ToLowerSpecial ?
I do see a TODO in unicode/casetables.go.
@robpike @ianlancetaylor
CC @mpvl
Unicode case folding requires handling the final sigma special case, but the rule is overridden in a few standards; for example, Appendix C of RFC 7790 (PRECIS) says:
"local case mapping is not applicable to small sigma or final sigma, so case mapping in the PRECIS framework always maps final sigma to small sigma, independent of context"
Changing the strings.ToLower function to handle the final sigma (in full compliance with the Unicode folding rules) may break existing code that relies on the current behaviour. Also, from a cursory look (but I may be wrong), the current special-case mechanism in unicode does not support context-sensitive replacement rules, so it may not be trivial to implement the rule in a non-hacky way.
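To illustrate why the rule is context-sensitive, here is a minimal standard-library sketch. It is a simplification, not the full Unicode Final_Sigma condition: it only looks at the immediately adjacent runes and does not skip case-ignorable characters such as combining marks. The names isCased and lowerFinalSigma are hypothetical helpers, not part of any Go package.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isCased approximates the Unicode "Cased" property with the three
// letter-case categories. This is an assumption made for the sketch.
func isCased(r rune) bool {
	return unicode.IsUpper(r) || unicode.IsLower(r) || unicode.IsTitle(r)
}

// lowerFinalSigma lowercases s rune by rune, but maps Σ to ς when it is
// directly preceded by a cased letter and not directly followed by one.
// A per-rune table cannot express this, because the result depends on
// the neighbouring runes.
func lowerFinalSigma(s string) string {
	rs := []rune(s)
	var b strings.Builder
	for i, r := range rs {
		prevCased := i > 0 && isCased(rs[i-1])
		nextCased := i+1 < len(rs) && isCased(rs[i+1])
		if r == 'Σ' && prevCased && !nextCased {
			b.WriteRune('ς')
			continue
		}
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String()
}

func main() {
	fmt.Println(lowerFinalSigma("ΟΔΟΣ"))
	fmt.Println(lowerFinalSigma("ΣΟΦΟΣ ΣΟΦΟΣ"))
}
```

Under these assumptions, a Σ at the start or in the middle of a word still becomes σ, and only a word-final Σ becomes ς.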
On the other hand, the golang.org/x/text/cases package handles the final sigma special case, and also provides a way to get a PRECIS-compliant folding:
package main

import (
	"fmt"

	"golang.org/x/text/cases"
	"golang.org/x/text/language"
)

func main() {
	greekLower1 := cases.Lower(language.Greek)
	greekLower2 := cases.Lower(language.Greek, cases.HandleFinalSigma(false))
	fmt.Println(greekLower1.String("β︎Δℕ︎Σ")) // prints β︎δℕ︎ς
	fmt.Println(greekLower2.String("β︎Δℕ︎Σ")) // prints β︎δℕ︎σ
}
My proposal is to preserve the existing strings behaviour, perhaps add a short note about final sigma handling to the documentation, and point users to the golang.org/x/text/cases package for fully Unicode-compliant folding.
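For reference, the existing behaviour that this proposal would preserve can be seen with the standard library alone. The sketch below wraps strings.ToLower in a hypothetical helper, currentLower, only so that the result is easy to inspect; it shows that Σ maps to σ regardless of its position in the word.

```go
package main

import (
	"fmt"
	"strings"
)

// currentLower is a hypothetical wrapper around strings.ToLower, which
// maps each rune independently of its neighbours.
func currentLower(s string) string {
	return strings.ToLower(s)
}

func main() {
	// Both the initial and the final Σ become σ.
	fmt.Println(currentLower("ΣΟΦΟΣ"))
}
```

Code that compares or stores lowercased Greek strings today may depend on this context-free mapping, which is the compatibility concern raised above.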
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (go env)?
What did you do?
https://play.golang.org/p/fEDCPSV7Dqi
What did you expect to see?
The program output should be β︎δℕ︎ς because lowercasing Σ at the last position of a word yields ς. See https://en.wikipedia.org/wiki/Sigma
What did you see instead?
The output is β︎δℕ︎σ.
I am not sure this is the only case, across all languages, where the lowercase form depends on position. I just ran into different behavior with Python code: