Summary:
When using iconv to translate UTF-8 to ASCII//TRANSLIT, 'é' is not
translated to 'e' as expected. Instead, a question mark '?' is returned.
I have tried to debug this, and I can see that cScrach already contains the '?'
right after C.iconv is called. When not using //TRANSLIT, 'é' is
translated to 'xx', where 'x' is the invalid char I passed. I guess it appears
twice because 'é' is a multibyte (2-byte) UTF-8 character.
Code used (note that the invalid char is 'x', NOT '?'):
package main

import (
	"encoding/hex"
	"fmt"
	"log"

	"code.google.com/p/go-charset/charset/iconv"
)

func main() {
	input := "AéA"

	// Use 'x' as the failure character so it can be told apart from the '?'
	// that appears in the translated output.
	t, err := iconv.Translator("ASCII//TRANSLIT", "UTF-8", 'x')
	if err != nil {
		log.Fatalf("Could not get charset translator from UTF-8 to ASCII. Got error: %s\n", err)
	}
	fmt.Print(hex.Dump([]byte(input)))

	// eof=true: the whole input is passed in a single call.
	n, cdata, err := t.Translate([]byte(input), true)
	if err != nil {
		log.Fatalf("Could not translate string '%s' to ASCII. Got error: %s\n", input, err)
	}
	fmt.Print(hex.Dump(cdata))

	output := string(cdata)
	log.Printf("Translated %d characters from UTF-8 ('%s') to ASCII ('%s')\n", n, input, output)
}
Result when running:
geertjohan@VirtKubuntu:~$ iconvtesting
00000000 41 c3 a9 41 |A..A|
00000000 41 3f 41 |A?A|
2012/06/20 09:47:29 Translated 4 characters from UTF-8 ('AéA') to ASCII ('A?A')
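Note that the log line reports 4 even though "AéA" has 3 characters. This is consistent with n counting input bytes consumed rather than characters, which is an assumption based on the go-charset Translator interface and not something verified here. The stdlib-only sketch below shows the two counts for the same input. The helper name countBytesAndRunes is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countBytesAndRunes is a hypothetical helper: len reports the number of
// bytes in the UTF-8 string, RuneCountInString the number of code points.
func countBytesAndRunes(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := countBytesAndRunes("AéA")
	fmt.Printf("bytes=%d characters=%d\n", b, r)
}
```

For "AéA" this gives 4 bytes but 3 characters, so the "4" in the log matches the byte length of the input.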
Original issue reported on code.google.com by gjr19...@gmail.com on 20 Jun 2012 at 7:52