andreacomparini / go-charset

Automatically exported from code.google.com/p/go-charset

Using //TRANSLIT does not work (might be iconv problem) #7

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Summary:
When using iconv to translate UTF-8 to ASCII//TRANSLIT, the 'é' is not 
translated to 'e' as expected. Instead, a question mark '?' is returned.
I have tried to debug this and I can see that right after C.iconv is called, 
cScratch already contains the '?'. When not using //TRANSLIT, the 'é' is 
translated to 'xx', where 'x' is the invalid char I used. I guess it's 
placed twice because 'é' is a multibyte (2 bytes) UTF-8 character.
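
The 'xx' guess above can be checked without iconv. The following is a minimal pure-Go sketch, not code from go-charset: replaceBytes and replaceRunes are hypothetical helpers written only for this illustration. It shows that 'é' (U+00E9) occupies two bytes in UTF-8, so a per-byte substitution yields two replacement characters while a per-code-point substitution yields one.

```go
package main

import "fmt"

// replaceBytes (hypothetical helper) substitutes every non-ASCII byte
// with invalid, one replacement per byte.
func replaceBytes(s string, invalid byte) string {
	out := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		if s[i] < 0x80 {
			out = append(out, s[i])
		} else {
			out = append(out, invalid)
		}
	}
	return string(out)
}

// replaceRunes (hypothetical helper) substitutes every non-ASCII code
// point with invalid, one replacement per rune.
func replaceRunes(s string, invalid byte) string {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r < 0x80 {
			out = append(out, byte(r))
		} else {
			out = append(out, invalid)
		}
	}
	return string(out)
}

func main() {
	input := "A\u00e9A" // "AéA", written with an escape to pin the precomposed form

	fmt.Printf("% x\n", input)            // 41 c3 a9 41
	fmt.Println(replaceBytes(input, 'x')) // AxxA
	fmt.Println(replaceRunes(input, 'x')) // AxA
}
```

If the non-TRANSLIT path substitutes per invalid byte, as the 'xx' result suggests, it would behave like replaceBytes here.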

Used code (note that the invalid char is 'x', NOT '?'):
package main

import (
    "encoding/hex"
    "fmt"
    "log"

    "code.google.com/p/go-charset/charset/iconv"
)

func main() {
    input := "AéA"

    // 'x' is the replacement for characters that cannot be converted.
    t, err := iconv.Translator("ASCII//TRANSLIT", "UTF-8", 'x')
    if err != nil {
        // log.Fatalf exits the program, so no return is needed.
        log.Fatalf("Could not get charset translator from UTF-8 to ASCII. Got error: %s\n", err)
    }

    // Dump the input bytes before translation.
    fmt.Print(hex.Dump([]byte(input)))

    n, cdata, err := t.Translate([]byte(input), true)
    if err != nil {
        log.Fatalf("Could not translate string '%s' to ASCII. Got error: %s\n", input, err)
    }

    // Dump the translated bytes.
    fmt.Print(hex.Dump(cdata))
    output := string(cdata)
    log.Printf("Translated %d characters from UTF-8 ('%s') to ASCII ('%s')\n", n, input, output)
}

Result when running:
geertjohan@VirtKubuntu:~$ iconvtesting 
00000000  41 c3 a9 41                                       |A..A|
00000000  41 3f 41                                          |A?A|
2012/06/20 09:47:29 Translated 4 characters from UTF-8 ('AéA') to ASCII ('A?A')
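
For comparison, the expected result ('AeA') can be produced without iconv by a small lookup table. This is only a sketch of a possible fallback under stated assumptions: the translit table and the toASCII function are hypothetical, cover just a handful of characters, and are not part of go-charset or iconv.

```go
package main

import (
	"fmt"
	"strings"
)

// translit is a tiny hand-picked table, assumed here for illustration only.
var translit = map[rune]string{
	'\u00e9': "e",  // é
	'\u00e8': "e",  // è
	'\u00e0': "a",  // à
	'\u00fc': "u",  // ü
	'\u00df': "ss", // ß
}

// toASCII (hypothetical helper) keeps ASCII as-is, transliterates runes
// found in the table, and writes invalid for everything else.
func toASCII(s string, invalid byte) string {
	var b strings.Builder
	for _, r := range s {
		if r < 0x80 {
			b.WriteByte(byte(r))
			continue
		}
		if repl, ok := translit[r]; ok {
			b.WriteString(repl)
		} else {
			b.WriteByte(invalid)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(toASCII("A\u00e9A", 'x')) // AeA
	fmt.Println(toASCII("A\u4e16A", 'x')) // AxA
}
```

A table like this sidesteps the iconv behaviour rather than explaining it, so it is useful mainly as a reference for what the translated bytes should look like.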

Original issue reported on code.google.com by gjr19...@gmail.com on 20 Jun 2012 at 7:52

GoogleCodeExporter commented 8 years ago

Original comment by rogpeppe@gmail.com on 20 Jun 2012 at 9:16