Open folivoramao opened 1 year ago
I'm not sure what's wrong, as I am not familiar with the encoding, but I can point out a couple of details. First, you're getting the replacement character U+FFFD, which means that x/text considers something to be wrong with that character. That is interesting. You can see this by printing the result differently, and you can also simplify your example significantly, since fmt.Printf can do all the hex/string work for you:
https://go.dev/play/p/kDgB3ybMa8c
Finally, you should always check your errors, especially when debugging, although that didn't help here.
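As a rough illustration of the printing advice above, here is a minimal standard-library sketch. The string `decoded` is a made-up stand-in for whatever the decoder returned (it is not the data from the report), and `hasReplacement` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// hasReplacement reports whether s contains U+FFFD. Note that
// strings.ContainsRune with utf8.RuneError also matches invalid UTF-8
// byte sequences, which is usually what you want when debugging.
func hasReplacement(s string) bool {
	return strings.ContainsRune(s, utf8.RuneError)
}

func main() {
	// Hypothetical decoder output: two good characters, one replacement.
	decoded := "ok\uFFFD"

	fmt.Printf("% x\n", decoded) // raw bytes in hex, space separated
	fmt.Printf("%q\n", decoded)  // quoted string
	fmt.Printf("%+q\n", decoded) // quoted, non-ASCII shown as escapes

	// Print each rune as a code point to spot U+FFFD directly.
	for _, r := range decoded {
		fmt.Printf("%U ", r)
	}
	fmt.Println()

	fmt.Println("contains U+FFFD:", hasReplacement(decoded))
}
```

A check like this is useful alongside the error check, because a decoder can substitute U+FFFD without returning an error.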
It would appear that the decode table is simply lacking data; the given test case would decode to 23705. https://go.googlesource.com/text/+/refs/heads/master/encoding/simplifiedchinese/tables.go#22009
WHATWG seems to have changed the URLs for their table data, so I'm not sure what a new table would be generated from (presumably one of these: https://encoding.spec.whatwg.org/#indexes).
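For readers unfamiliar with how a two-byte sequence maps to a table index, here is a small sketch of the pointer formula from the WHATWG gb18030 decoder. It assumes that 23705 above refers to such a table index, which the comment does not state explicitly. The bytes 0xFD 0xD2 are chosen purely because they produce that index under the formula; they are not taken from the report, and `pointer` is a hypothetical helper name.

```go
package main

import "fmt"

// pointer computes the index into the two-byte decode table for a
// lead/trail byte pair, following the WHATWG gb18030 decoder:
// (lead - 0x81) * 190 + (trail - offset), where offset is 0x40 for
// trail bytes below 0x7F and 0x41 otherwise. It returns false when
// either byte is outside the two-byte range.
func pointer(lead, trail byte) (int, bool) {
	if lead < 0x81 || lead > 0xFE {
		return 0, false
	}
	var offset byte
	switch {
	case trail >= 0x40 && trail <= 0x7E:
		offset = 0x40
	case trail >= 0x80 && trail <= 0xFE:
		offset = 0x41
	default:
		return 0, false
	}
	return int(lead-0x81)*190 + int(trail-offset), true
}

func main() {
	// Illustrative bytes only (an assumption, not the reporter's input).
	p, ok := pointer(0xFD, 0xD2)
	fmt.Println(p, ok)
}
```

If the table has no entry at the computed index, a decoder has nothing to map the bytes to, which would be consistent with the U+FFFD seen earlier.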
(CC @mpvl per https://dev.golang.org/owners)
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
I encountered a problem with character set conversion: when I use the simplifiedchinese package to convert a GB18030-encoded character to UTF-8, an error is reported. The same conversion succeeds when I use the mahonia package. Code link: https://go.dev/play/p/NhBp0JQ2RUp
What did you expect to see?
What did you see instead?