For example, rawdata/utf8_lawstat/version2/01011/0101136061100.html ended with an exception.
The 2-byte character of that word is 0xFECA, which is not defined in Big5. See Wikipedia for the range of undefined characters. I guess characters in those ranges cannot be converted to UTF-8.
While parsing with Ruby's String#encode, I logged the cases that raise Encoding::UndefinedConversionError when converting from UTF-8 to Big5 in the error.log of my own project. Some of them match the bug here, so it may help.
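
In case it helps to reproduce, here is a minimal sketch of how such a failure can be caught with String#encode. The helper name conversion_error is hypothetical (it is not from my project), and whether the byte pair 0xFECA is actually reported as undefined depends on the Big5 table shipped with your Ruby build, so I make no claim about that particular result.

```ruby
# Hypothetical helper: returns nil when the text converts cleanly,
# or a short description of the error when String#encode raises.
def conversion_error(text, from:, to:)
  # dup so the caller's string keeps its original encoding tag
  text.dup.force_encoding(from).encode(to)
  nil
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError => e
  "#{e.class}: #{e.message}"
end

# The byte pair mentioned above, tagged as Big5 and converted to UTF-8.
p conversion_error("\xFE\xCA".b, from: "Big5", to: "UTF-8")

# The opposite direction: a UTF-8 character converted to Big5.
p conversion_error("\u4E2D", from: "UTF-8", to: "Big5")
```

Appending each non-nil result together with the file name to a log gives a list like the one in my error.log.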