Sovos-Compliance / superobject

Automatically exported from code.google.com/p/superobject

ParseFile() and ParseStream() do not support UTF-8 #59

Closed. GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Parsing a UTF-8 encoded file/stream that contains non-ASCII characters does not
produce correct output.  For instance, if the file/stream contains the word
"café" encoded in UTF-8, the parser outputs "cafÃ©" instead.  The reason is that
TSuperObject.ParseStream() only looks for a UTF-16 BOM and assumes anything
else is ANSI.  It does not look for a UTF-8 BOM, nor does it allow the caller to
specify the source encoding (such as with the SysUtils.TEncoding class in
Delphi/C++Builder 2009 and later) when a BOM is not present but the source
encoding is otherwise known to the caller.
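
To make the failure mode concrete, here is a minimal, language-neutral sketch in Python (not superobject code). It assumes "ANSI" means Windows-1252, and `decode_json_bytes` is a hypothetical helper name invented for this illustration. It shows the kind of BOM check plus caller-supplied encoding that the report asks for, and reproduces the garbled output when UTF-8 bytes are read as ANSI.

```python
import codecs
from typing import Optional


def decode_json_bytes(data: bytes, encoding: Optional[str] = None) -> str:
    """Hypothetical helper: pick a decoder from the BOM, else use the caller's hint."""
    # A BOM, when present, takes priority over any hint.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM: honour the caller's encoding if given. The cp1252 fallback is an
    # assumption standing in for the "ANSI" default described in the report.
    return data.decode(encoding or "cp1252")


raw = '{"word": "café"}'.encode("utf-8")   # UTF-8 bytes, no BOM

print(decode_json_bytes(raw))              # {"word": "cafÃ©"}  (misread as ANSI)
print(decode_json_bytes(raw, "utf-8"))     # {"word": "café"}   (caller states the encoding)
```

In Delphi terms, the optional `encoding` argument plays the role a TEncoding parameter would play on ParseFile()/ParseStream(); the exact signature such an overload would take is left open here.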

Original issue reported on code.google.com by gambit47 on 26 Aug 2014 at 2:10