Open matttyson opened 5 years ago
Faced the same issue. I found that the src and dst streams both use the same memory (stack_), so the UTF-16 output overwrites the UTF-8 source before it has been read. Every ASCII digit has 0 as its second byte in UTF-16, so the result is a correct first symbol followed by zeros in place of the others. The relevant code in reader.h:
SizeType numCharsToCopy = static_cast<SizeType>(s.Length());
StringStream srcStream(s.Pop());
StackStream<typename TargetEncoding::Ch> dstStream(stack_);
while (numCharsToCopy--) {
    Transcoder<UTF8<>, TargetEncoding>::Transcode(srcStream, dstStream);
}
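To make the overwrite easier to see, here is a minimal standalone sketch that does not use RapidJSON at all. The function names widenInPlace and widenSeparate are hypothetical, and the layout (source bytes at the start of the buffer, UTF-16LE output written from the same offset) is an assumption chosen to imitate the aliasing described above, not a copy of the library's stack handling. The second function only shows that a separate destination avoids the problem; it is not necessarily how the library fixed it.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Widen ASCII bytes to 16-bit units while reading from and writing to the
// SAME buffer, front to back. Hypothetical illustration, not RapidJSON code.
std::u16string widenInPlace(const std::string& ascii) {
    const std::size_t n = ascii.size();
    std::vector<unsigned char> buf(n * 2, 0);
    if (n != 0) {
        std::memcpy(buf.data(), ascii.data(), n);  // source bytes sit at the start
    }

    for (std::size_t i = 0; i < n; ++i) {
        const unsigned char c = buf[i];  // read source byte i
        buf[2 * i] = c;                  // low byte of output unit i (UTF-16LE assumed)
        buf[2 * i + 1] = 0;              // high byte: this clobbers unread source bytes
    }

    // Decode the buffer back into 16-bit units, independent of host endianness.
    std::u16string out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(static_cast<char16_t>(buf[2 * i] | (buf[2 * i + 1] << 8)));
    }
    return out;
}

// Same conversion with a separate destination, so the source is never touched.
std::u16string widenSeparate(const std::string& ascii) {
    std::u16string out;
    out.reserve(ascii.size());
    for (unsigned char c : ascii) {
        out.push_back(static_cast<char16_t>(c));
    }
    return out;
}
```

With the input "500", widenInPlace returns '5' followed by two zero units, which matches the symptom in this issue, while widenSeparate returns "500".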
@miloyip
Hi! Faced this bug in our project. @miloyip, do you need any additional information? Is it possible to fix it?
This issue is a duplicate of https://github.com/Tencent/rapidjson/issues/1923 and is fixed by https://github.com/Tencent/rapidjson/pull/1926
When parsing a UTF-16 document with kParseNumbersAsStringsFlag, only the first character of the number string is passed to the RawNumber() handler. The remaining characters are null characters.
This bug does not manifest if the document is UTF-8, or if the document is UTF-16 and kParseInsituFlag is used.
I've provided a reproducer below, but I haven't found the cause of the bug yet.
Compile with -DWIDE to enable wide characters, and with -DINSITU to enable in-situ parsing. The expected output is "500" in all cases.
Below is the reproducer code