Closed: HppZ closed this issue 5 years ago
@kaalus @Beman @jewillco
@danieljames @Lastique @imikejackson Please HELP!
It looks like the rapidxml parser used in Boost.PropertyTree assumes that the output (parsed) encoding is UTF-8, regardless of the character type. Therefore the 😥 constant gets decoded into 4 wchar_t elements (one per UTF-8 byte) instead of one or two depending on its encoding. The relevant code is here:
I believe it should be specialized based on the resulting character type.
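To illustrate what selecting the encoding by character type could look like, here is a minimal sketch. The helper name `append_code_point` is hypothetical and this is not the rapidxml code; it assumes a 1-byte character type should receive UTF-8, a 2-byte type UTF-16, and anything wider UTF-32. Validation of the code point (surrogate range, values above 0x10FFFF) is omitted.

```cpp
#include <string>

// Hypothetical helper, NOT rapidxml's actual code: append one Unicode code
// point to a string, choosing the encoding from the width of Ch.
template <class Ch>
void append_code_point(std::basic_string<Ch>& out, unsigned long cp)
{
    if (sizeof(Ch) == 1) {
        // Narrow characters: emit UTF-8 (1 to 4 code units).
        if (cp < 0x80) {
            out += static_cast<Ch>(cp);
        } else if (cp < 0x800) {
            out += static_cast<Ch>(0xC0 | (cp >> 6));
            out += static_cast<Ch>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<Ch>(0xE0 | (cp >> 12));
            out += static_cast<Ch>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<Ch>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<Ch>(0xF0 | (cp >> 18));
            out += static_cast<Ch>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<Ch>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<Ch>(0x80 | (cp & 0x3F));
        }
    } else if (sizeof(Ch) == 2 && cp >= 0x10000) {
        // 16-bit characters: code points outside the BMP need a UTF-16
        // surrogate pair.
        cp -= 0x10000;
        out += static_cast<Ch>(0xD800 + (cp >> 10));
        out += static_cast<Ch>(0xDC00 + (cp & 0x3FF));
    } else {
        // 16-bit characters inside the BMP, or 32-bit characters: the code
        // point fits into a single element.
        out += static_cast<Ch>(cp);
    }
}
```

With this sketch, U+1F625 becomes one element in a 32-bit wchar_t string (typical on Linux) and a surrogate pair in a 16-bit one (Windows).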
@HppZ You may try parsing the XML into a narrow-character ptree and then converting it from UTF-8 to your wchar_t encoding.
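A rough sketch of the conversion step of that workaround follows. The function `utf8_to_wide` is a hypothetical helper, not part of Boost; it assumes wchar_t holds UTF-16 when it is 2 bytes wide and UTF-32 otherwise, and it does not reject overlong encodings or encoded surrogates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Boost): decode a UTF-8 string into a
// std::wstring, producing surrogate pairs when wchar_t is 16 bits wide.
std::wstring utf8_to_wide(const std::string& in)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        // Fold the payload bits of each continuation byte into the code point.
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {
            // 16-bit wchar_t (e.g. Windows): emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

Assuming the file was read with `read_xml` into a narrow `boost::property_tree::ptree`, each value obtained with `get<std::string>(...)` could then be passed through `utf8_to_wide` before use.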
PS: Mentioning everyone in the ticket won't get it solved sooner. It's actually quite rude, as now I have to spam people with my comment.
I am really sorry, and thank you very much for your reply.
So there is a bug in the rapidxml parser, but how do I report it to RapidXml? I don't see any contact info on the website http://rapidxml.sourceforge.net/.
There's an obfuscated email of the author on that page. But since we maintain a local fork in Boost.PropertyTree, we might as well fix it locally.
That is great. Looking forward to your fix. Thanks!
Any plan to fix it? @Lastique
I have no plans to propose a fix currently.
parse xml file:
c++ code:
xml file content:
result is: "你好ð¥"
Other info: Boost.PropertyTree 1.68.0, VS 2017 15.9.4, Windows 10 17763.195