boostorg / property_tree

Boost.org property_tree module
http://boost.org/libs/property_tree

This seems to be a bug #36

Closed HppZ closed 5 years ago

HppZ commented 5 years ago

I am parsing an XML file into a wptree.

C++ code:

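#include <codecvt>
#include <fstream>
#include <locale>
#include <boost/property_tree/xml_parser.hpp>

std::locale utf8Locale(std::locale(), new std::codecvt_utf8<wchar_t>());
std::wifstream f(*path);
f.imbue(utf8Locale);
boost::property_tree::wptree tree;
boost::property_tree::read_xml(f, tree);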

XML file content:

<?xml version="1.0" encoding="UTF-8"?>
<data>
<content>你好&#128549;</content>
</data>

The result is: "你好😥" [image]

Other info: Boost.PropertyTree 1.68.0, Visual Studio 2017 15.9.4, Windows 10 build 17763.195.

HppZ commented 5 years ago

@kaalus @Beman @jewillco

HppZ commented 5 years ago

@danieljames @Lastique @imikejackson Please HELP!

Lastique commented 5 years ago

It looks like the rapidxml parser used in Boost.PropertyTree assumes that the output (parsed) encoding is UTF-8, regardless of the character type. As a result, the &#128549; character reference (U+1F625) is decoded into four wchar_t elements, one per UTF-8 byte, instead of one or two depending on the wchar_t encoding. The relevant code is here:

https://github.com/boostorg/property_tree/blob/29a7f0390b6904e9b7c447d1e6e44442bd0ab17b/include/boost/property_tree/detail/rapidxml.hpp#L1506-L1532

I believe it should be specialized based on the resulting character type.
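To illustrate what choosing the encoding by character type could look like, here is a minimal sketch. It is not the rapidxml code and not a proposed patch: `append_code_point` is a hypothetical helper, and it assumes that 1-byte characters hold UTF-8, 2-byte characters hold UTF-16, and anything wider holds UTF-32. It does no validation of the code point.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of rapidxml or Boost): append one Unicode
// code point to `out`, picking the encoding from sizeof(Ch).
// Assumption: 1-byte Ch is UTF-8, 2-byte Ch is UTF-16, wider Ch is UTF-32.
// No validation: surrogates and values above 0x10FFFF are not rejected.
template <class Ch>
void append_code_point(std::basic_string<Ch>& out, std::uint32_t cp)
{
    if (sizeof(Ch) == 1) {
        // UTF-8: one to four code units.
        if (cp < 0x80) {
            out += static_cast<Ch>(cp);
        } else if (cp < 0x800) {
            out += static_cast<Ch>(0xC0 | (cp >> 6));
            out += static_cast<Ch>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<Ch>(0xE0 | (cp >> 12));
            out += static_cast<Ch>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<Ch>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<Ch>(0xF0 | (cp >> 18));
            out += static_cast<Ch>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<Ch>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<Ch>(0x80 | (cp & 0x3F));
        }
    } else if (sizeof(Ch) == 2) {
        // UTF-16: one code unit, or a surrogate pair above the BMP.
        if (cp < 0x10000) {
            out += static_cast<Ch>(cp);
        } else {
            cp -= 0x10000;
            out += static_cast<Ch>(0xD800 | (cp >> 10));
            out += static_cast<Ch>(0xDC00 | (cp & 0x3FF));
        }
    } else {
        // UTF-32: the code point is the code unit.
        out += static_cast<Ch>(cp);
    }
}
```

With this sketch, code point 128549 (U+1F625) becomes four `char` units, two `char16_t` units, or one `char32_t` unit. On Windows, where `wchar_t` is 16 bits wide, the UTF-16 branch would apply.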

@HppZ You may try parsing XML into a narrow-character ptree and then converting it from UTF-8 to your wchar_t encoding.
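The conversion step of that workaround could be sketched as follows. `utf8_to_wide` is a hypothetical helper written for this illustration, not a Boost function. It assumes the narrow tree holds UTF-8 and that `wchar_t` is UTF-16 when it is 2 bytes wide and UTF-32 otherwise. It checks only the basic structure of each sequence and does not reject overlong encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a UTF-8 string into std::wstring.
// Assumption: wchar_t is UTF-16 if it is 2 bytes wide, UTF-32 otherwise.
// Only lead/continuation byte structure is checked; overlong forms and
// encoded surrogates are not rejected.
std::wstring utf8_to_wide(const std::string& in)
{
    std::wstring out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Fold in the 6 payload bits of each continuation byte.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;

        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {
            // 16-bit wchar_t: emit a surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 | (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 | (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

Assuming the file was read with `read_xml` into a narrow `boost::property_tree::ptree` named `tree`, the value from the XML above could then be converted with `utf8_to_wide(tree.get<std::string>("data.content"))`.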

PS: Mentioning everyone in the ticket won't get it solved sooner. It's actually quite rude, as now I have to spam people with my comment.

HppZ commented 5 years ago

I am really sorry, and thank you very much for your reply.

So there is a bug in the rapidxml parser. How do I report it to RapidXml? I don't see any contact info on the website http://rapidxml.sourceforge.net/.

Lastique commented 5 years ago

There's an obfuscated email of the author on that page. But since we maintain a local fork in Boost.PropertyTree, we might as well fix it locally.

HppZ commented 5 years ago

That is great. Looking forward to your fix. Thanks!

HppZ commented 5 years ago

Is there any plan to fix it? @Lastique

Lastique commented 5 years ago

I currently have no plans to propose a fix.