Th3Ya0vi / pugixml

Automatically exported from code.google.com/p/pugixml

When printing the xml the apostrophe character is not escaped #207

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
Take the XML: <root id="pugi's"/>
or: <root id="pugi&apos;s"/>
Parse it and write it back to a file (the no-escape flag is off).

What is the expected output? What do you see instead?
The output should always be: <root id="pugi&apos;s"/>
but it is actually: <root id="pugi's"/>

Which version of pugixml are you using? On what operating system/compiler?
1.2

Please provide any additional information below.
The text_output_escaped function does not escape the apostrophe character.

http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML

Original issue reported on code.google.com by marek.k...@gmail.com on 22 May 2013 at 3:26

GoogleCodeExporter commented 8 years ago
Why is it a problem?

There's no requirement for the apostrophe character to be escaped in XML; 
pugixml tries to escape as little data as possible to preserve readability 
while producing well-formed output.

Original comment by arseny.k...@gmail.com on 23 May 2013 at 3:36

GoogleCodeExporter commented 8 years ago
Please see the wiki.
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML

"The XML specification defines five "predefined entities" representing special 
characters, and requires that all XML processors honor them."

Original comment by marek.k...@gmail.com on 23 May 2013 at 4:29

GoogleCodeExporter commented 8 years ago
Note that Wikipedia is not an authoritative source of information on XML
parsing. In this case "honor" most likely means "decode while parsing",
not "encode while saving".

Please refer to the XML standard (http://www.w3.org/TR/REC-xml/) for further
information; its AttValue production shows that a double-quoted attribute
value can contain unescaped apostrophe characters:

    [10]    AttValue       ::=      '"' ([^<&"] | Reference)* '"'
                                 |  "'" ([^<&'] | Reference)* "'"
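
To make the production concrete, here is a rough C++ sketch of a checker for it. The names is_att_value and skip_reference are made up for this example and are not part of pugixml. It also assumes ASCII-only entity names, which is narrower than the Name production in the standard.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Assumption: ASCII-only approximation of the XML Name production.
static bool is_name_start(unsigned char c)
{
    return std::isalpha(c) || c == '_' || c == ':';
}

static bool is_name_char(unsigned char c)
{
    return is_name_start(c) || std::isdigit(c) || c == '-' || c == '.';
}

// Hypothetical helper: s[pos] must be '&'. Returns the index just past the
// closing ';' of an entity or character reference, or npos if none matches.
std::size_t skip_reference(const std::string& s, std::size_t pos)
{
    std::size_t i = pos + 1; // skip '&'

    if (i < s.size() && s[i] == '#')
    {
        // Character reference: &#123; or &#x7B;
        ++i;
        const bool hex = i < s.size() && s[i] == 'x';
        if (hex) ++i;

        const std::size_t first_digit = i;
        while (i < s.size() &&
               (hex ? std::isxdigit(static_cast<unsigned char>(s[i]))
                    : std::isdigit(static_cast<unsigned char>(s[i]))))
            ++i;

        if (i == first_digit) return std::string::npos; // no digits at all
    }
    else
    {
        // Entity reference: &name;
        if (i >= s.size() || !is_name_start(s[i])) return std::string::npos;
        while (i < s.size() && is_name_char(s[i])) ++i;
    }

    return (i < s.size() && s[i] == ';') ? i + 1 : std::string::npos;
}

// Hypothetical helper: checks a quoted literal (delimiters included)
// against production [10].
bool is_att_value(const std::string& literal)
{
    if (literal.size() < 2) return false;

    const char quote = literal.front();
    if ((quote != '"' && quote != '\'') || literal.back() != quote) return false;

    const std::size_t end = literal.size() - 1; // index of the closing quote
    std::size_t i = 1;

    while (i < end)
    {
        const char c = literal[i];

        if (c == '<' || c == quote) return false; // excluded by [^<&"] / [^<&']

        if (c == '&')
        {
            i = skip_reference(literal, i);
            if (i == std::string::npos) return false; // bare '&' is not allowed
        }
        else
        {
            ++i; // any other character, including the other quote kind
        }
    }

    return true;
}
```

With this sketch, both "pugi's" and "pugi&apos;s" (double-quoted) are accepted, while 'pugi's' (single-quoted) is rejected because the raw apostrophe collides with the delimiter.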

And a slightly related quote:

    To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".

In attribute values, pugixml chooses to escape > for symmetry reasons (< has to
be escaped to conform to the XML standard), but not to escape ', for better
output readability.
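
As an illustration of that policy, here is a minimal sketch of an escaper for attribute values written between double quotes. This is not pugixml's actual text_output_escaped code; the name escape_attribute_value is made up for the example, and a real serializer has more cases to handle than the four shown.

```cpp
#include <string>

// Hypothetical helper, not pugixml's implementation: escapes a value that
// will be written as a double-quoted attribute.
std::string escape_attribute_value(const std::string& value)
{
    std::string out;
    out.reserve(value.size());

    for (char ch : value)
    {
        switch (ch)
        {
        case '&': out += "&amp;";  break; // required: '&' would start a reference
        case '<': out += "&lt;";   break; // required: '<' is excluded from AttValue
        case '>': out += "&gt;";   break; // optional: escaped only for symmetry with '<'
        case '"': out += "&quot;"; break; // required: '"' is the delimiter used here
        default:  out += ch;       break; // everything else, including ', is copied as-is
        }
    }

    return out;
}
```

Under these assumptions, pugi's is returned unchanged, while a<b becomes a&lt;b.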

Original comment by arseny.k...@gmail.com on 23 May 2013 at 5:10

GoogleCodeExporter commented 8 years ago
Thanks for the explanation! You can close the issue.

Original comment by marek.k...@gmail.com on 24 May 2013 at 5:06

GoogleCodeExporter commented 8 years ago

Original comment by arseny.k...@gmail.com on 25 May 2013 at 3:39