falcong / pugixml

Automatically exported from code.google.com/p/pugixml

Wrong formatting linebreaks if node has pcdata plus other children #87

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
When the output is formatted, a simple XML file like
<Root><Node>"Node1"</Node><Node><Node>"SubNode"</Node>"Text"</Node></Root>

will be saved as:
<Root>
  <Node>Node1</Node>
  <Node>
    <Node>SubNode</Node>
    Text
  </Node>
</Root>

The next time the XML file is parsed, the line breaks and indentation become part of the text:
<Root><Node>"Node1"</Node><Node><Node>"SubNode"</Node>"\n    Text\n    "</Node></Root>
This leads to additional line breaks the next time the file is saved:
<Root>
  <Node>Node1</Node>
  <Node>
    <Node>SubNode</Node>

    Text

  </Node>
</Root>
This repeats every time the file is parsed and saved again, so the whitespace around the text keeps growing.
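
The growth can be reproduced with a small model that does not need pugixml at all. The sketch below is a toy approximation, not pugixml's writer: `reparse_after_save` is a hypothetical helper that returns the PCDATA a parser would see after one indented save, assuming two-space indentation and that whitespace adjacent to non-whitespace text is kept as part of the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model (not pugixml code) of one save/parse cycle for a text node that
// has element siblings. The writer puts the text on its own line at `depth`
// and the parent's closing tag on the next line at `depth - 1`. All of that
// whitespace touches the text, so the next parse keeps it as PCDATA.
std::string reparse_after_save(const std::string& text, std::size_t depth,
                               const std::string& indent = "  ")
{
    assert(depth >= 1); // a text node always has a parent element

    std::string result = "\n";          // line break after the previous sibling
    for (std::size_t i = 0; i < depth; ++i)
        result += indent;               // indentation written before the text
    result += text;
    result += "\n";                     // line break written after the text
    for (std::size_t i = 0; i + 1 < depth; ++i)
        result += indent;               // indentation of the parent's closing tag
    return result;
}
```

Feeding the result back into the function models the next cycle: in this model each pass wraps the previous value in another layer of line breaks and indentation, so the stored text never stabilizes.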

To solve this I'm using an additional flag, format_smart_raw (off by default).
With this flag, every node that has more than one child, at least one of which
is a pcdata node, is saved as if format_raw were set:
<Root>
  <Node>Node1</Node>
  <Node><Node>SubNode</Node>Text</Node>
</Root>

pugixml.hpp
193,195d192
<     // Use smart raw output mode (no indentation and no line breaks are written for nodes with more than one children and one or more pcdata nodes). This flag is off by default.
<     const unsigned int format_smart_raw = 0x10;
< 

pugixml.cpp
2942,2961c2942,2945
<                 unsigned int smartflag = flags;
< 
<                 if (flags & format_smart_raw)
<                 {
<                     for (xml_node n = node.first_child(); n; n = n.next_sibling())
<                     {
<                         if (n.type() == node_pcdata)
<                         {
<                             smartflag |= format_raw;
<                             break;
<                         }
<                     }
<                 }
<                 if (smartflag & format_raw)
<                     writer.write('>');
<                 else
<                     writer.write('>', '\n');
< 
<                 for (xml_node n = node.first_child(); n; n = n.next_sibling())
<                     node_output(writer, n, indent, smartflag, depth + 1);
---
>               writer.write('>', '\n');
>               
>               for (xml_node n = node.first_child(); n; n = n.next_sibling())
>                   node_output(writer, n, indent, flags, depth + 1);
2963c2947
<               if ((flags & format_indent) != 0 && (smartflag & format_raw) == 0)
---
>               if ((flags & format_indent) != 0 && (flags & format_raw) == 0)

Original issue reported on code.google.com by gordon.k...@gmail.com on 25 Nov 2010 at 2:14

GoogleCodeExporter commented 9 years ago
The problem is acknowledged.

I see three possible fixes, each of which would be optional:
- switch to raw mode conditionally (either for the immediate children only or,
as you do it, for the whole subtree)
- trim leading and trailing whitespace from PCDATA contents at save time
- trim leading and trailing whitespace from PCDATA contents at parse time (long
ago there was a flag that did that, parse_trim_pcdata)

Personally, I prefer the last approach. It seems cleaner to me, and it might
even be a good default (though that's obviously a breaking change), because
people very rarely treat indentation whitespace as significant. I'll look into
the three approaches more closely; one of them will be implemented in the next
version.
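
For illustration, the trimming that the second and third options rely on could look roughly like the sketch below. `trim_xml_whitespace` and `trim_is_idempotent` are hypothetical helpers written for this comment, not pugixml functions, and the sketch assumes that "whitespace" means the four XML whitespace characters (space, tab, CR, LF).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of pugixml): removes leading and trailing
// XML whitespace from a PCDATA value and leaves inner whitespace untouched.
std::string trim_xml_whitespace(const std::string& text)
{
    static const char* const ws = " \t\r\n";

    const std::string::size_type first = text.find_first_not_of(ws);
    if (first == std::string::npos)
        return std::string();           // whitespace-only or empty input

    const std::string::size_type last = text.find_last_not_of(ws);
    assert(last >= first);              // both indices point at non-whitespace
    return text.substr(first, last - first + 1);
}

// Trimming an already trimmed value changes nothing, which is why trimming
// on every parse (or every save) stops the whitespace from accumulating.
bool trim_is_idempotent(const std::string& text)
{
    const std::string once = trim_xml_whitespace(text);
    return trim_xml_whitespace(once) == once;
}
```

Whether this runs at parse time or at save time, the trade-off is the same: documents that rely on significant leading or trailing whitespace in text would lose it, which is why it would have to stay optional.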

Original comment by arseny.k...@gmail.com on 26 Nov 2010 at 8:48

GoogleCodeExporter commented 9 years ago

Original comment by arseny.k...@gmail.com on 9 Feb 2014 at 12:49

GoogleCodeExporter commented 9 years ago
Note: it is now possible to use the parse_trim_pcdata flag to work around the issue.

I still plan to change the auto-indenter to guarantee that the amount of
whitespace in PCDATA does not grow during a parse/save cycle; this should happen
in v1.5.

Original comment by arseny.k...@gmail.com on 28 Feb 2014 at 6:50

GoogleCodeExporter commented 9 years ago
Moving this issue to GitHub: https://github.com/zeux/pugixml/issues/13

Original comment by arseny.k...@gmail.com on 26 Oct 2014 at 8:54