NaturalIntelligence / fast-xml-parser

Validate XML, Parse XML and Build XML rapidly without C/C++ based libraries and no callback.
https://naturalintelligence.github.io/fast-xml-parser/
MIT License

Is there a way to prevent xml tag spreading in case of preserveOrder? #520

Closed mdeknowis closed 1 year ago

mdeknowis commented 1 year ago

Description

If I use preserveOrder, the builder spreads each XML node across multiple lines. This behaviour can also be reproduced using https://naturalintelligence.github.io/fast-xml-parser/

Input

I parse an XML file like this:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <version>0.0.1-SNAPSHOT</version>
  <properties>
    <timestamp>${maven.build.timestamp}</timestamp>
    <!-- Comment -->
  </properties>
</project>

and when I use the builder, all tags are spread out like this:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>
    4.0.0
  </modelVersion>
  <version>
    0.0.1-SNAPSHOT
  </version>
  <properties>
    <timestamp>
      ${maven.build.timestamp}
    </timestamp>
    <!-- Comment -->
  </properties>
</project>

Without preserveOrder this does not happen, but I need the original order because I manipulate the XML and have to keep the comments in the correct place.

Code

export async function serializeXmlFile(pomXmlFilePath: string, xml: any): Promise<void> {
  const xmlBuilder = new XMLBuilder({
    format: true, // create multiple line xml file
    ignoreAttributes: false, // preserve attributes
    preserveOrder: true, // preserve the order of the original XML file
    // cdataPropName: `#cdata`, // preserve CDATA blocks
    commentPropName: `#comment`, // preserve comment blocks
    suppressEmptyNode: true,
  });
  // log.info(`xml`, xml);
  const xmlStringContent = xmlBuilder.build(xml);
  // log.info(`xmlStringContent`, xmlStringContent);
  await fs.writeFile(pomXmlFilePath, xmlStringContent, { encoding: `utf-8` });
}

export async function parseXmlFile(pomXmlFilePath: string): Promise<any> {
  const xmlContent = await fs.readFile(pomXmlFilePath, `utf-8`);
  // log.info(`xmlContent`, xmlContent);
  const xmlParser = new XMLParser({
    ignoreAttributes: false, // preserve attributes
    parseTagValue: false, // keep version strings like "1.70" as text; otherwise we lose the trailing zero
    preserveOrder: true, // preserve the order of the original XML file
    // allowBooleanAttributes: true, // preserve attributes without value
    // cdataPropName: `#cdata`, // preserve CDATA blocks
    commentPropName: `#comment`, // preserve comment blocks
    // trimValues: false, // preserve whitespaces
  });
  const xml = xmlParser.parse(xmlContent);
  // log.info(`xml`, xml);
  return xml;
}

Output

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>
    4.0.0
  </modelVersion>
  <version>
    0.0.1-SNAPSHOT
  </version>
  <properties>
    <timestamp>
      ${maven.build.timestamp}
    </timestamp>
    <!-- Comment -->
  </properties>
</project>

expected data

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <version>0.0.1-SNAPSHOT</version>
  <properties>
    <timestamp>${maven.build.timestamp}</timestamp>
    <!-- Comment -->
  </properties>
</project>

amitguptagwl commented 1 year ago

As far as I can tell, this is a formatting issue. Formatting is not yet in the development plan for the ordered parse response. However, a PR is welcome.
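
Until the ordered builder handles this, one possible workaround is to post-process the built string. The sketch below is not part of fast-xml-parser; `collapseTextOnlyTags` is a hypothetical helper name. It assumes that every affected element contains exactly one line of text and that the surrounding whitespace is not significant for the document.

```typescript
// Hypothetical helper, not part of fast-xml-parser.
// Collapses elements whose only content is a single line of text:
//   <tag>\n    text\n  </tag>   ->   <tag>text</tag>
// Elements with child elements, comments, and multi-line text are left alone.
function collapseTextOnlyTags(xml: string): string {
  const pattern =
    /<([A-Za-z_][\w.:-]*)((?:\s[^<>]*)?)>\s*\r?\n\s*([^<>\r\n]+?)\s*\r?\n\s*<\/\1>/g;
  // $1 = tag name, $2 = attributes (if any), $3 = trimmed text content
  return xml.replace(pattern, "<$1$2>$3</$1>");
}

// Usage with a fragment shaped like the builder output above.
const spread = [
  "<properties>",
  "  <timestamp>",
  "    ${maven.build.timestamp}",
  "  </timestamp>",
  "  <!-- Comment -->",
  "</properties>",
].join("\n");

const collapsed = collapseTextOnlyTags(spread);
```

Because this trims the text content, it should not be used where leading or trailing whitespace inside a tag matters.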

mdeknowis commented 1 year ago

This formatting change occurs only if preserveOrder is enabled, and only for non-comment tags.

I suspect it is caused by this code in src\xmlbuilder\orderedJs2Xml.js:

function arrToStr(arr, options, jPath, level){
    let xmlStr = "";

    let indentation = "";
    if(options.format && options.indentBy.length > 0){//TODO: this logic can be avoided for each call
        indentation = EOL + "" + options.indentBy.repeat(level);
    }
// ...
        if(tagName === options.textNodeName){
            let tagText = tagObj[tagName];
            if(!isStopNode(newJPath, options)){
                tagText = options.tagValueProcessor( tagName, tagText);
                tagText = replaceEntitiesValue(tagText, options);
            }
            xmlStr += indentation + tagText; // <--- Here might be the root cause. Let's check, how to avoid that part and improve it
            continue;
//...
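
To illustrate the idea, here is a minimal, self-contained sketch of an ordered serializer that keeps text-only elements on one line. It is not the library's code: `OrderedNode` and `buildOrdered` are hypothetical names, the node shape is a simplified assumption modeled loosely on the preserveOrder output, and attributes and entity escaping are left out.

```typescript
// Hypothetical, simplified node shape (assumption): each object has one key,
// either "#text" with a string value or a tag name with an array of children.
type OrderedNode = { [key: string]: OrderedNode[] | string };

const TEXT = "#text";
const COMMENT = "#comment";

function buildOrdered(nodes: OrderedNode[], indentBy = "  ", level = 0): string {
  const pad = indentBy.repeat(level);
  const lines: string[] = [];
  for (const node of nodes) {
    const tagName = Object.keys(node)[0];
    const value = node[tagName];
    if (tagName === TEXT) {
      // Text mixed with sibling elements still gets its own line.
      lines.push(pad + String(value));
      continue;
    }
    const children = value as OrderedNode[];
    if (tagName === COMMENT) {
      lines.push(`${pad}<!--${children[0][TEXT]}-->`);
    } else if (children.length === 0) {
      lines.push(`${pad}<${tagName}></${tagName}>`);
    } else if (children.length === 1 && TEXT in children[0]) {
      // Key step: a single text child is written inline, without indentation.
      lines.push(`${pad}<${tagName}>${children[0][TEXT]}</${tagName}>`);
    } else {
      lines.push(`${pad}<${tagName}>`);
      lines.push(buildOrdered(children, indentBy, level + 1));
      lines.push(`${pad}</${tagName}>`);
    }
  }
  return lines.join("\n");
}

const doc: OrderedNode[] = [
  { project: [
    { modelVersion: [{ "#text": "4.0.0" }] },
    { properties: [
      { timestamp: [{ "#text": "${maven.build.timestamp}" }] },
      { "#comment": [{ "#text": " Comment " }] },
    ] },
  ] },
];

const xml = buildOrdered(doc);
```

The only difference from indenting every text node is the branch that checks for a single text child before deciding whether to break the line.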

amitguptagwl commented 1 year ago

Yes. There are 2 separate code paths, one for preserved-order parsed output and one for pretty output.

amitguptagwl commented 1 year ago

I hope this is no longer an issue. Please reopen if that is not the case.