Leonidas-from-XIV / node-xml2js

XML to JavaScript object converter.
MIT License

Preserving order of HTML text + tags #283

Open allthetime opened 8 years ago

allthetime commented 8 years ago

Hi, I'm trying to parse XML files from an external service that contain a block of HTML. I use the parser to collect some other information, then target the HTML block and use the builder to rebuild it. However, the rebuilt HTML comes out in a different order.

Say you have this originally:

<p>
    <span class="location">TORONTO, </span>
    It is a nice day...
</p>

The parser turns this into

{
    _: 'It is a nice day...',
    span: [{
        $: { class: 'location' },
        _: 'TORONTO,'
    }]
}

This causes the builder to return the text and the span in reversed order:

<p>
    It is a nice day...
    <span class="location">TORONTO, </span>
</p>

Am I missing a useful option to preserve the order? Or is there a way to stop the parser once it gets to a certain tag so that the HTML is never parsed?

rchampeimont commented 8 years ago

Maybe setting these options can help: charsAsChildren: true, explicitChildren: true, preserveChildrenOrder: true

In this case you would get an array with two elements: the span as [0] and the text as [1].
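To make the ordered-children idea concrete, here is a minimal sketch of rebuilding markup from such an array. It does not call xml2js; instead it uses a hand-written object in the shape those three options are assumed to produce (ordered children under `$$`, the element name under `#name`, and text nodes named `__text__` with their content under `_`). The helpers `serialize`, `escapeText`, and `escapeAttr` are hypothetical names introduced for illustration and are not part of the library.

```javascript
// Hand-written stand-in for the parser output. ASSUMPTION: this mirrors
// what explicitChildren + preserveChildrenOrder + charsAsChildren produce
// with the default key names; verify against your own parsed data.
const parsed = {
  '#name': 'p',
  $$: [
    {
      '#name': 'span',
      $: { class: 'location' },
      $$: [{ '#name': '__text__', _: 'TORONTO, ' }],
    },
    { '#name': '__text__', _: 'It is a nice day...' },
  ],
};

// Escape characters that are not allowed verbatim in text content.
function escapeText(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Attribute values additionally need double quotes escaped.
function escapeAttr(value) {
  return escapeText(value).replace(/"/g, '&quot;');
}

// Hypothetical helper: walk the ordered `$$` array depth-first and emit
// markup in document order, ignoring any duplicated name-keyed children.
function serialize(node) {
  if (node['#name'] === '__text__') {
    return escapeText(node._);
  }
  const name = node['#name'];
  const attrs = Object.entries(node.$ || {})
    .map(([key, value]) => ` ${key}="${escapeAttr(value)}"`)
    .join('');
  const children = node.$$ || [];
  if (children.length === 0) {
    // No ordered children: fall back to plain text content, if any.
    const text = node._ !== undefined ? escapeText(node._) : '';
    return text === '' ? `<${name}${attrs}/>` : `<${name}${attrs}>${text}</${name}>`;
  }
  return `<${name}${attrs}>${children.map(serialize).join('')}</${name}>`;
}

const html = serialize(parsed);
console.log(html);
// <p><span class="location">TORONTO, </span>It is a nice day...</p>
```

Because the walk only reads `$$`, the span stays ahead of the text, which is the order the original markup had. Whitespace-only text between tags is not reproduced here, so the output is compact rather than indented.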

kaymccormick commented 5 years ago

When I do this, the resulting object has all of the text content duplicated, and all of the children duplicated, I think. This results in an object that consumes more memory and processing time than would otherwise be required. Any fixes for this?

UnbearableBear commented 5 years ago

I agree that this is not an acceptable solution. I spent so much time trying to figure out how to preserve the order. I can't believe this is so complicated and that the output is so twisted. I cannot afford to have all the data duplicated in the JSON, and the output structure is really mind-bending.
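One possible workaround for the duplication, sketched under assumptions rather than offered as a fix in the library: post-process the parsed tree and keep only the ordered view. The function name `pruneOrdered` is hypothetical, and the input below is a hand-written object assumed to resemble the duplicated output described above (name-keyed children and `_` text alongside the ordered `$$` array).

```javascript
// Hand-written example of the duplicated shape. ASSUMPTION: the parser
// stores each child both under its tag name and in the ordered `$$` array.
const duplicated = {
  '#name': 'p',
  _: 'It is a nice day...',
  span: [{ $: { class: 'location' }, _: 'TORONTO, ' }],
  $$: [
    {
      '#name': 'span',
      $: { class: 'location' },
      _: 'TORONTO, ',
      $$: [{ '#name': '__text__', _: 'TORONTO, ' }],
    },
    { '#name': '__text__', _: 'It is a nice day...' },
  ],
};

// Hypothetical helper: return a copy that keeps only `#name`, attributes
// (`$`), and ordered children (`$$`). Text is kept on `__text__` nodes, and
// on elements only when they have no ordered children to carry it.
function pruneOrdered(node) {
  if (node['#name'] === '__text__') {
    return { '#name': '__text__', _: node._ };
  }
  const out = { '#name': node['#name'] };
  if (node.$) {
    out.$ = { ...node.$ };
  }
  if (Array.isArray(node.$$)) {
    out.$$ = node.$$.map(pruneOrdered);
  } else if (node._ !== undefined) {
    out._ = node._;
  }
  return out;
}

const slim = pruneOrdered(duplicated);
console.log(JSON.stringify(slim, null, 2));
```

This only shrinks the result after parsing, so the parser still does the duplicate work. It may still help when the concern is the size of the JSON that gets stored or sent elsewhere.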

eMahtab commented 2 years ago

explicitChildren: true, preserveChildrenOrder: true, charsAsChildren: true

Even if you use the above three options, you will get a lot of duplicated data. I switched to https://github.com/nashwaan/xml-js, which worked for my requirements. It preserves the order.

Also, if your input XML is deeply nested, you might find the https://marketplace.visualstudio.com/items?itemName=nidu.copy-json-path plugin useful (I guess similar plugins are available for other IDEs as well). It makes navigating the output JSON easy and saved me a lot of time.