Leonidas-from-XIV / node-xml2js

XML to JavaScript object converter.
MIT License
4.88k stars 604 forks source link

Unexpected Translation of JS Array #428

Open attermann opened 6 years ago

attermann commented 6 years ago

I have a need to encode a js array as xml. For an array like the following:

    headers: [
        { header: { '$': { 'method': 'INVITE', name: ‘X-Mode', value: 'standard' } } },
        { header: { '$': { 'method': 'INVITE', name: ‘X-Prop', value: ‘default' } } }
    ]

I am getting this XML:

  <headers>
    <header method="INVITE" name="X-Mode" value=“standard"/>
  </headers>
  <headers>
    <header method="INVITE" name=“X-Prop" value=“default"/>
  </headers>

Instead of the expected:

  <headers>
    <header method="INVITE" name="X-Mode" value=“standard"/>
    <header method="INVITE" name=“X-Prop" value=“default"/>
  </headers>

I suspect I may just be missing something but can’t figure out what it is after reading through the README. Hoping somebody can advise how to make this work.

Thanks!

attermann commented 6 years ago

Anybody have any input here? I can't imagine I'm the only one needing to correctly encode an XML array.

RikkiGibson commented 6 years ago

For the following sample:

const xml2js = require('xml2js');
const headers = {
    headers: [
        { header: { '$': { 'method': 'INVITE', name: 'X-Mode', value: 'standard' } } },
        { header: { '$': { 'method': 'INVITE', name: 'X-Prop', value: 'default' } } }
    ]
}
console.log(new xml2js.Builder().buildObject(headers));

I am seeing the following output:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<headers>
  <header method="INVITE" name="X-Mode" value="standard"/>
  <header method="INVITE" name="X-Prop" value="default"/>
</headers>

Is there any context missing from the question?

ccamarat commented 6 years ago

The culprit is here. A new parent node is being created each time the loop is iterated. The root-level array handling doesn't have this problem...

kiwi-josh commented 6 years ago

I can confirm that I have just run into the same issue 🤒

dbrooks75 commented 6 years ago

Try:

headers: { header: [ {'$': { 'method': 'INVITE', name: ‘X-Mode', value: 'standard' }}, {'$': { 'method': 'INVITE', name: ‘X-Prop', value: ‘default' }} ] }

attermann commented 6 years ago

Confirmed findings of @ccamarat that the issue exists only when array is not at root-level:

const xml2js = require('xml2js');
const headers = {
    account: {
        headers: [
            { header: { '$': { 'method': 'INVITE', name: 'X-Mode', value: 'standard' } } },
            { header: { '$': { 'method': 'INVITE', name: 'X-Prop', value: 'default' } } }
        ]
    }
}
console.log(new xml2js.Builder().buildObject(headers));

Output:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<account>
  <headers>
    <header method="INVITE" name="X-Mode" value="standard"/>
  </headers>
  <headers>
    <header method="INVITE" name="X-Prop" value="default"/>
  </headers>
</account>

Also confirmed that the fix suggested by @dbrooks75 is a suitable workaround:

const xml2js = require('xml2js');
const headers = {
    account: {
        headers: {
            header: [
                { '$': { 'method': 'INVITE', name: 'X-Mode', value: 'standard' } },
                { '$': { 'method': 'INVITE', name: 'X-Prop', value: 'default' } }
            ]
        }
    }
}
console.log(new xml2js.Builder().buildObject(headers));

Output:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<account>
  <headers>
    <header method="INVITE" name="X-Mode" value="standard"/>
    <header method="INVITE" name="X-Prop" value="default"/>
  </headers>
</account>

Thanks a lot @dbrooks75!

jcsahnwaldt commented 6 years ago

@attermann Can this issue be closed?

ccamarat commented 6 years ago

While the work around does work, the fact that root-level arrays are interpreted differently than nested arrays seems like a problem. One expects the same input to always produce the same output, regardless of its nesting level.

jcsahnwaldt commented 6 years ago

@ccamarat OK, I see. What you wrote makes sense, but the problem is that xml2js cannot always produce the same output for the same input regardless of nesting level, because there cannot be multiple XML root elements.

More generally: Currently, buildObject accepts some JS objects that are 'unusual' (meaning parseString would not produce them) and turns them into more or less useful, but often broken or surprising XML.

I think buildObject should simply reject most structures that parseString would not produce. A meaningful error message is better than a broken XML result, and the worst is XML that is well-formed but has an unexpected structure, because it makes you think your code is almost correct, when in fact it's quite wrong...

Example:

buildObject({y: {x: ['a', 'b']}}) currently produces the following XML:

<y>
  <x>a</x>
  <x>b</x>
</y>

That's fine.

But if we want the same output for the same input regardless of its nesting level, then buildObject({x: ['a', 'b']}) should produce the following XML:

<x>a</x>
<x>b</x>

But that's not a well-formed XML document... :(

I think buildObject({x: ['a', 'b']}) should throw an exception.

jcsahnwaldt commented 6 years ago

I think there are two or three deeper questions here:

  1. XML and JavaScript structures don't quite match. What to do?
  2. To what extent should parseString and buildObject be 'symmetrical'?
  3. More generally, how flexible should xml2js be?

1. Structural differences between XML elements and JS objects

XML has elements, attributes and text content. JS has objects, arrays and primitive types. xml2js generally represents each XML element as a JS object (and vice versa).

2. To what extent should parseString and buildObject be 'symmetrical'?

In general, buildObject is designed to handle JS objects whose structure is equal (or at least very similar) to the structure of the objects produced parseString. But JS objects can have many shapes that parseString would never produce. What should buildObject do with such objects?

Examples:

3. More generally, how flexible should xml2js be?

jcsahnwaldt commented 6 years ago

It looks like buildObject has grown organically - whenever there was a feature request, it was added. That's fine, but it has led to behavior that is not well-defined for many corner cases.

In the long run, it would probably be better if buildObject was a bit stricter: produce well-formed XML for well-formed JS objects, but reject (by throwing an error) JS objects that are not well-formed.

In a nutshell: buildObject should be flexible, but not so flexible that no one can predict what it will do with a certain JS object.

Don't get me wrong, I love xml2js. It's super easy to use, yet super flexible. APIs that combine both of these properties pretty are pretty rare. Good job!

RikkiGibson commented 6 years ago

buildObject({x: ['a', 'b']}) seems fine to me if you set explicitRoot and rootName.

ccamarat commented 6 years ago

@jcsahnwaldt - great analysis. Agreed that my statement was overly general and you picked it apart well.

My impression is that the library is "good enough" for the JavaScript community as a whole, and XML usage in web applications send to be decline declining so devoting energy to making it "perfect" may not be the best use of the maintainers' time.

You made the comment "I think buildObject should simply reject most structures that parseString would not produce. A meaningful error message is better than a broken XML result, and the worst is XML that is well-formed but has an unexpected structure, because it makes you think your code is almost correct, when in fact it's quite wrong..."

Perhaps this is all that's really needed. It would satisfy me, in any case.

Leonidas-from-XIV commented 6 years ago

@jcsahnwaldt I really appreciate your involvement, looking at issues and answering and closing them. Thanks!

So let me weight in with some history, some early mistakes and some future directions (hopefully).

As such, I am not the original author of xml2js, I took it over from @maqr as he didn't maintain it and I needed it for some project (which eventually never materialized, but so is the nature of things). I merged in some existing pull requests, wrote tests and created documentation. I also rewrote the library to use CoffeeScript (that was back then when CoffeeScript was popular, nowadays I wouldn't use it anymore, but these were different times). When I took over, the library had already some users and a number of options like trim etc, that affected the output.

Over time, a number of features were contributed, among these the XML builder (the original name xml2js only signifies a one way conversion). This is a very popular feature, but I almost never hear back from contributors about maintaining their added feature, so the builder is in a bit of a nowhere-land where it is sort of easy to produce JS objects with the parser that the builder can't serialize back.

I have added a number of options to make the JS object output a little more predictable (like explicitArray to avoid breaking code if suddenly one element ends up having more than one node and the code suddenly starts failing), but have retained backwards compatibility options with older releases so you can configure it to output exactly the same format it did 7 years ago), while also making the API a bit nicer. But the existence of these options creates issues all the time and the builder jumps through all kinds of hoops to generate some XML from JS objects that don't make sense, so the resulting XML sometimes tends to also not make any sense.

That should not be the case. What I much rather have is to have a fixed, documented semantics (preferably with Flow/TypeScript definitions) on how xml2js will parse XML (also, XML, not "something that contains < and > and sort of looks like XML if you squint) into a JS object, reliably error if the XML is not well-formed and probably be callback-less (I also plan to stop using sax.js, which uses callbacks but is not actually async, a property that xml2js unfortunately inherits, but depending on the sync behaviour in xml2js feels a bit wrong).

So yes, I think the parser and builder should be round-tripable (at least in the semantic sense of XML, I don't necessarily care about reproducing the input XML byte for byte), but for this the parser must lose some options. Fortunately, these options are generally not a good idea either and if people want to do that, they can easily do that as a post-processing step after the parser has finished.

I have planned to rework the code (since, if you look inside it, it is a mess of deeply nested if statements and mysteriously mutated state) in a more typed and rigid way so it is actually possible to maintain the code, but since I don't actually use xml2js what I am mostly left with is merging contributions I think I can continue to maintain and make sense. I think I said way too often yes to features that were not a good idea 😉

Going back to the issue at hand, I think it would make sense if the builder just rejected JS objects that it can't convert to XML instead of generating a hot mess of nonsense that maybe some other poor soul will have to parse.

FelDev commented 6 years ago

I noticed that js2xml (not xml2js) deals with arrays by placing the array items between \<item>\</item> tags

Would that be something that makes sense to you?

Leonidas-from-XIV commented 6 years ago

@Mesieu After looking at js2xml it left me a little bit puzzled: Why does it insert item? How do I customize it? What if I want to customize it per-array?

To me the behaviour seems (specified but) arbitrary. I'd rather have a reasonably close correspondence between what xml2js parses to the JS object, which also makes serializing the JS object back to XML more or less unsurprising and does not require adding processing instructions (like "make this array items expand to <item/> tags").

FelDev commented 6 years ago

@Leonidas-from-XIV I'm convinced.