TomWright / dasel

Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
https://daseldocs.tomwright.me
MIT License

elements ordering in XML is not preserved #196

Open stessaris opened 2 years ago

stessaris commented 2 years ago

Describe the bug

I'm trying to use dasel to process XML documents, but I realised that the order of the elements is not preserved. Since element order is significant in XML, dasel cannot be used to process XML safely: reordering the elements amounts to data loss. I don't know whether the problem can be fixed; I guess it depends on the underlying github.com/clbanning/mxj library. I found the closed issue clbanning/mxj#2 and the example clbanning/mxj/blob/master/examples/order.go, which might be relevant for fixing the problem.

I guess that a big warning about this behaviour should be added to the documentation, since using the tool without being aware of it may result in data corruption.

To Reproduce

Paste the following XML in the dasel playground and process it with the arguments -p xml . :

<?xml version="1.0" encoding="UTF-8"?>
<message>
    <heading>Look</heading>
    <warning>Hello World</warning>
    <heading>Above</heading>
    <string>Last</string>
</message>

The result should be equal (up to spacing) but you get:

<message>
  <heading>Look</heading>
  <heading>Above</heading>
  <string>Last</string>
  <warning>Hello World</warning>
</message>

Expected behavior

Order of elements should be preserved.

TomWright commented 2 years ago

This is a bit of an issue with the way that dasel accesses data.

Dasel will:

  1. Decode the XML data into raw data (maps, slices, primitives)
  2. Process queries
  3. Encode the raw data into the desired data type

The issue here is that once the data is stored in memory as a map, Go does not guarantee key ordering, so the original element order is no longer available when encoding.

This is a bit of a blocker as it stands since this is an integral issue in the way data is handled.
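
To make the three steps above concrete, here is a minimal sketch using only the Go standard library. It is not dasel's or mxj's actual code: decodeToMap and encodeSorted are hypothetical helpers, and sorting the keys on output is an assumption made only to get deterministic output. It shows that once child elements are grouped in a map by name, the interleaving of heading and warning cannot be recovered.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"sort"
	"strings"
)

// doc is the document from the bug report.
const doc = `<?xml version="1.0" encoding="UTF-8"?>
<message>
    <heading>Look</heading>
    <warning>Hello World</warning>
    <heading>Above</heading>
    <string>Last</string>
</message>`

// decodeToMap (hypothetical helper) reads the direct children of the root
// element into a map keyed by element name. Repeated names are grouped into
// one slice, so the position of each child relative to its siblings is lost.
func decodeToMap(doc string) (map[string][]string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	out := map[string][]string{}
	depth := 0
	name := ""
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			depth++
			if depth == 2 {
				name = t.Name.Local
			}
		case xml.CharData:
			// Only text directly inside a child of the root is kept.
			if depth == 2 {
				out[name] = append(out[name], string(t))
			}
		case xml.EndElement:
			depth--
		}
	}
}

// encodeSorted (hypothetical helper) writes the map back as XML. Keys are
// sorted because Go map iteration order is unspecified. Values are not
// escaped, which is acceptable only for this plain-text sample.
func encodeSorted(root string, m map[string][]string) string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString("<" + root + ">\n")
	for _, k := range keys {
		for _, v := range m[k] {
			fmt.Fprintf(&b, "  <%s>%s</%s>\n", k, v, k)
		}
	}
	b.WriteString("</" + root + ">")
	return b.String()
}

func main() {
	m, err := decodeToMap(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(encodeSorted("message", m))
}
```

With the sample document this prints the two heading elements first, then string, then warning, which matches the reordered output reported above.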

stessaris commented 2 years ago

Sorry @TomWright for the late reply, but my mail server started dumping GitHub notifications into the spam folder.

I understand the problem and the fact that fixing it would require a major effort (which I'm not sure would be worth it, given the subtle complexity of XML). But I came to dasel because I needed a quick and simple tool to manipulate XML; fortunately, I noticed the data corruption right away, otherwise the experiments I was performing would have been wasted.

If I were the maintainer of dasel I'd drop XML support, but it's not up to me to decide. My suggestion is to emphasise that only simple XML documents are supported and that most of the XML that could be thrown at dasel won't preserve its DOM structure (note that even if you feed it an XHTML page, the result will be something different). Basically, a huge "feed XML only if you know what you are doing and fully understand the limitations of the mxj library" warning.

BTW I think that dasel is a nice tool!

0x7FFFFFFFFFFFFFFF commented 2 years ago

I suggest adding a limitations section to the README file. At the moment I have to discover these kinds of limitations through issues like this one. Major limitations should be mentioned, for example that the element order is not kept in the output and that regex matching is not supported.

stessaris commented 2 years ago

The NewMapXmlSeq function is doing the right thing by adding an ordering attribute (example). But I don't know how this would play with arbitrary updates when encoding back to XML, or with the dasel query language.

--sergio
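
For comparison, a rough sketch of an order-preserving representation, again using only the Go standard library. This is not how NewMapXmlSeq works (per the comment above, it records an ordering attribute on the decoded values); it simply keeps the children in a slice instead of a map. The element type and the decodeOrdered and encodeOrdered helpers are hypothetical names, and how such a structure would interact with dasel queries or arbitrary updates is left open here, as it is in the discussion.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// orderedDoc is the document from the bug report.
const orderedDoc = `<?xml version="1.0" encoding="UTF-8"?>
<message>
    <heading>Look</heading>
    <warning>Hello World</warning>
    <heading>Above</heading>
    <string>Last</string>
</message>`

// element (hypothetical type) is one child of the root, stored with its name
// so that siblings can be kept in a slice in document order.
type element struct {
	Name  string
	Value string
}

// decodeOrdered (hypothetical helper) reads the direct children of the root
// element into a slice, appending each one as it is encountered.
func decodeOrdered(doc string) ([]element, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var out []element
	depth := 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			depth++
			if depth == 2 {
				out = append(out, element{Name: t.Name.Local})
			}
		case xml.CharData:
			// Text directly inside a child belongs to the last appended element.
			if depth == 2 {
				out[len(out)-1].Value += string(t)
			}
		case xml.EndElement:
			depth--
		}
	}
}

// encodeOrdered (hypothetical helper) writes the children back in slice
// order. Values are not escaped, which is acceptable only for this sample.
func encodeOrdered(root string, children []element) string {
	var b strings.Builder
	b.WriteString("<" + root + ">\n")
	for _, c := range children {
		fmt.Fprintf(&b, "  <%s>%s</%s>\n", c.Name, c.Value, c.Name)
	}
	b.WriteString("</" + root + ">")
	return b.String()
}

func main() {
	children, err := decodeOrdered(orderedDoc)
	if err != nil {
		panic(err)
	}
	fmt.Println(encodeOrdered("message", children))
}
```

Here the output keeps heading, warning, heading, string in their original order. The trade-off is that lookups by element name are no longer a single map access, which is part of why this is a non-trivial change for a query tool.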

TomWright commented 1 year ago

I am looking to resolve this issue as part of https://github.com/TomWright/dasel/pull/289.