Share HTML and XML modelling with mxj and fq?

JFryy / qq

jq inspired (and gojq dependent) interoperable config format transcoder with interactive querying.

MIT License

545 stars 3 forks

Share HTML and XML modelling with mxj and fq? #14

Open wader opened 3 months ago

wader commented 3 months ago

Would it make sense to share how XML and HTML are modelled with fq and mxj? I think this is the closest thing to a spec there is for it: https://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html

$ go run . -i html <<< '<html><b>111</b><b>222</b><a href="url">333</a><html>'
{
  "html": {
    "body": {
      "a": {
        "attr": {
          "href": "url"
        },
        "data": "333"
      },
      "b": [
        {
          "data": "111"
        },
        {
          "data": "222"
        }
      ]
    }
  }
}

$ fq <<< '<html><b>111</b><b>222</b><a href="url">333</a><html>'
{
  "html": {
    "body": {
      "a": {
        "#text": "333",
        "@href": "url"
      },
      "b": [
        "111",
        "222"
      ]
    },
    "head": ""
  }
}
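For readers comparing the two outputs above, the compact convention (attributes as "@name" keys, text as "#text", text-only elements collapsed to strings, repeated siblings collected into arrays) can be sketched roughly as follows. This is an illustrative sketch only, not fq's or mxj's actual code: the function names `element_to_value` and `xml_to_object` are hypothetical, it uses Python's strict `xml.etree.ElementTree` parser rather than an HTML parser (so the input must be well-formed and no `head`/`body` elements are synthesised), and it ignores text that follows a child element.

```python
import json
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Map an element to a string (text only) or a dict (attributes/children).

    Hypothetical helper: follows the "@attr" / "#text" convention seen in the
    fq output above, but is not taken from fq itself.
    """
    obj = {}
    # Attributes become "@name" keys.
    for name, value in elem.attrib.items():
        obj["@" + name] = value
    # Children are keyed by tag; a repeated tag is promoted to a list.
    for child in elem:
        value = element_to_value(child)
        if child.tag in obj:
            if not isinstance(obj[child.tag], list):
                obj[child.tag] = [obj[child.tag]]
            obj[child.tag].append(value)
        else:
            obj[child.tag] = value
    # Only leading text is considered; tail text after children is dropped.
    text = (elem.text or "").strip()
    if not obj:
        # No attributes and no children: collapse to a bare string.
        return text
    if text:
        obj["#text"] = text
    return obj


def xml_to_object(source):
    """Parse well-formed XML and wrap the result under the root tag."""
    root = ET.fromstring(source)
    return {root.tag: element_to_value(root)}


if __name__ == "__main__":
    doc = '<html><body><b>111</b><b>222</b><a href="url">333</a></body></html>'
    print(json.dumps(xml_to_object(doc), indent=2))
```

Because siblings are grouped by tag name, the relative order of different elements (here `b`, `b`, `a`) is not recoverable from the result, which is one way this shape is lossy.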

JFryy commented 3 months ago

Thanks for the input here. I would much rather adhere to this standard; I like it better for consistency's sake and for the general brevity of omitting a "data" key. Sorry, I didn't see this until recently, but I will resolve this and submit a release once it is patched.
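To make the difference concrete, here is a rough sketch of how the "attr"/"data" shape from the first example could be rewritten into the compact "@name"/"#text" shape. It is not the change made in qq: the function name `compact` is hypothetical, and it assumes that "attr" always holds an attribute object and "data" always holds element text, which would misfire on a document that has real elements named `attr` or `data`.

```python
def compact(node):
    """Rewrite {"attr": {...}, "data": "..."} nodes into "@name"/"#text" form.

    Hypothetical helper based only on the sample output in this thread.
    """
    if isinstance(node, list):
        return [compact(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key == "attr" and isinstance(value, dict):
            # Hoist attributes into the element object with an "@" prefix.
            for name, attr_value in value.items():
                out["@" + name] = attr_value
        elif key == "data" and not isinstance(value, (dict, list)):
            out["#text"] = value
        else:
            out[key] = compact(value)
    # An element with nothing but text collapses to a bare string.
    if set(out) == {"#text"}:
        return out["#text"]
    return out


if __name__ == "__main__":
    old_shape = {
        "html": {
            "body": {
                "a": {"attr": {"href": "url"}, "data": "333"},
                "b": [{"data": "111"}, {"data": "222"}],
            }
        }
    }
    print(compact(old_shape))
```

Applied to the first example, this yields the same `body` content as the fq output, minus the empty `head` that fq's HTML parser adds.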

wader commented 2 months ago

Yeah, I agree; the modelling is nearly a bit too terse. fq does support another mode where it models XML and HTML as nested ["element", {attributes}, [children...]] arrays, which is less lossy but a bit of a pain to query.

$ fq -o array=true <<< '<html><b>111</b><b>222</b><a href="url">333</a><html>'
[
  "html",
  null,
  [
    [
      "head",
      null,
      []
    ],
    [
      "body",
      null,
      [
        [
          "b",
          {
            "#text": "111"
          },
          []
        ],
        [
          "b",
          {
            "#text": "222"
          },
          []
        ],
        [
          "a",
          {
            "#text": "333",
            "href": "url"
          },
          []
        ]
      ]
    ]
  ]
]
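The array shape above can be sketched in a few lines, which also shows why it is less lossy (sibling order survives) and why querying it takes more work (you have to walk positional triples instead of following keys). As with any sketch, this is an assumption-laden illustration rather than fq's implementation: `element_to_array` and `find_all` are hypothetical names, the parser is Python's strict `xml.etree.ElementTree`, and text after child elements is ignored.

```python
import xml.etree.ElementTree as ET


def element_to_array(elem):
    """Return [name, attributes-or-None, [children...]] for an element.

    Mirrors the sample output above: text is stored as "#text" inside the
    attributes object, and an element without attributes or text gets None.
    """
    attrs = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        attrs["#text"] = text
    children = [element_to_array(child) for child in elem]
    return [elem.tag, attrs or None, children]


def find_all(node, name):
    """Yield every [name, attrs, children] triple whose tag equals name."""
    tag, _attrs, children = node
    if tag == name:
        yield node
    for child in children:
        yield from find_all(child, name)


if __name__ == "__main__":
    doc = '<html><body><b>111</b><b>222</b><a href="url">333</a></body></html>'
    tree = element_to_array(ET.fromstring(doc))
    for match in find_all(tree, "b"):
        print(match[1]["#text"])
```

Even a simple "all `b` elements" lookup needs a recursive helper here, whereas in the object shape it is a plain key path.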

JFryy commented 2 months ago

Got it. Yeah, I really like how this is handled in fq and would have emulated it if I had known better at the time. It's just a draft PR for now because I need to add some more tests and clean up the messy branch, but the changes in it should bring this back in line with the standard. Thanks very much for the feedback here, @wader!

https://github.com/JFryy/qq/pull/16

JFryy commented 2 months ago

I am going to leave this open since #16 still needs follow-up testing to confirm this. The change also bundled in a few things that needed inclusion in a new release, but it should be mostly covered, with a few improvements still to make. Some minor edge cases still occur for more complex HTML, which can be demonstrated in test cases.