Open wader opened 4 months ago
Thanks for the input here. I would much rather adhere to this standard, and I like it better for consistency's sake and for the brevity of omitting a "data" key. Sorry, I didn't see this until recently, but I will resolve it and submit a release once it is patched.
Yeah, I agree the modelling is almost a bit too terse. fq does support another mode where it models XML and HTML as nested ["element", {attributes}, [children...]] arrays, which is less lossy but a bit of a pain to query.
$ fq -o array=true <<< '<html><b>111</b><b>222</b><a href="url">333</a><html>'
[
  "html",
  null,
  [
    [
      "head",
      null,
      []
    ],
    [
      "body",
      null,
      [
        [
          "b",
          {
            "#text": "111"
          },
          []
        ],
        [
          "b",
          {
            "#text": "222"
          },
          []
        ],
        [
          "a",
          {
            "#text": "333",
            "href": "url"
          },
          []
        ]
      ]
    ]
  ]
]
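For anyone following along, here is a rough Python sketch of that array shape. It is not how fq implements it, only an illustration based on the output above: `to_array` and `find_all` are made-up names, the input is well-formed XML parsed with the standard library (so no `head`/`body` gets inserted as with HTML), and text between child elements is ignored.

```python
import json
import xml.etree.ElementTree as ET


def to_array(elem):
    """Convert an Element into ["name", attributes-or-None, [children...]].

    Assumption taken from the output above: leading text is stored under
    "#text" next to the attributes, and the slot is null (None) when there
    are neither attributes nor text. Tail text after children is dropped.
    """
    attrs = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        attrs["#text"] = text
    children = [to_array(child) for child in elem]
    return [elem.tag, attrs or None, children]


def find_all(node, name):
    """Yield every nested array node whose element name equals `name`."""
    tag, _attrs, children = node
    if tag == name:
        yield node
    for child in children:
        yield from find_all(child, name)


doc = ET.fromstring('<doc><b>111</b><b>222</b><a href="url">333</a></doc>')
result = to_array(doc)
print(json.dumps(result, indent=2))

# Querying needs a recursive walk because names sit at index 0 of each array.
b_texts = [node[1]["#text"] for node in find_all(result, "b")]
print(b_texts)
```

The recursive walk in `find_all` is what I mean by the array form being harder to query: element names are positional values rather than object keys, so there is no direct path lookup.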
Got it. Yeah, I really like how this is handled in fq and would have emulated it if I had known better at the time. The PR is still a draft because I need to add more tests and clean up the messy branch, but its changes should bring this back in line with the standard. Thanks very much for the feedback here, @wader!
I am going to leave this open since #16 still needs follow-up testing to confirm this fix. That change was bundled with a few other things that needed to go into a new release, but it should be mostly covered, with a few improvements still to make. Some minor edge cases still occur with more complex HTML and can be demonstrated in test cases.
Would it make sense to share how XML and HTML are modelled between fq and mxj? I think this is the closest thing to a spec there is: https://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html