cue-lang / cue

The home of the CUE language! Validate and define text-based and dynamic configuration
https://cuelang.org
Apache License 2.0

pkg/encoding/xml support #330

Open cueckoo opened 3 years ago

cueckoo commented 3 years ago

Originally opened by @rudolph9 in https://github.com/cuelang/cue/issues/330

WIP

Summary

Support for:

This package would allow lossless conversion to and from XML. This would be an MVP approach focused on a Cuelang representation that follows the W3C REC-xml-20081126 spec, with emphasis on usability from the perspective of Cuelang and on extensibility, allowing for eventual support of common XML schemas and extensions (e.g., DTD, XSD).

This would be similar to pkg/encoding/json, but unlike JSON there is no clear one-to-one mapping between Cuelang and XML. A best effort would be made to balance user friendliness with an open-ended design that can eventually support common XML schemas and extensions.

Approach

The approach uses a lossless Cuelang schema for capturing XML data, so that Cuelang can be programmatically converted to and from XML. Conversion would happen via import "encoding/xml": xml.Unmarshal(string|byte) // => xml.Schema & xmlValue converts a string representation of an XML document to a Cuelang struct constrained by xml.Schema, and xml.Marshal(xml.Schema & xmlValue) // => xmlString converts it back.

Example

Setting usability aside, xml.Schema could follow the schema used by existing xml-to-json converters, given that Cuelang is a superset of JSON. A browser-based version can be found here. The input to xml.Unmarshal and the return value of xml.Marshal would be the left pane; the input to xml.Marshal and the return value of xml.Unmarshal would be the right pane. A rough implementation of xml.Schema could be defined as follows:

XML :: {
  NameRegex :: =~ "[^-].*" // quick and dirty; a better definition is at https://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name
  AttrRegex :: =~ "-.*"
  XmlSchema :: {
    [NameRegex]: {
      "#text"?: string | [...string]
      [NameRegex]: XmlSchema
      [AttrRegex]: string
    }
  }
}

xmlString:

<?xml version="1.0" encoding="UTF-8" ?>
    <hello>
        <world planet="earth">!</world>
    </hello>

xmlValue:

{
  "hello": {
    "world": {
      "-planet": "earth",
      "#text": "!"
    }
  }
}

The converter used in the above example takes some liberties: for example, the order of keys in a JSON object is not guaranteed, so the conversion is not guaranteed to be lossless. It is also not particularly user friendly. A lot more could be done on top of this using Definitions in the schema.
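As an illustration only, the mapping used in the example (attributes under "-name" keys, character data under "#text") can be sketched in Go with the standard library's encoding/xml token decoder. The function names xmlToMap and decodeElement are hypothetical and not part of any proposed API, and the sketch deliberately ignores repeated sibling elements, namespaces, comments and mixed content.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// decodeElement reads tokens up to the end tag that closes start and returns
// the element as a map: attributes under "-name", trimmed character data under
// "#text", child elements under their own names.
// Simplification: a repeated child name overwrites the earlier one.
func decodeElement(d *xml.Decoder, start xml.StartElement) (map[string]any, error) {
	node := map[string]any{}
	for _, a := range start.Attr {
		node["-"+a.Name.Local] = a.Value
	}
	var text strings.Builder
	for {
		tok, err := d.Token()
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			child, err := decodeElement(d, t)
			if err != nil {
				return nil, err
			}
			node[t.Name.Local] = child
		case xml.CharData:
			text.Write(t) // copy now; the bytes are only valid until the next Token call
		case xml.EndElement:
			if s := strings.TrimSpace(text.String()); s != "" {
				node["#text"] = s
			}
			return node, nil
		}
	}
}

// xmlToMap (hypothetical name) converts a whole document to the nested-map shape.
func xmlToMap(doc string) (map[string]any, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return nil, fmt.Errorf("no root element found")
		}
		if err != nil {
			return nil, err
		}
		if start, ok := tok.(xml.StartElement); ok {
			root, err := decodeElement(d, start)
			if err != nil {
				return nil, err
			}
			return map[string]any{start.Name.Local: root}, nil
		}
	}
}

func main() {
	doc := `<?xml version="1.0" encoding="UTF-8" ?>
<hello>
    <world planet="earth">!</world>
</hello>`
	v, err := xmlToMap(doc)
	if err != nil {
		panic(err)
	}
	out, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because Go maps are unordered and json.Marshal emits map keys in sorted order, the printed JSON does not preserve the order in which attributes and children appeared in the document, which is the same loss of information noted above.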

Additional considerations

Preferably, all information in the original XML document would be captured in something that can be exported as raw JSON, and any user-friendly conveniences could be defined using Definitions and filled in programmatically by unifying XmlSchema & xmlValue.

The logic of converting the concrete values of a cuelang string to/from XML would be implemented in Go.
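A minimal sketch of what the Go side of the Marshal direction might look like, assuming the "-name" / "#text" convention from the example. The names mapToXML and writeElement are hypothetical, namespaces and repeated elements are not handled, and child order is simply sorted by key because a plain map carries no document order.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"sort"
	"strings"
)

// writeElement serialises one element. Keys starting with "-" become
// attributes, "#text" becomes character data, every other key is a child.
func writeElement(buf *bytes.Buffer, name string, node map[string]any) error {
	keys := make([]string, 0, len(node))
	for k := range node {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic, but not the original document order

	buf.WriteString("<" + name)
	for _, k := range keys {
		if !strings.HasPrefix(k, "-") {
			continue
		}
		s, ok := node[k].(string)
		if !ok {
			return fmt.Errorf("element %q: attribute %q is not a string", name, k)
		}
		buf.WriteString(" " + k[1:] + `="`)
		if err := xml.EscapeText(buf, []byte(s)); err != nil {
			return err
		}
		buf.WriteString(`"`)
	}
	buf.WriteString(">")

	if s, ok := node["#text"].(string); ok {
		if err := xml.EscapeText(buf, []byte(s)); err != nil {
			return err
		}
	}
	for _, k := range keys {
		if k == "#text" || strings.HasPrefix(k, "-") {
			continue
		}
		child, ok := node[k].(map[string]any)
		if !ok {
			return fmt.Errorf("element %q: child %q is not a struct", name, k)
		}
		if err := writeElement(buf, k, child); err != nil {
			return err
		}
	}
	buf.WriteString("</" + name + ">")
	return nil
}

// mapToXML (hypothetical name) expects exactly one top-level key: the root element.
func mapToXML(doc map[string]any) (string, error) {
	if len(doc) != 1 {
		return "", fmt.Errorf("expected exactly one root element, got %d", len(doc))
	}
	var buf bytes.Buffer
	for name, v := range doc {
		root, ok := v.(map[string]any)
		if !ok {
			return "", fmt.Errorf("root %q is not a struct", name)
		}
		if err := writeElement(&buf, name, root); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	out, err := mapToXML(map[string]any{
		"hello": map[string]any{
			"world": map[string]any{"-planet": "earth", "#text": "!"},
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Escaping is delegated to xml.EscapeText from the standard library, so characters such as < and & in text or attribute values are written as entity references.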

Resources

cueckoo commented 3 years ago

Original reply by @mpvl in https://github.com/cuelang/cue/issues/330#issuecomment-612180378

It probably makes sense to take into account the proposal for new-style definitions. Also the contemplated embedded scalar values (like top-level emit values but then for any struct) may provide a nice way to model XML attributes. Also, I've been contemplating a guaranteed topological sort on fields that could be made possible by the new evaluator. This may help with field order and may also influence design decisions.

Taking these aspects into account might mean holding off on this design. It can also mean designing XML support in light of these features and helping to come up with possible requirements while they can still be influenced. This requires (a lot) more work, of course, but can be very beneficial.

ElliotSwart commented 2 years ago

Here is the best resource I've found so far detailing past xml/json conversion schemes. A decent bit of work has been done on the topic, especially back when XML was everywhere. Might be worthwhile to adopt or adapt an existing scheme with improved / more cue amenable conventions.

On another note, might this be a good use of CUE attributes to describe whether a field is a name or an attribute? Languages with attribute support, like C#, do something like that. Potentially there could also be an additional JSON encoder/decoder that is XML-aware and uses one of the standards above, if bidirectionality between XML -> JSON and JSON -> XML is important rather than just XML -> CUE, CUE -> XML, CUE -> JSON.
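For comparison, Go's standard encoding/xml package takes the annotation route: struct tags mark whether a field maps to an attribute, to character data, or to a child element. The sketch below only shows that existing Go mechanism using the hello/world document from earlier in the thread; the type names and the roundTrip helper are made up for illustration, and how an equivalent CUE attribute would be spelled is left open.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// World maps <world planet="...">text</world>.
type World struct {
	Planet string `xml:"planet,attr"` // field is an XML attribute
	Text   string `xml:",chardata"`   // field is the element's character data
}

// Hello maps the <hello> root element.
type Hello struct {
	XMLName xml.Name `xml:"hello"`
	World   World    `xml:"world"` // field is a child element
}

// roundTrip (hypothetical helper) decodes doc into Hello and encodes it again.
func roundTrip(doc string) (Hello, string, error) {
	var h Hello
	if err := xml.Unmarshal([]byte(doc), &h); err != nil {
		return Hello{}, "", err
	}
	out, err := xml.Marshal(h)
	if err != nil {
		return Hello{}, "", err
	}
	return h, string(out), nil
}

func main() {
	h, out, err := roundTrip(`<hello><world planet="earth">!</world></hello>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(h.World.Planet, h.World.Text)
	fmt.Println(out)
}
```

The trade-off is that the mapping lives in a fixed schema: anything in the document that the struct does not describe is silently dropped on decode, so this style favours usability over the lossless round trip discussed in the original proposal.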

mpvl commented 2 years ago

@ElliotSwart thanks for the useful references. Of course xml is one of the obvious missing adaptors in CUE.

The big reason why there is no xml adaptor yet is that there is a large design space, as you already alluded to, and that CUE is just different from others, making it a bit harder.

For instance, take XML attributes:

Then there are namespaces, escaping and all these other things to worry about. And how do DTDs fit in the picture? And all the other questions addressed earlier in this issue.

So overall, this design work has not been very high on our list given the other things under development. But open to design proposals of course.

myitcv commented 2 years ago

Noting some experience, @tmm1 has done some work in this space "importing" from XSD.

gedw99 commented 2 years ago

Maybe initially target the basic aspects of XML. Many people don’t need the XSD or DTD aspects.