golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.31k stars 17.7k forks source link

proposal: encoding/xml: Add EmptyElement token type to support self-closing elements. #69273

Open nemith opened 2 months ago

nemith commented 2 months ago

Proposal Details

Background

The current implementation has no concept of self-closing elements and if a empty value (like an empty string) is provided then a start element followed by a end element is given like <foo></foo>

Unfortunately there are XML implementations out there that depend on self closing elements. These are usually closed source and probably doing some hand parsing of certain elements. These implementations are broken but it is what we have.

For my personal issue Juniper Networks has a XML API to their routers. This is all closed source and they expect certain elements to denote configuration targets such as the running config or the startup config. These probably should not be elements at all but they only accept <running/> or <startup/> as valid config targets. Sending <running></running> results in a parsing error using their API.

There may be other reasons to support self-closing elements but it is unknown to me.

There have been a number of requests to support self-closing elements in the encoding/xml package.

And on external repos

Proposal

This expands on some of the discussions in #21399 bu @SamWhited, @Merovius and others that there needed to be a way to encode with Encoder.EncodeToken() (at very least).

This would add a new token type called EmptyElement (not in love with the name and we can bikeshed it needed) which would be essentially a clone of StartElement

// EmptyElement repsents a self-closing element (i.e <my-element/>).  This
// element type is only used during encoding and is never emitted when decoding.
type EmptyElement struct {
    Name Name
    Attr []Attr
}

This element would be encoded by Encoder.EncodeToken() and would emit a self closing tag for the given Name and optional attributes.

enc := xml.NewEncoder(w)
_ := enc.EncodeToken(xml.EmptyElement{Name: xml.Name{Local: "foo"}}; 

would emit <foo/> to the writer.

Unlike StartElement no error would be given if a EndElement is not found as it already self-closing.

This new token type would only be used for encoding. Decoding would continue to use the same logic and would only produce StartElement and EndElement types even if a self-closing tag is found to be fully backwards compatible. Documenation will state this.

Given this new type custom types using xml.Marshaler can be created to emit self-closing elements. In the future a struct tag to make this easier for encoding structs could be investigated, but given the scope of this issue, probably isn't warranted.

Alternatives

Automatically convert "empty" elements to self-closing (i.e: <foo></foo> -> <foo/>)

This was my original suggestion back in 2013 in #6896. As pointed out this would be a breaking change for the encoding itself and it is not feasible with the existing package.

allowempty struct tag or similar as proposed in #21399

This was implemented in https://go-review.googlesource.com/c/go/+/59830 and inspired this change. However there is no way to use the lower level Encoder.EncodeElement() or Encoder.EncodeToken() methods with it.

This proposal would lay the ground work for basic support and then something like a struct tag could be added/layered on top later if needed.

Extend StartElement struct

StartElement could be extended to have a Closed bool field to be self closing.

i.e:

type StartElement struct {
    Name  Name
    Attr  []Attr
    SelfClose bool   // ADDED
}

This would allow for encoder.EncodeToken() to produce the right tag. However this may be confusing when used with Encoder.EncodeElement() which specifically requires a StartElement token to be passed in along with a value. Today if a nil value is passed to Encoder.EncodeElement() then no xml element is produced.

One could imagine revising this so that if nil value and SelfClose is set then you would produce a self-closing tag. You would also want to emit an error if a non-nil (or non-empty?) value is passed and SelfClose is set to true.

However I believe this makes for a more confusing API given the current EncodeElement() function signature as well as the MarshalXML() method on the Marshaller interface.

A RawXML token as proposed in #26756

Having a RawXML token as proposed in #26756 could allow for users to compose and emit their own self-closing tags. This could replace and/or augment this proposal and could be acceptable.

Wait for a revised encoding/xml sweep and possibly a overhall (encoding/xml/v2?)

There are a number of other shortcomings in the xml package. It may be worth it to hold off on any changes to the existing API to either move to a new package. This package could be replace the one in the standard library or perhaps even be created outside of the stdlib with the existing encoding/xml being placed on a official freeze similar to packages like net/smtp as the demand for XML isn't as big as JSON and other encodings.

gabyhelp commented 2 months ago

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

gopherbot commented 2 months ago

Change https://go.dev/cl/469495 mentions this issue: encoding/xml: Add EmptyElement token.

ydnar commented 2 months ago

We implemented this in a branch of encoding/xml here: https://github.com/nbio/xml/commit/088fa821cad9c29ee95341674442b664f964fc63

SamWhited commented 2 months ago

Thinking out loud:

I like this solution, but I worry about it being write only. I've been thinking about how I would do the XML package differently if I were to write my own and the main thing I think I'd like to see is the ability to round trip XML through an encoder and a decoder and have it be stable (right now we end up adding tons of extra illegal xmlns attributes, the encoding of namespaced attributes changes, etc.). If we're not implementing this for the decoder side we can't be stable, and if we did implement it for the decoder side I suspect we'd break a lot of code, so we'd want the ability to select an XML package version first, I think.

If we're going to break round tripping anyways I think I personally prefer your idea in #6896 to automatically make the tag self-closing and I think it can be done without breaking compatibility by making this an option on the encoder:

type Encoder struct {
  // EmitSelfClosingTags causes empty string struct tags without omitempty or StartElement tokens immediately
  // followed by the corresponding EndElement tag to 
  EmitSelfClosingTags bool
}

Alternatively, we could implement this proposal for both reading and writing tokens with the new token type you proposed, but to prevent breaking others code we could have an option on the Decoder along the lines of the one above:

type Decoder struct{
  // If PreserveSelfClosing is set, self closing tags (<foo/>) are emitted as EmptyElement tokens.
  // Otherwise they are emitted as a StartElement immediately followed by its EndElement.
  PreserveSelfClosing bool
}

All of this is really a work around for not having a good way to version the XML package though; I've stopped following Go more or less, but I know we were talking about being able to pin standard library package versions a while back. I don't know what the status of it is. If that happens ensuring compatibility would be a lot easier.

nemith commented 2 months ago

I like this solution, but I worry about it being write only...I think I'd like to see is the ability to round trip XML through an encoder and a decoder and have it be stable

Like you said this already not possible. I think it's a good idea, but incompatible with the current package and even smallest of changes to encoding/xml in the past have created some breakage. If the goal is to add to the existing API then I think we cannot push for this.

If we're going to break round tripping anyways I think I personally prefer your idea in https://github.com/golang/go/issues/6896 to automatically make the tag self-closing and I think it can be done without breaking compatibility by making this an option on the encoder.

I am open to this idea, but I also like the idea of having more direct control of the tokens that are emitted at a low level as well.

I know we were talking about being able to pin standard library package versions a while back

There is now one example of v2 version of stdlib packages: math/rand/v2 which now exists along side the original math/rand and brings some backwards compatability.

Perhaps a better example is the efforts on revamping the encoding/json package into encoding/json/v2 which is a massive multi-year effort to see what a new JSON encoding package would look like.

There was also talks about a xml sweep to look through all the shortcomings of the encoding/xml package (and there are way more than just self-closing elements) and I think it could be beneficial to explore what a new xml package could look like.

So I don't think it's out of the question, but it would also be a massive undertaking and XML just doesn't have the same number of consumers and use cases that JSON has which means the effort may not be worth the payout?

SamWhited commented 2 months ago

XML just doesn't have the same number of consumers and use cases that JSON has

I'm not so sure that this is true (I'm not sure that it's not true either, TBF). I think a lot of folks in tech tend to think of programmers as all working for silicone valley web companies or trendy small open source projects, but I'm not sure that this is true (even among consumers of Go). Again, I don't have any evidence that it's not true either, this statement just doesn't seem obviously true or false to me.

That being said, I was thinking of something more granular than a fully revamped v2 package, but if we can convince the Go team to move towards a v2 package I do think that would be beneficial. Even for web stuff, XML is a pretty foundational technology and it would be nice to have better support for it.