Open nemith opened 2 months ago
Related Issues and Documentation
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)
Change https://go.dev/cl/469495 mentions this issue: encoding/xml: Add EmptyElement token.
We implemented this in a branch of encoding/xml
here: https://github.com/nbio/xml/commit/088fa821cad9c29ee95341674442b664f964fc63
Thinking out loud:
I like this solution, but I worry about it being write only. I've been thinking about how I would do the XML package differently if I were to write my own and the main thing I think I'd like to see is the ability to round trip XML through an encoder and a decoder and have it be stable (right now we end up adding tons of extra illegal xmlns attributes, the encoding of namespaced attributes changes, etc.). If we're not implementing this for the decoder side we can't be stable, and if we did implement it for the decoder side I suspect we'd break a lot of code, so we'd want the ability to select an XML package version first, I think.
If we're going to break round tripping anyways I think I personally prefer your idea in #6896 to automatically make the tag self-closing and I think it can be done without breaking compatibility by making this an option on the encoder:
type Encoder struct {
// EmitSelfClosingTags causes empty string struct tags without omitempty or StartElement tokens immediately
// followed by the corresponding EndElement tag to
EmitSelfClosingTags bool
}
Alternatively, we could implement this proposal for both reading and writing tokens with the new token type you proposed, but to prevent breaking others code we could have an option on the Decoder
along the lines of the one above:
type Decoder struct{
// If PreserveSelfClosing is set, self closing tags (<foo/>) are emitted as EmptyElement tokens.
// Otherwise they are emitted as a StartElement immediately followed by its EndElement.
PreserveSelfClosing bool
}
All of this is really a work around for not having a good way to version the XML package though; I've stopped following Go more or less, but I know we were talking about being able to pin standard library package versions a while back. I don't know what the status of it is. If that happens ensuring compatibility would be a lot easier.
I like this solution, but I worry about it being write only...I think I'd like to see is the ability to round trip XML through an encoder and a decoder and have it be stable
Like you said this already not possible. I think it's a good idea, but incompatible with the current package and even smallest of changes to encoding/xml in the past have created some breakage. If the goal is to add to the existing API then I think we cannot push for this.
If we're going to break round tripping anyways I think I personally prefer your idea in https://github.com/golang/go/issues/6896 to automatically make the tag self-closing and I think it can be done without breaking compatibility by making this an option on the encoder.
I am open to this idea, but I also like the idea of having more direct control of the tokens that are emitted at a low level as well.
I know we were talking about being able to pin standard library package versions a while back
There is now one example of v2
version of stdlib packages: math/rand/v2
which now exists along side the original math/rand
and brings some backwards compatability.
Perhaps a better example is the efforts on revamping the encoding/json
package into encoding/json/v2
which is a massive multi-year effort to see what a new JSON encoding package would look like.
There was also talks about a xml sweep to look through all the shortcomings of the encoding/xml
package (and there are way more than just self-closing elements) and I think it could be beneficial to explore what a new xml package could look like.
So I don't think it's out of the question, but it would also be a massive undertaking and XML just doesn't have the same number of consumers and use cases that JSON has which means the effort may not be worth the payout?
XML just doesn't have the same number of consumers and use cases that JSON has
I'm not so sure that this is true (I'm not sure that it's not true either, TBF). I think a lot of folks in tech tend to think of programmers as all working for silicone valley web companies or trendy small open source projects, but I'm not sure that this is true (even among consumers of Go). Again, I don't have any evidence that it's not true either, this statement just doesn't seem obviously true or false to me.
That being said, I was thinking of something more granular than a fully revamped v2 package, but if we can convince the Go team to move towards a v2 package I do think that would be beneficial. Even for web stuff, XML is a pretty foundational technology and it would be nice to have better support for it.
Proposal Details
Background
The current implementation has no concept of self-closing elements and if a empty value (like an empty string) is provided then a start element followed by a end element is given like
<foo></foo>
Unfortunately there are XML implementations out there that depend on self closing elements. These are usually closed source and probably doing some hand parsing of certain elements. These implementations are broken but it is what we have.
For my personal issue Juniper Networks has a XML API to their routers. This is all closed source and they expect certain elements to denote configuration targets such as the running config or the startup config. These probably should not be elements at all but they only accept
<running/>
or<startup/>
as valid config targets. Sending<running></running>
results in a parsing error using their API.There may be other reasons to support self-closing elements but it is unknown to me.
There have been a number of requests to support self-closing elements in the
encoding/xml
package.And on external repos
Proposal
This expands on some of the discussions in #21399 bu @SamWhited, @Merovius and others that there needed to be a way to encode with
Encoder.EncodeToken()
(at very least).This would add a new token type called
EmptyElement
(not in love with the name and we can bikeshed it needed) which would be essentially a clone ofStartElement
This element would be encoded by
Encoder.EncodeToken()
and would emit a self closing tag for the given Name and optional attributes.would emit
<foo/>
to the writer.Unlike
StartElement
no error would be given if aEndElement
is not found as it already self-closing.This new token type would only be used for encoding. Decoding would continue to use the same logic and would only produce
StartElement
andEndElement
types even if a self-closing tag is found to be fully backwards compatible. Documenation will state this.Given this new type custom types using
xml.Marshaler
can be created to emit self-closing elements. In the future a struct tag to make this easier for encoding structs could be investigated, but given the scope of this issue, probably isn't warranted.Alternatives
Automatically convert "empty" elements to self-closing (i.e:
<foo></foo>
-><foo/>
)This was my original suggestion back in 2013 in #6896. As pointed out this would be a breaking change for the encoding itself and it is not feasible with the existing package.
allowempty
struct tag or similar as proposed in #21399This was implemented in https://go-review.googlesource.com/c/go/+/59830 and inspired this change. However there is no way to use the lower level
Encoder.EncodeElement()
orEncoder.EncodeToken()
methods with it.This proposal would lay the ground work for basic support and then something like a struct tag could be added/layered on top later if needed.
Extend
StartElement
structStartElement
could be extended to have aClosed bool
field to be self closing.i.e:
This would allow for
encoder.EncodeToken()
to produce the right tag. However this may be confusing when used withEncoder.EncodeElement()
which specifically requires aStartElement
token to be passed in along with a value. Today if anil
value is passed toEncoder.EncodeElement()
then no xml element is produced.One could imagine revising this so that if
nil
value andSelfClose
is set then you would produce aself-closing
tag. You would also want to emit an error if a non-nil (or non-empty?) value is passed andSelfClose
is set to true.However I believe this makes for a more confusing API given the current
EncodeElement()
function signature as well as theMarshalXML()
method on theMarshaller
interface.A RawXML token as proposed in #26756
Having a
RawXML
token as proposed in #26756 could allow for users to compose and emit their own self-closing tags. This could replace and/or augment this proposal and could be acceptable.Wait for a revised
encoding/xml
sweep and possibly a overhall (encoding/xml/v2
?)There are a number of other shortcomings in the xml package. It may be worth it to hold off on any changes to the existing API to either move to a new package. This package could be replace the one in the standard library or perhaps even be created outside of the stdlib with the existing
encoding/xml
being placed on a official freeze similar to packages likenet/smtp
as the demand for XML isn't as big asJSON
and other encodings.