golang / go

proposal: encoding/xml: reject ill-formed XML #68299

Open DemiMarie opened 4 days ago

DemiMarie commented 4 days ago

Proposal Details

encoding/xml has multiple problems:

  1. It does not check for XML well-formedness constraints: #68294, #68295
  2. It does not check for XML namespace constraints: #68296, #68297
  3. Its handling of XML namespaces is known to be buggy.

This proposal covers #68294, #68295, and #68296. These can all be fixed internally to encoding/xml without changing any existing API. However, the proposal adds new API on xml.Decoder:

const (
    AllowLeadingColons = 1 << iota
    AllowTrailingColons
    AllowDuplicateAttributes
)

// AllowIllFormed sets whether the parser allows ill-formed XML.
// Through Go 1.22, the parser always allowed ill-formed XML.
// Starting in Go 1.23, ill-formed XML is not allowed by default,
// but it can be re-enabled by calling decoder.AllowIllFormed(-1).
// The Allow* flags can be used for more fine-grained control.
func (d *Decoder) AllowIllFormed(flags int64)

and a GODEBUG flag allow-ill-formed-xml=<bitmask> for coarse-grained global control.
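To make the bitmask semantics concrete, here is a small sketch of how the proposed flags would combine. The constants are copied from the proposal and do not exist in encoding/xml today. The helper allows is hypothetical and only models how a decoder might test a mask. It shows why passing -1 (all bits set) would enable every current and future flag, while 0 would enable none.

```go
package main

import "fmt"

// Flag values copied from the proposal; they are not part of
// encoding/xml today.
const (
	AllowLeadingColons = 1 << iota
	AllowTrailingColons
	AllowDuplicateAttributes
)

// allows reports whether mask enables every bit in flag.
// Hypothetical helper modelling how a decoder might consult the mask.
func allows(mask, flag int64) bool {
	return mask&flag == flag
}

func main() {
	strict := int64(0)   // reject all ill-formed input
	lenient := int64(-1) // all bits set: allow everything, including future flags
	custom := int64(AllowLeadingColons | AllowDuplicateAttributes)

	fmt.Println(allows(strict, AllowDuplicateAttributes))  // false
	fmt.Println(allows(lenient, AllowDuplicateAttributes)) // true
	fmt.Println(allows(custom, AllowLeadingColons))        // true
	fmt.Println(allows(custom, AllowTrailingColons))       // false
}
```

Under this reading, the same integer could be passed to AllowIllFormed or written as the value of the GODEBUG setting.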

Going forward, xml.Decoder will reject ill-formed XML. Any case where it is found to accept ill-formed XML will be treated as a bug and fixed, with a new flag added so that applications can opt in to the old behavior.

Debug flags for encoding/xml may be removed with no less than two major versions' notice.

My understanding is that this change is enough to guarantee round-trip stability for RawToken users (but not for Token users). I believe it is possible to implement a correct namespace-aware, round-trip-stable parser on top of the RawToken API, but the standard library does not currently implement such a parser.

This is essentially “merge #48641 + debug flags”.
