golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
encoding/xml: `,innerxml` makes light of namespaces. #14467

Open HinataYanagi opened 8 years ago

HinataYanagi commented 8 years ago

See the example. The result is clearly not what we want: there is no way to obtain or recover the context, namely the prefix-to-namespace mapping, that was in effect where the fragment was extracted. I found that xml.go simply accumulates the raw input stream in saved while ,innerxml is in effect. The best solution would be something like the following:

buffer := new(bytes.Buffer)
encoder := xml.NewEncoder(buffer)

// ...

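for {
  token, err := d.Token() // d is the *Decoder doing the unmarshaling
  // ...
  encoder.EncodeToken(token) // re-serialize the token instead of copying raw bytes
}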

// ...

encoder.Flush()
saveXMLData = buffer.Bytes()

A namespace-aware ,innerxml is rather expensive, so it may be better to introduce a new flag for it. Many XML users are eager for a sound decision on this.

bradfitz commented 8 years ago

See related bugs: #13400 and #14407

HinataYanagi commented 8 years ago

Namespaces are cumbersome, yet essential after all. encoding/xml seems to have fundamental defects.

iwdgo commented 6 years ago

The namespace of a field can only be read at struct level, using the XMLName field. The Unmarshal documentation specifies that:

Unmarshal maps an XML element to a struct using the following rules. In the rules, the tag of a field refers to the value associated with the key 'xml' in the struct field's tag (see the example above).

Storing the namespace requires an XMLName field in the struct that represents the XML element. In your example, p is the field tag. So the following structure

    /* Space is stored at struct level in B, and the field tagged p is read correctly */
    type B struct {
        XMLName xml.Name         // stores the namespace of the struct
        Content string `xml:"p"` // p is the field tag
    }
    type A struct {
        B B `xml:""` // required to ensure that the namespace is stored
    }

reads correctly <A xmlns:x="urn:"><x:B><p>Go does not spoil its namespace</p></x:B></A>

The absence of a namespace on the B tag means that no default namespace applies to the inner tags (https://www.w3.org/TR/xml-names/#defaulting), which looks unwanted in this case. Put differently, the namespace of a field needs to be at the struct XML tag level to be readable.
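For anyone who wants to try this, the structs above can be exercised with a short program. This is only a sketch: decodeA is a hypothetical helper name, and the input is the document quoted in this comment.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// B stores its own element name, including the namespace, in XMLName.
type B struct {
	XMLName xml.Name
	Content string `xml:"p"`
}

// A holds B as a child element; the empty tag leaves the name at its default.
type A struct {
	B B `xml:""`
}

// decodeA is a hypothetical helper that unmarshals the input into A.
func decodeA(input string) (A, error) {
	var a A
	err := xml.Unmarshal([]byte(input), &a)
	return a, err
}

func main() {
	const input = `<A xmlns:x="urn:"><x:B><p>Go does not spoil its namespace</p></x:B></A>`
	a, err := decodeA(input)
	if err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	// The namespace URI bound to the x prefix ends up in XMLName.Space.
	fmt.Printf("space=%q local=%q content=%q\n",
		a.B.XMLName.Space, a.B.XMLName.Local, a.B.Content)
}
```

The namespace is available only through B's XMLName field; a plain string field tagged p has nowhere to record it.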