golang / go

encoding/xml: brittle support for matching a namespace by identifier or url #12624

Open pkieltyka opened 8 years ago

pkieltyka commented 8 years ago

The issue is that I believe a struct tag's namespace should be matchable by either the xmlns identifier (prefix) or the namespace URL.

To shed some light on the issue, consider an RSS feed parser that deals with namespaces declared in a variety of ways. I can expect a few different kinds of xmlns definitions for the same type of structure. For example, among mRSS feeds in the wild that use the "media" namespace, you will find:

  1. Xmlns wasn't defined, but the prefix was used anyway (e.g. mRSS elements with the media prefix)
  2. Xmlns was defined as xmlns:media="http://search.yahoo.com/mrss/"
  3. Xmlns was defined as xmlns:media="http://search.yahoo.com/mrss"

I noticed that encoding/xml tracks each xmlns prefix in a map to its URL, and matches the struct tags against the URL. The issue here is with 2 and 3, where a single trailing "/" is enough to throw off the parser.
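To make the three cases concrete, here is a minimal sketch that prints the Name.Space encoding/xml reports for a media:title element in each of them. The three one-element documents and the helper name resolvedSpace are made up for illustration; only the two URLs come from the list above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Hypothetical one-element documents, one per case in the list above.
const (
	docUndeclared = `<item><media:title>clip</media:title></item>`
	docWithSlash  = `<item xmlns:media="http://search.yahoo.com/mrss/"><media:title>clip</media:title></item>`
	docNoSlash    = `<item xmlns:media="http://search.yahoo.com/mrss"><media:title>clip</media:title></item>`
)

// resolvedSpace returns the Name.Space that the decoder reports for the
// first start element whose local name is "title".
func resolvedSpace(doc string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err != nil {
			return "", err // io.EOF if no title element was found
		}
		if start, ok := tok.(xml.StartElement); ok && start.Name.Local == "title" {
			return start.Name.Space, nil
		}
	}
}

func main() {
	for _, doc := range []string{docUndeclared, docWithSlash, docNoSlash} {
		space, err := resolvedSpace(doc)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q\n", space)
	}
	// Prints:
	// "media"
	// "http://search.yahoo.com/mrss/"
	// "http://search.yahoo.com/mrss"
}
```

An undeclared prefix is kept as the Space as-is, and the two declared variants resolve to two different strings, so a single struct tag cannot match all three.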

I wrote a fix (including tests) using Go 1.5.1's encoding/xml code: https://github.com/pkieltyka/xml/commit/7ad1fab466ec10f0fe7e47a36050b1956ac8bedb

Consider a partial parser for the media rss module:

type Media struct {
  Title       Title       `xml:"media title"`
  Description Description `xml:"media description"`
  Thumbnails  []Thumbnail `xml:"media thumbnail"`
  Contents    []Content   `xml:"media content"`
  MediaGroups []Group     `xml:"media group"`
}

Notice that the struct tags use the namespace prefix instead of the namespace URL. However, if xmlns:media="URL" was defined in the original document, the parser expects to match by the URL. IMO, it should check both the prefix and the URL of the namespace. I'm reporting this issue and will submit the fix separately. Thanks for the consideration.
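As a rough illustration of the current behavior (not of the proposed fix), the sketch below unmarshals the same hypothetical documents into two structs, one tagged with the prefix and one tagged with the URL. The type names, helper names, and sample documents are assumptions for the example only.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Hypothetical sample documents: prefix undeclared, declared with a
// trailing slash, and declared without one.
const (
	docUndeclared = `<item><media:title>clip</media:title></item>`
	docWithSlash  = `<item xmlns:media="http://search.yahoo.com/mrss/"><media:title>clip</media:title></item>`
	docNoSlash    = `<item xmlns:media="http://search.yahoo.com/mrss"><media:title>clip</media:title></item>`
)

// prefixTagged names the namespace by its prefix, as in the Media struct.
type prefixTagged struct {
	Title string `xml:"media title"`
}

// urlTagged names the namespace by its URL (with the trailing slash).
type urlTagged struct {
	Title string `xml:"http://search.yahoo.com/mrss/ title"`
}

// titleByPrefix returns the title decoded through the prefix-tagged struct,
// or an empty string if the field did not match.
func titleByPrefix(doc string) string {
	var v prefixTagged
	if err := xml.Unmarshal([]byte(doc), &v); err != nil {
		return "error: " + err.Error()
	}
	return v.Title
}

// titleByURL returns the title decoded through the URL-tagged struct,
// or an empty string if the field did not match.
func titleByURL(doc string) string {
	var v urlTagged
	if err := xml.Unmarshal([]byte(doc), &v); err != nil {
		return "error: " + err.Error()
	}
	return v.Title
}

func main() {
	docs := []struct{ name, body string }{
		{"undeclared", docUndeclared},
		{"with slash", docWithSlash},
		{"no slash", docNoSlash},
	}
	for _, d := range docs {
		fmt.Printf("%-10s prefix tag: %q  url tag: %q\n",
			d.name, titleByPrefix(d.body), titleByURL(d.body))
	}
}
```

With the current package, the prefix tag only matches when the prefix was never declared, and the URL tag only matches the one document whose declaration is byte-for-byte identical to the tag.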

pkieltyka commented 8 years ago

and, https://go-review.googlesource.com/14601

gopherbot commented 8 years ago

CL https://golang.org/cl/14601 mentions this issue.

rakyll commented 8 years ago

We need to gather ideas about namespace support in the xml package in general before considering this change.

See https://github.com/golang/go/issues/11496 and https://github.com/golang/go/issues/6800.

The 1.5 cycle unnecessarily broke the existing behavior of the package in many cases, and the changes that went in to address the namespacing bugs had to be reverted; see https://github.com/golang/go/issues/11841.

iwdgo commented 6 years ago

This proposal is not in line with the XML namespace standard (https://www.w3.org/TR/xml-names/#NSNameComparison), which explicitly states that URI references are treated as strings and must be exactly identical, i.e. compared without escaping or any other manipulation.

iwdgo commented 5 days ago

https://go.dev/play/p/zBgGuTzbMoe?v=gotip contains the test provided.