golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: encoding/xml: Collect metadata, like order and line numbers when parsing XML #67038

Open cague opened 3 months ago

cague commented 3 months ago

Proposal Details

When parsing or unmarshalling XML elements, collect their order and line numbers.

Use field tags, like below:

    type Vehicle struct {
        Make      string  `xml:"make"`
        Model     string  `xml:"model"`
        Wheelbase float32 `xml:"wheelbase"`
        ProblemsXML
    }

    type ProblemsXML struct {
        XMLName  Name
        UnkElems []XMLElement    `xml:",any"`
        UnkAttrs []Attr          `xml:",any,attr"`
        Ooois    OutOfOrderItems `xml:",ooorder"`
        Order    int             `xml:",order"`
        Line     int             `xml:",line"`
    }

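Example:

    line 1   <Vehicle>
    line 2     <make>Chevrolet</make>
    line 3     <wheelbase>107.2</wheelbase>
    line 4     <model>Corvette</model>
    line 5   </Vehicle>
    line 6   <Vehicle>
    line 7     <make>Chevrolet</make>
    line 8     <wheelbase>107.2</wheelbase>
    line 9     <model>Corvette</model>
    line 10  </Vehicle>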

    Vehicle[0].Order == 1
    Vehicle[0].Line  == 1
    Vehicle[1].Order == 2
    Vehicle[1].Line  == 6

I have the code, and it is not very complicated. Is the next step to wait for approval before opening a pull request?

ianlancetaylor commented 3 months ago

Please describe the new API you are suggesting. Thanks.

cague commented 3 months ago

There isn't a new API in the sense that there are new functions. There are new "field tags" for structures.

    xml:",order" // Stores the order of the Element within the Element's parent Element
    xml:",line"  // Stores the line number for the start of the Element

So when Go's regular Unmarshal function is used, the Go code will fill in those values. See ProblemsXML structure in the other comments to see the field tags in use.
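For comparison, here is a minimal sketch of how similar metadata can be gathered with the existing token API, without the proposed tags. It assumes Go 1.19 or later (for `Decoder.InputPos`) and relies on the position reported before each `Token` call being the point where the next token starts. The `elemInfo` type and `collect` function are hypothetical names used only for illustration; they are not part of encoding/xml or of the proposal's code.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// elemInfo is a hypothetical record of the metadata discussed above.
type elemInfo struct {
	Name  string
	Order int // 1-based position among the parent's child elements
	Line  int // line on which the start tag begins
}

// collect walks the token stream and records order and line for every element.
func collect(doc string) ([]elemInfo, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var infos []elemInfo
	// counts holds, per nesting level, how many child elements were seen so far.
	counts := []int{0}
	for {
		// The decoder position before Token is where the next token starts.
		line, _ := dec.InputPos()
		tok, err := dec.Token()
		if err == io.EOF {
			return infos, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			counts[len(counts)-1]++
			infos = append(infos, elemInfo{
				Name:  t.Name.Local,
				Order: counts[len(counts)-1],
				Line:  line,
			})
			counts = append(counts, 0) // open a counter for this element's children
		case xml.EndElement:
			counts = counts[:len(counts)-1]
		}
	}
}

func main() {
	doc := "<Vehicles>\n<Vehicle>\n<make>Chevrolet</make>\n</Vehicle>\n" +
		"<Vehicle>\n<make>Chevrolet</make>\n</Vehicle>\n</Vehicles>"
	infos, err := collect(doc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, in := range infos {
		fmt.Printf("%s order=%d line=%d\n", in.Name, in.Order, in.Line)
	}
}
```

The drawback of this approach is that the caller has to walk tokens by hand instead of using Unmarshal, which is the gap the proposed field tags are meant to close.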

ianlancetaylor commented 3 months ago

OK, can you write out the new documentation that would be added to the encoding/xml package? Thanks.

cague commented 3 months ago

The current doc for Unmarshal is at https://pkg.go.dev/encoding/xml#Unmarshal and contains a list of bullet points describing struct field tags. Here's one of the existing bullet points:

The new doc would add more bullet points:

And then maybe also have a field tag for catching "out of order" elements.

    type OutOfOrderItems []OutOfOrderItem

    type OutOfOrderItem struct {
        // Out-of-order child element that appears after ElementMarker in the parsed XML.
        ElementOOO string
        // The element that appears before ElementOOO in the parsed XML but is defined
        // after ElementOOO in the struct into which Unmarshal stores the parsed results.
        ElementMarker string
    }

This is similar to XML schema element and can be used to help enforce order, e.g.:

    msg = fmt.Sprintf("out of order: Element \"%v\" must be before Element \"%v\"", oooi.ElementOOO, oooi.ElementMarker)
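To make the out-of-order idea concrete, the following is a rough sketch of one way such items could be computed with today's API. The detection rule is an assumption: a child is reported when it appears after a child whose field is declared later in the struct. The helper names `childNames` and `outOfOrder` are hypothetical, and the expected order is passed in by hand rather than derived from struct tags.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// OutOfOrderItem mirrors the type described in the proposal.
type OutOfOrderItem struct {
	ElementOOO    string
	ElementMarker string
}

// childNames returns the names of the root element's direct children, in document order.
func childNames(doc string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var names []string
	depth := 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			depth++
			if depth == 2 { // depth 1 is the root, depth 2 its direct children
				names = append(names, t.Name.Local)
			}
		case xml.EndElement:
			depth--
		}
	}
}

// outOfOrder compares the seen child names against the expected (declared) order.
// Names missing from expected are ignored (assumption: unknown elements are handled elsewhere).
func outOfOrder(expected, seen []string) []OutOfOrderItem {
	rank := make(map[string]int, len(expected))
	for i, name := range expected {
		rank[name] = i
	}
	var items []OutOfOrderItem
	maxRank, marker := -1, ""
	for _, name := range seen {
		r, ok := rank[name]
		if !ok {
			continue
		}
		if r < maxRank {
			// name is declared before marker but appeared after it in the XML.
			items = append(items, OutOfOrderItem{ElementOOO: name, ElementMarker: marker})
			continue
		}
		maxRank, marker = r, name
	}
	return items
}

func main() {
	doc := `<Vehicle><make>Chevrolet</make><wheelbase>107.2</wheelbase><model>Corvette</model></Vehicle>`
	seen, err := childNames(doc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, oooi := range outOfOrder([]string{"make", "model", "wheelbase"}, seen) {
		fmt.Printf("out of order: Element %q must be before Element %q\n",
			oooi.ElementOOO, oooi.ElementMarker)
	}
}
```

With the declared order make, model, wheelbase, this input reports that "model" must come before "wheelbase".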

ianlancetaylor commented 3 months ago

Thanks.