antchfx / xmlquery

xmlquery is a Golang XPath package for XML queries.
https://github.com/antchfx/xpath
MIT License

Missing AttributeNode prefix and namespace URI #108

Open fgateuil opened 1 year ago

fgateuil commented 1 year ago

Hi,

I'm trying to find attribute values within an XML document, but the returned data seems incomplete.

Description

When I query an XML to get a specific node attribute with namespace (for instance //@xlink:href), the returned xmlquery.Node is missing the prefix and namespace URI.

Steps to reproduce

package main

import (
    "fmt"
    "strings"

    "github.com/antchfx/xmlquery"
)

func main() {
    xml := `<?xml version="1.0"?>
<root xmlns:xlink="http://www.w3.org/1999/xlink">
    <node xlink:href="http://www.github.com">Some text...</node>
</root>`

    root, _ := xmlquery.Parse(strings.NewReader(xml))
    node, _ := xmlquery.Query(root, "//@xlink:href")
    fmt.Println("NamespaceURI:", node.NamespaceURI)
    fmt.Println("Prefix:", node.Prefix)
    fmt.Println("Data:", node.Data)
}

Expected result

NamespaceURI: http://www.w3.org/1999/xlink
Prefix: xlink
Data: href

Actual result

NamespaceURI:
Prefix:
Data: href
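
For comparison, here is a minimal sketch that uses only the standard library's encoding/xml (not xmlquery) to list the attribute names the decoder reports for the same document. The helper name collectAttrs is hypothetical. It suggests that the namespace information is available at parse time: the decoder resolves the xlink prefix to its URI and stores it in Name.Space, while the xmlns:xlink declaration itself is reported with Name.Space set to "xmlns".

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// collectAttrs (hypothetical helper) returns the name of every attribute
// found on start elements, in document order.
func collectAttrs(doc string) ([]xml.Name, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var names []xml.Name
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		// Only start elements carry attributes.
		if se, ok := tok.(xml.StartElement); ok {
			for _, a := range se.Attr {
				names = append(names, a.Name)
			}
		}
	}
}

func main() {
	doc := `<?xml version="1.0"?>
<root xmlns:xlink="http://www.w3.org/1999/xlink">
    <node xlink:href="http://www.github.com">Some text...</node>
</root>`

	names, err := collectAttrs(doc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, n := range names {
		// For prefixed attributes, Space holds the resolved namespace URI.
		fmt.Printf("Space=%q Local=%q\n", n.Space, n.Local)
	}
}
```

Note that encoding/xml does not keep the original prefix on the attribute itself; it has to be recovered from the xmlns:* declarations, which is presumably why xmlquery tracks it separately.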

Solution proposal

In github.com/antchfx/xmlquery/query.go#getCurrentNode:

func getCurrentNode(it *xpath.NodeIterator) *Node {
    n := it.Current().(*NodeNavigator)
    if n.NodeType() == xpath.AttributeNode {
        childNode := &Node{
            Type: TextNode,
            Data: n.Value(),
        }
        return &Node{
            Parent:       n.curr,
            Type:         AttributeNode,
            // START MODIFICATION
            NamespaceURI: n.NamespaceURL(),
            Prefix:       n.Prefix(),
            // END MODIFICATION
            Data:         n.LocalName(),
            FirstChild:   childNode,
            LastChild:    childNode,
        }
    }
    return n.curr
}

Additional information

If it turns out that I simply misused the library, what is the correct way to do this, please? My main use case is as follows:

zhengchun commented 1 year ago

The current code does not take the prefix and namespace URL of attribute nodes into account.

You can use the code below to find the parent node and then iterate over all of its attributes.

    node, _ := xmlquery.Query(root, "//node[@xlink:href]")
    for _, attr := range node.Attr {
        fmt.Println("NamespaceURI:", attr.NamespaceURI)
        fmt.Println("Prefix:", attr.Name.Space)
        fmt.Println("Data:", attr.Name.Local)
    }

fgateuil commented 1 year ago

Well, why not, but if I do that, I must first parse the XPath expression "//node[@xlink:href]" to extract the prefix (xlink) and the local name (href), and then loop over all the attributes to find the ones that match. That is not really efficient.

Anyway, thanks for your help @zhengchun: much appreciated.