go-xmlpath / xmlpath

Strict subset of the XPath specification for the Go language.
http://gopkg.in/xmlpath.v2

Iterate through nodes #4

jamra closed this issue 10 years ago

jamra commented 10 years ago

When I try to iterate through the root node returned by ParseHTML, I can't access any of its unexported fields, such as the nodes slice or the kind field. How can I find all of the text nodes?

I am trying to iterate through the hierarchy of nodes and filter out the text to be used later for searching.

niemeyer commented 10 years ago

You can find all text nodes with "//text()".

For example, this:

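package main

import (
    "fmt"
    "log"
    "strings"

    "gopkg.in/xmlpath.v2"
)

var html = `<html><body><div>a</div><div>b</div></body></html>`

func run() error {
    // Select every text node anywhere in the document.
    path, err := xmlpath.Compile("//text()")
    if err != nil {
        return err
    }
    // ParseHTML accepts any io.Reader.
    root, err := xmlpath.ParseHTML(strings.NewReader(html))
    if err != nil {
        return err
    }
    // Walk the matches; printing a *Node prints its string value.
    iter := path.Iter(root)
    for iter.Next() {
        fmt.Println(iter.Node())
    }
    return nil
}

func main() {
    if err := run(); err != nil {
        log.Fatal(err)
    }
}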

Outputs:

a
b
jamra commented 10 years ago

Thanks Gustavo.

jamra commented 10 years ago

Okay. I'm still getting some unexpected output, such as the contents of script elements, and I have no idea how to filter those out.

http://pastebin.com/FerreRkn