antchfx / htmlquery

htmlquery is a Go XPath package for querying HTML.
https://github.com/antchfx/xpath
MIT License
738 stars, 74 forks

`Find` returns an element when passed an invalid xpath #52

Closed: iambodi closed this issue 2 years ago

iambodi commented 2 years ago
package main

import (
    "fmt"
    "strings"

    "github.com/antchfx/htmlquery"
)

const html = `<html>
<body>
<div>
    <ul id="food">
        <li>avocado</li>
    </ul>
</div>
</body>
</html>`

const xpath = `//*[contains(@id,"food")]//*[contains(@id,"food")]//*[contains(text(),"avocado")]`

func main() {
    doc, err := htmlquery.Parse(strings.NewReader(html))
    if err != nil {
        panic(err)
    }

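    // No element nested inside #food has an id containing "food",
    // so an empty result is expected here.
    list := htmlquery.Find(doc, xpath)
    if len(list) == 0 {
        fmt.Println("no match")
        return
    }

    // Reported behaviour: a node is returned anyway.
    node := list[0]
    fmt.Println(node.Data, node.FirstChild.Data, node.Parent.Data)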
}

Hi,

I have an issue with the `Find` method. When passed an XPath that should match nothing (confirmed with the HTML inspectors of Chrome and Firefox, which find no match), it returns a node.

On playground: https://go.dev/play/p/3Z6jtSNsYfx

Thank you!

zhengchun commented 2 years ago

The second `//*[contains(@id,"food")]` step includes the current node itself when searching for matching nodes. The correct behaviour is to process only the descendants of the node matched by the first `//*[contains(@id,"food")]`, not the node itself. That is why your XPath returns a matched node.

iambodi commented 2 years ago

Thank you for your quick response! OK, so just to be sure: this is a bug, right? If so, do you think it could be fixed in the near future? I took a look at the code but couldn’t find what’s causing this behavior.

zhengchun commented 2 years ago

Check out commit https://github.com/antchfx/xpath/commit/ba368a603dd4fbdce369eb76a2636e41444a929b for the fix.

You can reopen this issue if it is not fixed.