antchfx / htmlquery

htmlquery is a Go XPath package for querying HTML.
https://github.com/antchfx/xpath
MIT License

Does it support the "string" XPath function? #37

Closed: Nillouise closed this issue 3 years ago

Nillouise commented 3 years ago

I want to use htmlquery.Query(parse, "string(//div[@id='postlist'])") to extract the text content, but it returns nil. It seems the "string" XPath function is not supported?

zhengchun commented 3 years ago

Yes, string is supported; you can check the xpath documentation.

string is a function that returns a string value, so you cannot use it with htmlquery.Query: htmlquery.Query is for querying a node-set. Use the Evaluate method instead, like this:

    // count() returns a number, so Evaluate yields a float64.
    expr := xpath.MustCompile("count(//img)")
    v := expr.Evaluate(htmlquery.CreateXPathNavigator(doc)).(float64)
    fmt.Printf("total count is %.0f\n", v)

Another way is to use text() in your XPath query, like this: htmlquery.Query(parse, "//div[@id='postlist']/text()")

Nillouise commented 3 years ago

Thanks for your reply, now it works with:

    response, err := getRequest(link)
    if err != nil {
        fmt.Println("error ", err)
        return
    }
    defer response.Body.Close()

    parse, err := htmlquery.Parse(response.Body)
    if err != nil {
        fmt.Println("error ", err)
        return
    }

    // string() returns a string, so use Evaluate instead of htmlquery.Query.
    expr, err := xpath.Compile("string(//div[@id='postlist'])")
    if err != nil {
        fmt.Println("error ", err)
        return
    }
    b := expr.Evaluate(htmlquery.CreateXPathNavigator(parse)).(string)
    fmt.Printf("post text: %s\n", b)

But it does not seem to handle UTF-8 characters: the returned string is garbled. That doesn't matter for me, though.