antchfx / htmlquery

htmlquery is a Go XPath package for querying HTML.
https://github.com/antchfx/xpath
MIT License

`replace()` on a query doesn't seem to work #51

Closed alecthomas closed 5 months ago

alecthomas commented 2 years ago
package main

import (
    "fmt"
    "strings"

    "github.com/antchfx/htmlquery"
)

func main() {
    s := `<html><a href="https://github.com/cashapp/hermit-build/releases/download/go-tools/stringer-v0.1.12-darwin-amd64.bz2">foo</a></html>`
    doc, err := htmlquery.Parse(strings.NewReader(s))
    if err != nil {
        panic(err)
    }
    nodes, err := htmlquery.QueryAll(doc, `replace((//a[contains(@href, '/stringer-')])/@href, '^.*/stringer-v([^-]*)-.*$', '$1')`)
    if err != nil {
        panic(err)
    }
    for _, node := range nodes {
        fmt.Println(htmlquery.OutputHTML(node, false))
    }
}

On the playground: https://go.dev/play/p/jxU6UgH0DnK

The same content and query work fine on https://www.freeformatter.com/xpath-tester.html

The above example without replace() works fine: https://go.dev/play/p/N22KULbkgRu

zhengchun commented 2 years ago

There are two problems.

  1. replace() does not support regex syntax.
  2. replace() is a function that returns a string value, so you should compile the expression with xpath.Compile(...) and call Evaluate(...) on the result. htmlquery.QueryAll() always returns a set of nodes.

expr, err := xpath.Compile(`replace((//a[contains(@href, '/stringer-')])/@href, '^.*/stringer-v([^-]*)-.*$', '$1')`)
if err != nil {
    panic(err)
}
v := expr.Evaluate(htmlquery.CreateXPathNavigator(doc))
fmt.Println(v.(string))
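
Since replace() here does not accept regex syntax, one possible workaround is to read the raw href value with XPath (for example with htmlquery.SelectAttr) and apply the pattern in Go instead. The following is only a sketch using the standard library's regexp package; the helper name versionFromHref is hypothetical and not part of htmlquery or xpath.

```go
package main

import (
	"fmt"
	"regexp"
)

// stringerVersion is the same pattern passed to replace() in the original query.
var stringerVersion = regexp.MustCompile(`^.*/stringer-v([^-]*)-.*$`)

// versionFromHref (hypothetical helper) rewrites a matching href to its first
// capture group. An href that does not match is returned unchanged.
func versionFromHref(href string) string {
	return stringerVersion.ReplaceAllString(href, "$1")
}

func main() {
	href := "https://github.com/cashapp/hermit-build/releases/download/go-tools/stringer-v0.1.12-darwin-amd64.bz2"
	fmt.Println(versionFromHref(href))
}
```

This keeps the XPath query simple (select the attribute only) and leaves the string rewriting to Go.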

alecthomas commented 2 years ago

Thank you for the pointers. I've switched to using substring-after and substring-before. BTW, it looks like replace should support regex?
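
The exact expression used is not shown in this thread. One plausible form, given as an assumption, is `substring-before(substring-after(//a[contains(@href, '/stringer-')]/@href, '/stringer-v'), '-')`. The sketch below mirrors those two XPath 1.0 functions in plain Go to show how the chain extracts the version; substringAfter and substringBefore are hypothetical helpers, not library functions.

```go
package main

import (
	"fmt"
	"strings"
)

// substringAfter mirrors XPath 1.0 substring-after(): the text following the
// first occurrence of sep, or "" when sep does not occur in s.
func substringAfter(s, sep string) string {
	_, after, found := strings.Cut(s, sep)
	if !found {
		return ""
	}
	return after
}

// substringBefore mirrors XPath 1.0 substring-before(): the text preceding the
// first occurrence of sep, or "" when sep does not occur in s.
func substringBefore(s, sep string) string {
	before, _, found := strings.Cut(s, sep)
	if !found {
		return ""
	}
	return before
}

func main() {
	href := "https://github.com/cashapp/hermit-build/releases/download/go-tools/stringer-v0.1.12-darwin-amd64.bz2"
	// Keep what follows "/stringer-v", then cut at the next "-".
	fmt.Println(substringBefore(substringAfter(href, "/stringer-v"), "-"))
}
```

Unlike the regex version, this relies on the version string containing no "-" and on "/stringer-v" appearing in the href.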

alecthomas commented 2 years ago

Thanks for a great set of libraries BTW, really nicely done.