antchfx / htmlquery

htmlquery is golang XPath package for HTML query.
https://github.com/antchfx/xpath
MIT License
739 stars 74 forks source link

concurrent writes to map panic #58

Closed JWAlberty closed 1 year ago

JWAlberty commented 1 year ago

it appears positmap in descendantQuery somehow gets reused. I have not been able to find any global state that is the cause but I did manage to make a (unreliable) test case. If I set it to 100 goroutines it pretty reliably triggers it for me, though i've gotten it to trigger with as few as 20 goroutines. It only seems to happen with xpath 1.2.2 but going to htmlquery 1.2.5 seems to make it less likely to happen (though 1.2.5 htmlquery seems to cause hangs). I have never been able to reproduce it with xpath 1.2.1. Attaching testcase with 100 goroutines and a stack trace from a run with 30 (for brevity) Test case:

package main
import (
    "fmt"
    "strings"
    "sync"
    "github.com/antchfx/htmlquery"
    "golang.org/x/net/html"
)
func main() {
    wg := new(sync.WaitGroup)
    for i := 0; i <= 100; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            fmt.Printf("%d: %#v\n", i, getDiv(`//div`))
        }(i)
    }
    wg.Wait()
}
func getDiv(selector string) []*html.Node {
    s := `<html><head></head><body><div>a</div></body>`
    doc, err := htmlquery.Parse(strings.NewReader(s))
    if err != nil {
        panic(err)
    }
    return htmlquery.Find(doc, selector)
}

stack trace:

fatal error: concurrent map writes
12: []*html.Node{(*html.Node)(0x140002816c0)}
22: []*html.Node{(*html.Node)(0x140003941c0)}
1: []*html.Node{(*html.Node)(0x1400031c700)}
11: []*html.Node{(*html.Node)(0x1400041c1c0)}
goroutine 47 [running]:
[github.com/antchfx/xpath.(*descendantQuery).Select.func1()](http://github.com/antchfx/xpath.(*descendantQuery).Select.func1())
    /Users/jacobalberty/go/pkg/mod/github.com/antchfx/xpath@v1.2.2/query.go:267 +0xe8
[github.com/antchfx/xpath.(*descendantQuery).Select(0x1400013a240](http://github.com/antchfx/xpath.(*descendantQuery).Select(0x1400013a240), {0x104739ea8, 0x1400008e120})
    /Users/jacobalberty/go/pkg/mod/github.com/antchfx/xpath@v1.2.2/query.go:289 +0x160
[github.com/antchfx/xpath.(*NodeIterator).MoveNext(0x1400008e120)](http://github.com/antchfx/xpath.(*NodeIterator).MoveNext(0x1400008e120))
    /Users/jacobalberty/go/pkg/mod/github.com/antchfx/xpath@v1.2.2/xpath.go:86 +0x3c
[github.com/antchfx/htmlquery.QuerySelectorAll(0x140000947e0](http://github.com/antchfx/htmlquery.QuerySelectorAll(0x140000947e0), 0x140001581c0)
    /Users/jacobalberty/go/pkg/mod/github.com/antchfx/htmlquery@v1.2.5/query.go:83 +0x120
[github.com/antchfx/htmlquery.QueryAll(0x10473a188](http://github.com/antchfx/htmlquery.QueryAll(0x10473a188)?, {0x10468e778?, 0x0?})
    /Users/jacobalberty/go/pkg/mod/github.com/antchfx/htmlquery@v1.2.5/query.go:54 +0x5c
[github.com/antchfx/htmlquery.Find(...)](http://github.com/antchfx/htmlquery.Find(...))
    /Users/jacobalberty/go/pkg/mod/github.com/antchfx/htmlquery@v1.2.5/query.go:30
main.getDiv({0x10468e778, 0x5})
    /tmp/xpath/main.go:30 +0x74
main.main.func1(0x0?)
    /tmp/xpath/main.go:18 +0x60
created by main.main
    /tmp/xpath/main.go:16 +0x3c
goroutine 1 [semacquire]:
sync.runtime_Semacquire(0x140000021a0?)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/runtime/sema.go:62 +0x28
sync.(*WaitGroup).Wait(0x1400012e2c0)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/sync/waitgroup.go:139 +0x80
main.main()
    /tmp/xpath/main.go:21 +0xf0
goroutine 19 [runnable]:
internal/poll.runtime_Semrelease(0x2d?)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/runtime/sema.go:82 +0x24
internal/poll.(*fdMutex).rwunlock(0x14000317d78?, 0x28?)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/internal/poll/fd_mutex.go:188 +0xcc
internal/poll.(*FD).writeUnlock(0x1400012c060)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/internal/poll/fd_mutex.go:249 +0x28
internal/poll.(*FD).Write(0x1400012c060, {0x1400032c000, 0x2d, 0x40})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/internal/poll/fd_unix.go:388 +0x368
os.(*File).write(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/os/file_posix.go:48
os.(*File).Write(0x1400012a008, {0x1400032c000?, 0x2d, 0x14000317f80?})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/os/file.go:176 +0x60
fmt.Fprintf({0x10473a0e8, 0x1400012a008}, {0x1046911f2, 0x8}, {0x14000317f80, 0x2, 0x2})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/fmt/print.go:205 +0x84
fmt.Printf(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/fmt/print.go:213
main.main.func1(0x0?)
    /tmp/xpath/main.go:18 +0xdc
created by main.main
    /tmp/xpath/main.go:16 +0x3c
goroutine 30 [runnable]:
syscall.syscall(0x14000416d78?, 0x1045c1c28?, 0x80000000000?, 0x7ffff80000000000?)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/runtime/sys_darwin.go:23 +0x54
syscall.write(0x1400012c060?, {0x1400042c000?, 0x14000416e98?, 0x1045c7844?})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/syscall/zsyscall_darwin_arm64.go:1653 +0x48
syscall.Write(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/syscall/syscall_unix.go:211
internal/poll.ignoringEINTRIO(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/internal/poll/fd_unix.go:794
internal/poll.(*FD).Write(0x1400012c060?, {0x1400042c000?, 0x2e?, 0x40?})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/internal/poll/fd_unix.go:383 +0x2ac
os.(*File).write(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/os/file_posix.go:48
os.(*File).Write(0x1400012a008, {0x1400042c000?, 0x2e, 0x14000416f80?})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/os/file.go:176 +0x60
fmt.Fprintf({0x10473a0e8, 0x1400012a008}, {0x1046911f2, 0x8}, {0x14000416f80, 0x2, 0x2})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/fmt/print.go:205 +0x84
fmt.Printf(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/fmt/print.go:213
main.main.func1(0x0?)
    /tmp/xpath/main.go:18 +0xdc
created by main.main
    /tmp/xpath/main.go:16 +0x3c
goroutine 35 [runnable]:
internal/poll.runtime_Semacquire(0x14000205d78?)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/runtime/sema.go:67 +0x28
internal/poll.(*fdMutex).rwlock(0x1400012c060, 0xa8?)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/internal/poll/fd_mutex.go:154 +0xe0
internal/poll.(*FD).writeLock(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/internal/poll/fd_mutex.go:239
internal/poll.(*FD).Write(0x1400012c060, {0x14000132140, 0x2e, 0x40})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/internal/poll/fd_unix.go:370 +0x48
os.(*File).write(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/os/file_posix.go:48
os.(*File).Write(0x1400012a008, {0x14000132140?, 0x2e, 0x14000205f80?})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/os/file.go:176 +0x60
fmt.Fprintf({0x10473a0e8, 0x1400012a008}, {0x1046911f2, 0x8}, {0x14000205f80, 0x2, 0x2})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/fmt/print.go:205 +0x84
fmt.Printf(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/fmt/print.go:213
main.main.func1(0x0?)
    /tmp/xpath/main.go:18 +0xdc
created by main.main
    /tmp/xpath/main.go:16 +0x3c
goroutine 40 [semacquire]:
internal/poll.runtime_Semacquire(0x14000204d78?)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/runtime/sema.go:67 +0x28
internal/poll.(*fdMutex).rwlock(0x1400012c060, 0xa8?)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/internal/poll/fd_mutex.go:154 +0xe0
internal/poll.(*FD).writeLock(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/internal/poll/fd_mutex.go:239
internal/poll.(*FD).Write(0x1400012c060, {0x1400028a000, 0x2e, 0x80})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/internal/poll/fd_unix.go:370 +0x48
os.(*File).write(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/os/file_posix.go:48
os.(*File).Write(0x1400012a008, {0x1400028a000?, 0x2e, 0x14000204f80?})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/os/file.go:176 +0x60
fmt.Fprintf({0x10473a0e8, 0x1400012a008}, {0x1046911f2, 0x8}, {0x14000204f80, 0x2, 0x2})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/fmt/print.go:205 +0x84
fmt.Printf(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/fmt/print.go:213
main.main.func1(0x0?)
    /tmp/xpath/main.go:18 +0xdc
created by main.main
    /tmp/xpath/main.go:16 +0x3c
goroutine 46 [runnable]:
[github.com/antchfx/htmlquery.(*NodeNavigator).MoveToRoot(0x14000124a20?)](http://github.com/antchfx/htmlquery.(*NodeNavigator).MoveToRoot(0x14000124a20?))
    /Users/jacobalberty/go/pkg/mod/github.com/antchfx/htmlquery@v1.2.5/query.go:263 +0x54
[github.com/antchfx/xpath.(*contextQuery).Select(0x1400012e2d0](http://github.com/antchfx/xpath.(*contextQuery).Select(0x1400012e2d0), {0x104739ea8?, 0x14000158500?})
    /Users/jacobalberty/go/pkg/mod/github.com/antchfx/xpath@v1.2.2/query.go:47 +0x78
[github.com/antchfx/xpath.(*descendantQuery).Select(0x1400013a240](http://github.com/antchfx/xpath.(*descendantQuery).Select(0x1400013a240), {0x104739ea8, 0x14000158500})
    /Users/jacobalberty/go/pkg/mod/github.com/antchfx/xpath@v1.2.2/query.go:246 +0x54
[github.com/antchfx/xpath.(*NodeIterator).MoveNext(0x14000158500)](http://github.com/antchfx/xpath.(*NodeIterator).MoveNext(0x14000158500))
    /Users/jacobalberty/go/pkg/mod/github.com/antchfx/xpath@v1.2.2/xpath.go:86 +0x3c
[github.com/antchfx/htmlquery.QuerySelectorAll(0x140001d2fc0](http://github.com/antchfx/htmlquery.QuerySelectorAll(0x140001d2fc0), 0x140001581c0)
    /Users/jacobalberty/go/pkg/mod/github.com/antchfx/htmlquery@v1.2.5/query.go:83 +0x120
[github.com/antchfx/htmlquery.QueryAll(0x10473a188](http://github.com/antchfx/htmlquery.QueryAll(0x10473a188)?, {0x10468e778?, 0x0?})
    /Users/jacobalberty/go/pkg/mod/github.com/antchfx/htmlquery@v1.2.5/query.go:54 +0x5c
[github.com/antchfx/htmlquery.Find(...)](http://github.com/antchfx/htmlquery.Find(...))
    /Users/jacobalberty/go/pkg/mod/github.com/antchfx/htmlquery@v1.2.5/query.go:30
main.getDiv({0x10468e778, 0x5})
    /tmp/xpath/main.go:30 +0x74
main.main.func1(0x0?)
    /tmp/xpath/main.go:18 +0x60
created by main.main
    /tmp/xpath/main.go:16 +0x3c
goroutine 48 [runnable]:
internal/poll.runtime_Semacquire(0x14000209d78?)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/runtime/sema.go:67 +0x28
internal/poll.(*fdMutex).rwlock(0x1400012c060, 0xa8?)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/internal/poll/fd_mutex.go:154 +0xe0
internal/poll.(*FD).writeLock(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/internal/poll/fd_mutex.go:239
internal/poll.(*FD).Write(0x1400012c060, {0x14000132180, 0x2e, 0x40})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/internal/poll/fd_unix.go:370 +0x48
os.(*File).write(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/os/file_posix.go:48
os.(*File).Write(0x1400012a008, {0x14000132180?, 0x2e, 0x14000209f80?})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/os/file.go:176 +0x60
fmt.Fprintf({0x10473a0e8, 0x1400012a008}, {0x1046911f2, 0x8}, {0x14000209f80, 0x2, 0x2})
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/fmt/print.go:205 +0x84
fmt.Printf(...)
    /opt/homebrew/Cellar/go/1.19.5/libexec/src/fmt/print.go:213
main.main.func1(0x0?)
    /tmp/xpath/main.go:18 +0xdc
created by main.main
    /tmp/xpath/main.go:16 +0x3c
exit status 2
zhengchun commented 1 year ago

In xpath v1.2.2, Expr object will be allows reused and replace re-create new object if it cached by the htmlquery. https://github.com/antchfx/xpath/commit/dfece7e444f00ae3aea4086a84ba9afdc0b93783

You can disable htmlquery caching temporarily via htmlquery.DisableSelectorCache = true to sove it and waiting fix.

zhengchun commented 1 year ago

update htmlquery to the new version https://github.com/antchfx/htmlquery/releases/tag/v1.3.0