go-rod / rod

A Chrome DevTools Protocol driver for web automation and scraping.
https://go-rod.github.io
MIT License

MustWaitStable blocks forever on certain websites #1085

Open zhaofenghao opened 3 months ago

zhaofenghao commented 3 months ago

Rod Version: v0.116.1

The code to demonstrate your question

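package main

import (
    "fmt"

    "github.com/go-rod/rod"
    "github.com/go-rod/rod/lib/launcher"
    "github.com/go-rod/rod/lib/proto"
)

func main() {
    // Launch a visible browser; Leakless kills it if this process exits.
    u := launcher.New().Leakless(true).Headless(false).MustLaunch()
    browser := rod.New().NoDefaultDevice().ControlURL(u).MustConnect()
    defer browser.MustClose()
    page := browser.MustPage()
    page.MustNavigate("https://www.abcproxy.com/")
    // CDP documents quality as jpeg-only, so it has no effect with png.
    quality := 90
    p := proto.PageCaptureScreenshot{
        Format:  "png",
        Quality: &quality,
    }
    page.MustWaitStable() // never returns on this site
    res, err := page.Screenshot(true, &p)
    if err != nil {
        fmt.Println("screenshot failed:", err)
        return
    }
    fmt.Println("res:", len(res))
}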

What you got

When the site redirects, MustWaitStable blocks the program forever.

What you expect to see

MustWaitStable should still work normally when the page redirects to another URL.

What have you tried to solve the question

After debugging, I found that the blocking is caused by WaitRequestIdle:

wait := p.EachEvent(func(sent *proto.NetworkRequestWillBeSent) {
    for _, t := range excludeTypes {
        if sent.Type == t {
            return
        }
    }

    if match(sent.Request.URL) {
        // Redirect will send multiple NetworkRequestWillBeSent events with the same RequestID,
        // we should filter them out.
        if _, has := waitList[sent.RequestID]; !has {
            waitList[sent.RequestID] = sent.Request.URL
            update(waitList)
            idleCounter.Add()
        }
    }
}, func(e *proto.NetworkLoadingFinished) {
    checkDone(e.RequestID)
}, func(e *proto.NetworkLoadingFailed) {
    checkDone(e.RequestID)
})

After the site has fully loaded, waitList is still not empty, so the wait never finishes.
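The excerpt above can be reduced to a small model to see why one unfinished request is enough to block. This is a simplified, hypothetical sketch (the type `requestTracker` and its methods are illustrative, not rod's API), and it ignores the idle duration `d` that rod also waits for: a request ID is added on `NetworkRequestWillBeSent`, recorded only once even when a redirect repeats the ID, and removed on `NetworkLoadingFinished` or `NetworkLoadingFailed`. A request that never receives either of the last two events, such as a long-lived heartbeat or streaming connection (an assumption about this site, not something the debugging above confirmed), keeps `waitList` non-empty.

```go
package main

import "fmt"

// requestTracker is a hypothetical, simplified model of the bookkeeping in
// WaitRequestIdle: a RequestID enters waitList when the request is sent and
// leaves it when loading finishes or fails.
type requestTracker struct {
	waitList map[string]string // RequestID -> URL
}

func newRequestTracker() *requestTracker {
	return &requestTracker{waitList: map[string]string{}}
}

// sent mirrors the NetworkRequestWillBeSent handler. A redirect reuses the
// same RequestID, so only the first event for an ID is recorded.
func (t *requestTracker) sent(id, url string) {
	if _, has := t.waitList[id]; !has {
		t.waitList[id] = url
	}
}

// done mirrors checkDone for NetworkLoadingFinished / NetworkLoadingFailed.
func (t *requestTracker) done(id string) {
	delete(t.waitList, id)
}

// idle reports whether no tracked request is pending.
func (t *requestTracker) idle() bool {
	return len(t.waitList) == 0
}

func main() {
	t := newRequestTracker()

	// A redirected document request: two "sent" events, one finish.
	t.sent("1", "https://example.com/")
	t.sent("1", "https://example.com/en/")
	t.done("1")

	// A long-lived request that never reports finished or failed.
	t.sent("2", "https://example.com/heartbeat")

	fmt.Println(t.idle(), len(t.waitList))
}
```

Running it prints `false 1`: the redirect is counted once and cleared, but the heartbeat entry never leaves `waitList`.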

github-actions[bot] commented 3 months ago

Please fix the format of your markdown:

36 MD031/blanks-around-fences Fenced code blocks should be surrounded by blank lines [Context: "```go"]
59 MD031/blanks-around-fences Fenced code blocks should be surrounded by blank lines [Context: "```"]

generated by check-issue

ysmood commented 3 months ago

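If you look at the source of WaitStable, it simply calls 3 other functions (this is one of rod's design principles: many high-level functions are just a few low-level functions with default arguments). One of them is WaitRequestIdle. You can customize the wait by calling only some of them individually instead of all of them: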

p.WaitDOMStable(time.Second, 0)

p.WaitRequestIdle(d, nil, []string{"https://infinite-requests.com/.*"}, nil)()

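Because some websites keep sending heartbeat requests indefinitely, WaitRequestIdle lets you exclude certain URLs from the wait list, as shown in the code above. See its doc comment; the usage should be explained fairly clearly there.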