go-rod / rod

A Chrome DevTools Protocol driver for web automation and scraping.
https://go-rod.github.io
MIT License

Panic on .HTML() #1081

Open bazuker opened 4 months ago

bazuker commented 4 months ago

Rod Version: v0.116.1

The following code panics when frame.HTML() is called. frame is confirmed not to be nil.

I can provide full iframe HTML code if necessary.

The code to demonstrate your question

    hasVerify, cloudflareIframe, err := page.Has("iframe[src*='https://challenges.cloudflare.com']")
    if err == nil && hasVerify {
        log.Println("human verification detected")
        cloudflareIframe.MustWaitStable()
        log.Println("trying to pass")
        cf, err := page.Element("iframe")
        if err != nil {
            return nil, fmt.Errorf("failed to get cloudflare iframe: %w", err)
        }
        log.Println("got iframe")
        frame, err := cf.Frame()
        if err != nil {
            return nil, fmt.Errorf("failed to unwrap cloudflare frame: %w", err)
        }
        log.Println("targeted", frame)
        fmt.Println(frame.HTML()) // <---- PANICS HERE
    }

Log and stack trace

2024/06/26 21:38:49 human verification detected
2024/06/26 21:38:50 trying to pass
2024/06/26 21:38:50 got iframe
2024/06/26 21:38:50 targeted <page:6B33B27E>

panic recovered:
runtime error: invalid memory address or nil pointer dereference
/usr/local/go/src/runtime/panic.go:261 (0x102701377)
        panicmem: panic(memoryError)
/usr/local/go/src/runtime/signal_unix.go:881 (0x102701344)
        sigpanic: panicmem()
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/page_eval.go:350 (0x1029deaac)
        (*Page).getJSCtxID: obj, err := proto.DOMResolveNode{BackendNodeID: node.ContentDocument.BackendNodeID}.Call(p)
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/page_eval.go:249 (0x1029de0bb)
        (*Page).ensureJSHelper: jsCtxID, err := p.getJSCtxID()
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/page_eval.go:234 (0x1029ddeeb)
        (*Page).formatArgs: id, err := p.ensureJSHelper(obj)
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/page_eval.go:150 (0x1029ddb2b)
        (*Page).evaluate: args, err := p.formatArgs(opts)
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/page_eval.go:129 (0x1029dd6a7)
        (*Page).Evaluate: res, err = p.evaluate(opts)
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/query.go:172 (0x1029dfa4b)
        (*Page).ElementByJS.func2: res, err = p.Evaluate(opts.ByObject())
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/lib/utils/sleeper.go:140 (0x10287b85b)
        Retry: stop, err := fn()
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/query.go:167 (0x1029df8df)
        (*Page).ElementByJS: err = utils.Retry(p.ctx, p.sleeper(), func() (bool, error) {
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/query.go:143 (0x1029df607)
        (*Page).Element: return p.ElementByJS(evalHelper(js.Element, selector))
/Users/bazuker/go/pkg/mod/github.com/go-rod/rod@v0.116.1/page.go:106 (0x1029d9ef7)
        (*Page).HTML: el, err := p.Element("html")
/Users/bazuker/go/src/github.com/bazuker/hikebook/hikes_plugin.go:162 (0x102aba113)
        (*HikesPlugin).Run: fmt.Println(frame.HTML())
/Users/bazuker/go/pkg/mod/github.com/bazuker/browserbro@v1.0.2/pkg/manager/manager.go:190 (0x102ab8a8f)
        (*Manager).loadPlugins.func1: results, err := plugin.Run(params)
/Users/bazuker/go/pkg/mod/github.com/gin-gonic/gin@v1.10.0/context.go:185 (0x102aaac23)
        (*Context).Next: c.handlers[c.index](c)
/Users/bazuker/go/pkg/mod/github.com/gin-gonic/gin@v1.10.0/recovery.go:102 (0x102aaac04)
        CustomRecoveryWithWriter.func1: c.Next()
/Users/bazuker/go/pkg/mod/github.com/gin-gonic/gin@v1.10.0/context.go:185 (0x102aa686f)
        (*Context).Next: c.handlers[c.index](c)
/Users/bazuker/go/pkg/mod/github.com/bazuker/browserbro@v1.0.2/pkg/manager/middleware.go:19 (0x102ab855f)
        (*Manager).Run.loggerMiddleware.func4: c.Next()
/Users/bazuker/go/pkg/mod/github.com/gin-gonic/gin@v1.10.0/context.go:185 (0x102aa9ae3)
        (*Context).Next: c.handlers[c.index](c)
/Users/bazuker/go/pkg/mod/github.com/gin-gonic/gin@v1.10.0/gin.go:633 (0x102aa966c)
        (*Engine).handleHTTPRequest: c.Next()
/Users/bazuker/go/pkg/mod/github.com/gin-gonic/gin@v1.10.0/gin.go:589 (0x102aa93b3)
        (*Engine).ServeHTTP: engine.handleHTTPRequest(c)
/usr/local/go/src/net/http/server.go:3137 (0x10299558b)
        serverHandler.ServeHTTP: handler.ServeHTTP(rw, req)
/usr/local/go/src/net/http/server.go:2039 (0x1029918c7)
        (*conn).serve: serverHandler{c.server}.ServeHTTP(w, w.req)
/usr/local/go/src/runtime/asm_arm64.s:1222 (0x10271f473)
        goexit: MOVD    R0, R0  // NOP

github-actions[bot] commented 4 months ago

Please fix the format of your markdown:

31 MD031/blanks-around-fences Fenced code blocks should be surrounded by blank lines [Context: "```"]
31 MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"]

generated by check-issue

bazuker commented 4 months ago

Just confirmed that almost any operation on that frame results in a panic. I tried frame.Element, frame.Has, etc.
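
Until the root cause is found, a caller can keep such a panic from escaping the plugin by converting it into an ordinary error. The following is a minimal sketch of that pattern using only the standard library. `safely` and the `frame` type are hypothetical names made up for the example; `frame` merely simulates a method that dereferences a nil pointer and is not rod's `*rod.Page`.

```go
package main

import "fmt"

// safely runs fn and converts a panic into an error. It is a hypothetical
// helper for this sketch, not part of rod.
func safely(fn func() (string, error)) (out string, err error) {
	defer func() {
		if r := recover(); r != nil {
			out = ""
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return fn()
}

// frame is a hypothetical stand-in for the value returned by cf.Frame().
type frame struct{ doc *string }

// HTML dereferences doc and therefore panics when doc is nil.
func (f *frame) HTML() (string, error) { return *f.doc, nil }

func main() {
	broken := &frame{} // doc is nil, so HTML() will panic
	if _, err := safely(broken.HTML); err != nil {
		fmt.Println("frame unusable:", err)
	}
}
```

This only contains the failure so the surrounding request can return an error; it does not make the frame usable.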

ysmood commented 4 months ago

This code works fine for me, whether it runs headless or not:

package main

import (
    "fmt"

    "github.com/go-rod/rod"
    "github.com/go-rod/rod/lib/launcher"
)

func main() {
    u := launcher.New().Headless(false).MustLaunch()
    page := rod.New().ControlURL(u).MustConnect().MustPage("https://dash.cloudflare.com/sign-up")
    f := page.MustElement(`iframe[src*="https://challenges.cloudflare.com"]`).MustFrame()
    fmt.Println(f.MustElement("#success").MustHTML())
}

bazuker commented 4 months ago

@ysmood you can try it for yourself on this page. Select a park, pick a date and time, and press Next. You will see the Cloudflare iframe that makes the code panic.

Also, I am running a managed version of Rod in Docker, if that makes a difference.

My configuration:

    l = launcher.MustNewManaged(serviceURL).
        UserDataDir(userDataDir).
        Headless(false).
        Devtools(false).
        Leakless(true).XVFB("--server-num="+strconv.Itoa(serverID), "--server-args=-screen 0 1600x900x16")
    l.NoSandbox(true)
    l.Set("disable-web-security")
    l.Set("disable-blink-features", "AutomationControlled")
    l.Delete("enable-automation")
    l.Delete("disable-site-isolation-trials")

    br.browser.Client(l.MustClient())
    err = br.browser.Connect()
    if err != nil {
        return fmt.Errorf("failed to connect to browser: %w", err)
    }
    br.browser.MustIncognito()

zhaofenghao commented 4 months ago

How do you bypass the Cloudflare challenge?

ysmood commented 4 months ago

@bazuker I can't even open the page. When I navigate to it in my personal browser, I get a blank page and an error in the console:

[Screenshot: CleanShot 2024-06-29 at 11 29 19@2x]

bazuker commented 4 months ago

@ysmood I'm not sure where you are located geographically, but this website definitely works in North America. It is the official booking website for British Columbia recreational parks and trails.

ysmood commented 4 months ago

It works fine for me:

https://github.com/go-rod/rod/assets/1415488/a6a21591-a54c-4fc2-9fa4-78f963bb6417