go-rod / rod

A Chrome DevTools Protocol driver for web automation and scraping.
https://go-rod.github.io
MIT License
5.38k stars, 353 forks

Read body from NetworkResponseReceived event #864

Closed: FrenchGithubUser closed this issue 1 year ago

FrenchGithubUser commented 1 year ago

Rod Version: v0.112.8

The code to demonstrate your question

    go page.EachEvent(func(e *proto.PageLoadEventFired) {
        // some stuff
        wg.Done()
    }, func(e *proto.NetworkResponseReceived) {
        if e.Response.URL == "https://thotsbay.ac/search/search" {
            // how can I get the response body here?
            fmt.Println(e.Response)
        }
    })()

What you got

The NetworkResponseReceived object, but I'm looking for the body of the response (which in this case is some JSON).

What have you tried to solve the question

Searched the docs and the internet, and asked ChatGPT.

ysmood commented 1 year ago

https://github.com/go-rod/rod/blob/49cd11c3b7d361d30feb5e8f346ec0832e48cb3b/hijack_test.go#L87

FrenchGithubUser commented 1 year ago

Thanks for your answer! However, I feel it would make sense to have access to the response body from the NetworkResponseReceived object...

ysmood commented 1 year ago

Then I think it's not related to Rod; you can ask the Chromium team to support it: https://developer.chrome.com/docs/devtools/

FrenchGithubUser commented 1 year ago

Right, thanks for the details!

TangMonk commented 1 year ago

    go page.EachEvent(func(e *proto.NetworkResponseReceived) {
        fmt.Println("got event:", e.Type)
        if e.Response.URL == "https://www.dextools.io/shared/analytics/tokens/social-network-updates?chain=ether" {
            body := e.Response.Body // does not compile: proto.NetworkResponse has no Body field
        }
    })()

e.Response.Body doesn't work.
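e.Response.Body fails because, in CDP, the Response object inside Network.responseReceived only describes the response (URL, status, headers, MIME type, timing); it carries no body, and rod's proto.NetworkResponse mirrors that. The body has to be requested separately with Network.getResponseBody (proto.NetworkGetResponseBody in rod). The sketch below uses only encoding/json to decode a trimmed copy of the responseReceived payload from the -rod=cdp log posted later in this thread; the type and helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sample is a trimmed copy of the Network.responseReceived params from the
// -rod=cdp log in this thread (timing and most flags removed).
const sample = `{"requestId":"76033.4","type":"Fetch","response":{"url":"http://localhost:53509/hello","status":200,"headers":{"content-type":"text/plain;charset=UTF-8"},"mimeType":"text/plain","fromServiceWorker":true}}`

// responseReceived decodes a few of the event's fields (hypothetical type).
type responseReceived struct {
	RequestID string `json:"requestId"`
	Response  struct {
		URL      string `json:"url"`
		Status   int    `json:"status"`
		MimeType string `json:"mimeType"`
	} `json:"response"`
}

// decodeResponseReceived parses the raw event params.
func decodeResponseReceived(raw []byte) (responseReceived, error) {
	var ev responseReceived
	err := json.Unmarshal(raw, &ev)
	return ev, err
}

// hasBodyField reports whether the "response" object contains a "body" key.
func hasBodyField(raw []byte) (bool, error) {
	var ev struct {
		Response map[string]json.RawMessage `json:"response"`
	}
	if err := json.Unmarshal(raw, &ev); err != nil {
		return false, err
	}
	_, ok := ev.Response["body"]
	return ok, nil
}

func main() {
	ev, err := decodeResponseReceived([]byte(sample))
	if err != nil {
		panic(err)
	}
	hasBody, err := hasBodyField([]byte(sample))
	if err != nil {
		panic(err)
	}
	// Only metadata is present; the body must be fetched separately with
	// Network.getResponseBody, keyed by ev.RequestID.
	fmt.Println(ev.RequestID, ev.Response.Status, ev.Response.MimeType, hasBody)
}
```

The request ID is the only link between this event and the body, which is why every working snippet in this thread passes e.RequestID to proto.NetworkGetResponseBody.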

ischeck commented 1 year ago

Maybe you need this: https://github.com/go-rod/rod/issues/764

lukeed commented 1 year ago

Sorry to resurrect, but I'm trying to get the response body from a Service Worker that intercepted the request. I'm using NetworkGetResponseBody and see the e.Response fine, but @ysmood's reply is for the Hijacking Router, which I'm not using.

I had to use @ischeck's approach, which results in this error:

e.requestID 70503.6
ERROR:  {-32000 No resource with given identifier found }

Any ideas?

ysmood commented 1 year ago

@lukeed You can use the -rod=cdp flag to print all the CDP events and check whether the ID exists.

    go run main.go -rod=cdp

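Checking whether an ID exists in -rod=cdp output can be automated with a small filter. This is a minimal sketch, assuming the line layout visible in the log posted in this thread ("[cdp] DATE TIME <dir> [#n] @SESSION Method {json}"); eventsForRequest is a hypothetical helper, not part of rod, and sampleLog is a trimmed excerpt of that log.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleLog is a trimmed excerpt of the -rod=cdp output posted in this thread.
const sampleLog = `[cdp] 2023/11/06 10:48:02 <- @0DBD4410 Network.responseReceived {"requestId":"76033.4","type":"Fetch"}
[cdp] 2023/11/06 10:48:02 <- @0DBD4410 Network.dataReceived {"requestId":"76033.4","dataLength":15}
[cdp] 2023/11/06 10:48:02 => #21 @0DBD4410 Network.disable null
[cdp] 2023/11/06 10:48:02 => #22 @0DBD4410 Network.getResponseBody {"requestId":"76033.4"}`

// eventsForRequest returns, in log order, the CDP method names on lines that
// mention the given requestId (hypothetical helper; assumes the layout above).
func eventsForRequest(log, requestID string) []string {
	needle := `"requestId":"` + requestID + `"`
	var methods []string
	for _, line := range strings.Split(log, "\n") {
		if !strings.Contains(line, needle) {
			continue
		}
		fields := strings.Fields(line)
		for i, f := range fields {
			// The method name is the token right after "@SESSION".
			if strings.HasPrefix(f, "@") && i+1 < len(fields) {
				methods = append(methods, fields[i+1])
				break
			}
		}
	}
	return methods
}

func main() {
	// prints: [Network.responseReceived Network.dataReceived Network.getResponseBody]
	fmt.Println(eventsForRequest(sampleLog, "76033.4"))
}
```

Run against this excerpt, the filter shows that no Network.loadingFinished was logged for 76033.4 before Network.getResponseBody was called.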
lukeed commented 1 year ago

@ysmood Thanks for the info. The RequestID does, in fact, exist:

    fetch := fmt.Sprintf(`function() { return fetch(%q, %s) }`, r.URL, string(bytes))
    fmt.Printf("\nFETCH:\n%s\n\n", fetch)

    e := proto.NetworkResponseReceived{}
    wait := page.WaitEvent(&e)

    if _, err := page.Eval(fetch); err != nil {
        fmt.Printf("\nEVAL ERROR:\n%v\n\n", err)
    }

    fmt.Println("\nWAITING...")
    wait()
    fmt.Println("\nWAITED!")

    fmt.Println("e.res.url", e.Response.URL)
    fmt.Println("e.requestID", e.RequestID)

    reply, err := (proto.NetworkGetResponseBody{RequestID: e.RequestID}).Call(page)
    if err != nil {
        fmt.Println("ERROR: ", err)
        return
    }

[cdp] 2023/11/06 10:48:02 <- @0DBD4410 Network.responseReceived {"requestId":"76033.4","loaderId":"A2626D7C9FAC9F4F0953DF267774849D","timestamp":2222573.643943,"type":"Fetch","response":{"url":"http://localhost:53509/hello","status":200,"statusText":"","headers":{"content-type":"text/plain;charset=UTF-8"},"mimeType":"text/plain","connectionReused":false,"connectionId":0,"fromDiskCache":false,"fromServiceWorker":true,"fromPrefetchCache":false,"encodedDataLength":-1,"timing":{"requestTime":2222573.642628,"proxyStart":-1,"proxyEnd":-1,"dnsStart":-1,"dnsEnd":-1,"connectStart":-1,"connectEnd":-1,"sslStart":-1,"sslEnd":-1,"workerStart":0.003,"workerReady":0.186,"workerFetchStart":0.186,"workerRespondWithSettled":1.037,"sendStart":0.004,"sendEnd":0.004,"pushStart":0,"pushEnd":0,"receiveHeadersStart":1.161,"receiveHeadersEnd":1.161},"serviceWorkerResponseSource":"fallback-code","responseTime":1.699296482932593e+12,"protocol":"http/1.1","alternateProtocolUsage":"alternativeJobWonWithoutRace","securityState":"secure"},"hasExtraInfo":false,"frameId":"AE231E4C2DFF511F5F1F0FA6D9161B94"}
[cdp] 2023/11/06 10:48:02 <= #20 {"result":{"type":"object","value":{}}}
[cdp] 2023/11/06 10:48:02 <- @0DBD4410 Network.dataReceived {"requestId":"76033.4","timestamp":2222573.644177,"dataLength":15,"encodedDataLength":0}

WAITING...
[cdp] 2023/11/06 10:48:02 => #21 @0DBD4410 Network.disable null
[cdp] 2023/11/06 10:48:02 <= #21 {}

WAITED!
e.res.url http://localhost:53509/hello
e.requestID 76033.4
[cdp] 2023/11/06 10:48:02 => #22 @0DBD4410 Network.getResponseBody {"requestId":"76033.4"}
[cdp] 2023/11/06 10:48:02 <= #22 error: {"code":-32000,"message":"No resource with given identifier found","data":""}
ERROR:  {-32000 No resource with given identifier found }

ysmood commented 1 year ago

@lukeed I think you need to find the correct page for the service worker; a service worker usually runs in a different background page.

lukeed commented 1 year ago

I was on the correct page. What I needed was to wait for the NetworkLoadingFinished event and then call Network.getResponseBody in that callback. I'm not sure whether that's expected behavior, but that's what I saw while debugging with -rod=cdp.
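The ordering described above (remember the request ID, then ask for the body only once Network.loadingFinished arrives for it) can be modeled without a browser. In this minimal sketch, Event, BodyTracker, and the fetch callback are hypothetical stand-ins for rod's proto.Network* event structs and proto.NetworkGetResponseBody; only the sequencing logic is the point.

```go
package main

import "fmt"

// Event is a hypothetical, simplified stand-in for rod's event structs
// (proto.NetworkRequestWillBeSent, proto.NetworkResponseReceived,
// proto.NetworkLoadingFinished, proto.NetworkLoadingFailed).
type Event struct {
	Method    string // CDP method name, e.g. "Network.loadingFinished"
	RequestID string
	URL       string // set only for requestWillBeSent / responseReceived
}

// BodyTracker remembers the IDs of matching requests and calls fetch (a
// stand-in for Network.getResponseBody) only after loadingFinished.
type BodyTracker struct {
	match   func(url string) bool
	fetch   func(requestID string) (string, error)
	pending map[string]bool
}

func NewBodyTracker(match func(string) bool, fetch func(string) (string, error)) *BodyTracker {
	return &BodyTracker{match: match, fetch: fetch, pending: make(map[string]bool)}
}

// Handle processes one event. done is true when a tracked request finished
// and fetch was called; body and err are fetch's results.
func (t *BodyTracker) Handle(e Event) (body string, done bool, err error) {
	switch e.Method {
	case "Network.requestWillBeSent", "Network.responseReceived":
		if t.match(e.URL) {
			t.pending[e.RequestID] = true
		}
	case "Network.loadingFailed":
		delete(t.pending, e.RequestID) // stop tracking; do not ask for a body
	case "Network.loadingFinished":
		if t.pending[e.RequestID] {
			delete(t.pending, e.RequestID)
			body, err = t.fetch(e.RequestID)
			return body, true, err
		}
	}
	return "", false, nil
}

func main() {
	tracker := NewBodyTracker(
		func(url string) bool { return url == "http://localhost:53509/hello" },
		func(id string) (string, error) { return "body for " + id, nil },
	)
	events := []Event{
		{Method: "Network.requestWillBeSent", RequestID: "76033.4", URL: "http://localhost:53509/hello"},
		{Method: "Network.responseReceived", RequestID: "76033.4", URL: "http://localhost:53509/hello"},
		{Method: "Network.dataReceived", RequestID: "76033.4"},
		{Method: "Network.loadingFinished", RequestID: "76033.4"},
	}
	for _, e := range events {
		if body, done, err := tracker.Handle(e); done {
			fmt.Println(body, err)
		}
	}
}
```

With rod, each case would become one callback passed to page.EachEvent, and fetch would wrap (proto.NetworkGetResponseBody{RequestID: proto.NetworkRequestID(id)}).Call(page). Keying by request ID in a map, rather than keeping a single variable, lets several matching requests be in flight at once.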

Similarly, I only see NetworkDataReceived events intermittently, which (to me) means I can't reliably forward streamed responses from the SW to my server; instead, I need to wait for NetworkLoadingFinished and send the response body in a single w.Write call.

ysmood commented 1 year ago

We should open a separate issue for service workers; I confirmed that it's not easy to get request events from them.

duolabmeng6 commented 11 months ago

How do I get the body?

duolabmeng6 commented 11 months ago

You need to find the correct content for the Service Worker, Service Worker

maybe you need this #764

Your code will not run in the latest version without Body

duolabmeng6 commented 11 months ago

I hope this helps others.

    package main

    import (
        "fmt"
        "time"

        "github.com/go-rod/rod"
        "github.com/go-rod/rod/lib/launcher"
        "github.com/go-rod/rod/lib/proto"
    )

    func main() {
        url := launcher.New().Headless(false).MustLaunch()
        browser := rod.New().ControlURL(url).MustConnect()
        page := browser.MustPage("")

        go func() {
            page.EachEvent(func(e *proto.NetworkRequestWillBeSent) {
                fmt.Printf("Request: %s %s\n", e.Request.Method, e.Request.URL)
            }, func(e *proto.NetworkResponseReceived) {
                // The body may not be available yet when responseReceived fires,
                // in which case this call can fail with "No resource with given identifier found".
                reply, err := (proto.NetworkGetResponseBody{RequestID: e.RequestID}).Call(page)
                if err != nil {
                    fmt.Println(err)
                    return // reply is nil on error
                }
                fmt.Println(reply.Body)
            })()
        }()
        // Give the goroutine time to subscribe before navigating.
        time.Sleep(3 * time.Second)
        page.MustNavigate("https://www.baidu.com")
        select {}
    }

KingWu commented 8 months ago

@duolabmeng6 I still get the error using your sample code: ERROR: {-32000 No resource with given identifier found }

Any ideas?

KingWu commented 8 months ago

I googled how people solve this issue:

https://www.appsloveworld.com/google-chrome-extension/4/chrome-extension-quotno-resource-with-given-identifier-foundquot-when-trying

Refactoring to use the NetworkLoadingFinished event makes it more stable:

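    // Assumes imports of "fmt", "strings" and "github.com/go-rod/rod/lib/proto".
    var targetUrl = "xxxx"
    var requestId proto.NetworkRequestID // ID of the last request whose URL matched

    page.EachEvent(func(e *proto.NetworkRequestWillBeSent) {
        fmt.Printf("Request: %s %s\n", e.Request.Method, e.Request.URL)
        if strings.Contains(e.Request.URL, targetUrl) {
            requestId = e.RequestID
        }
    }, func(e *proto.NetworkLoadingFinished) {
        // Only ask for the body once loading has finished for that request.
        if e.RequestID == requestId {
            fmt.Println("NetworkLoadingFinished")
            reply, err := (proto.NetworkGetResponseBody{RequestID: requestId}).Call(page)
            if err != nil {
                fmt.Println(err)
                return // reply is nil on error
            }
            fmt.Println(reply.Body)
        }
    })()
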
zidantousoft commented 3 weeks ago

Yeah, it works, thanks.