cyrus-and / chrome-remote-interface

Chrome Debugging Protocol interface for Node.js
MIT License

Error: No data found for resource with given identifier #260

Closed bookin closed 7 years ago

bookin commented 7 years ago

Perhaps you can help me with this. I'm trying to get the body of an AJAX request on the page, but I keep getting Error: No data found for resource with given identifier. The problem seems to occur only with this request; maybe I'm missing something.

const CDP = require('chrome-remote-interface');

setTimeout(() => {
    CDP(async (client) => {
        const {Network, Page, Runtime} = client;
        Network.requestWillBeSent(({requestId, request}) => {
            if(request.url.indexOf("ct2/results/rpc") != -1){
                console.log(`REQ [${requestId}] ${request.method} ${request.url} \n`);
            }
        });
        Network.responseReceived(async ({requestId, response}) => {
            if(response.url.indexOf("ct2/results/rpc") != -1){
                const {body, base64Encoded} = await Network.getResponseBody({requestId});
                console.log(`RES [${requestId}] body: ${body} \n`);
            }
        });
        try {
            await Promise.all([Network.enable(), Page.enable()]);
            await Page.navigate({url: 'https://clinicaltrials.gov/ct2/results?cond=Parents&term=&cntry1=&state1=&Search=Search&recrs=a#wrapper'});
            await Page.loadEventFired();
            await Runtime.evaluate({
                expression: `document.querySelector('.paginate_button.next').click()`
            });
        } catch (err) {
            console.error(err);
        }
    }).on('error', (err) => {
        console.error(err);
    });
}, 1000);

Thanks.

cyrus-and commented 7 years ago

This happens because AFAIK you're only allowed to call Network.getResponseBody when the Network.loadingFinished event has fired. Unfortunately this event doesn't contain the associated request object so you have to keep track of the requestId for which you want to fetch the response body.

I implemented this in the following using a Set:

const CDP = require('chrome-remote-interface');

setTimeout(() => {
    CDP(async (client) => {
        const {Network, Page, Runtime} = client;

        const requests = new Set(); // <---------- HERE

        Network.requestWillBeSent(({requestId, request}) => {
            if(request.url.indexOf("ct2/results/rpc") != -1){
                console.log(`REQ [${requestId}] ${request.method} ${request.url} \n`);

                requests.add(requestId); // <---------- HERE

            }
        });
        Network.loadingFinished(async ({requestId}) => {

            if (requests.has(requestId)) { // <---------- HERE

                const {body, base64Encoded} = await Network.getResponseBody({requestId});
                console.log(`RES [${requestId}] body: ${body} \n`);
            }
        });
        try {
            await Promise.all([Network.enable(), Page.enable()]);
            await Page.navigate({url: 'https://clinicaltrials.gov/ct2/results?cond=Parents&term=&cntry1=&state1=&Search=Search&recrs=a#wrapper'});
            await Page.loadEventFired();
            await Runtime.evaluate({
                expression: `document.querySelector('.paginate_button.next').click()`
            });
        } catch (err) {
            console.error(err);
        }
    }).on('error', (err) => {
        console.error(err);
    });
}, 1000);
bookin commented 7 years ago

Thank you very much for your help, only you are helping people)

ilanc commented 6 years ago

Wish I'd found this issue earlier - I actually reported this problem as a bug over on the chromium bug tracker: https://bugs.chromium.org/p/chromium/issues/detail?id=805887

It still seems to fail occasionally but is much more reliable when called from Network.loadingFinished.

My test code is here: https://github.com/ilanc/devtools-bugs/blob/master/getResponseBody.js

pmurley commented 6 years ago

Got a follow up on this. I'm seeing Network.getResponseBody fail with the error above occasionally (Error: No data found for resource with given identifier) even when waiting for the Network.loadingFinished event and using the requestId from that event as the argument to Network.getResponseBody. It's not common, but it seems to happen consistently for some resources.

Again, the methodology I'm using:

  1. Save request IDs from Network.requestWillBeSent event
  2. When Network.loadingFinished event fires, verify we have seen the requestId before in (1), and then call Network.getResponseBody on that requestId. This (rarely) results in the error.

I'm attempting to save ALL resources loaded by a particular site. I see this problem consistently when visiting cnn.com, for example. It seems like it might have something to do with proxy-related URLs? Here's an example of one resource (which is loaded when visiting CNN) having this problem:

{
 'request': {'documentURL': 'https://cdn.krxd.net/partnerjs/xdi/proxy.3d2100fd7107262ecb55ce6847f01fa5.html',
               'frameId': '70E834A1728B9944F7E49654C2892D5E',
               'hasUserGesture': False,
               'initiator': {'lineNumber': 0,
                             'type': 'parser',
                             'url': 'https://www.cnn.com/'},
               'loaderId': 'ED6F00D73ABC7A3C7C2A13AA544542A2',
               'request': {'headers': {'Referer': 'https://cdn.krxd.net/partnerjs/xdi/proxy.3d2100fd7107262ecb55ce6847f01fa5.html',
                                       'User-Agent': 'Mozilla/5.0 (X11; Linux '
                                                     'x86_64) '
                                                     'AppleWebKit/537.36 '
                                                     '(KHTML, like Gecko) '
                                                     'Chrome/67.0.3396.0 '
                                                     'Safari/537.36'},
                           'initialPriority': 'Low',
                           'method': 'GET',
                           'mixedContentType': 'none',
                           'referrerPolicy': 'no-referrer-when-downgrade',
                           'url': 'https://bea4.v.fwmrm.net/ad/u?mode=echo&cr=https%3A%2F%2Fbeacon.krxd.net%2Fusermatch.gif%3Fpartner%3Dfreewheel%26partner_uid%3D%23%7Buser.id%7D'},
               'requestId': '1000024975.318',
               'timestamp': 133066.557731,
               'type': 'Image',
               'wallTime': 1537295717.29641},
 'response': {'frameId': '70E834A1728B9944F7E49654C2892D5E',
              'loaderId': 'ED6F00D73ABC7A3C7C2A13AA544542A2',
              'requestId': '1000024975.318',
              'response': {'connectionId': 1120,
                           'connectionReused': False,
                           'encodedDataLength': 353,
                           'fromDiskCache': False,
                           'fromServiceWorker': False,
                           'headers': {'Cache-Control': 'no-store',
                                       'Content-Length': '0',
                                       'Content-Type': 'text/html',
                                       'Date': 'Tue, 18 Sep 2018 18:35:17 GMT',
                                       'Expires': '0',
                                       'P3P': 'policyref="https://www.freewheel.tv/w3c/p3p.xml",CP="ALL '
                                              'DSP COR NID"',
                                       'Pragma': 'no-cache',
                                       'Server': 'FWS',
                                       'Set-Cookie': '_uid="f106_6602634828796344746";expires=Wed, '
                                                     '18 Sep 2019 18:35:17 '
                                                     'GMT;domain=.fwmrm.net;path=/;'},
                           'headersText': 'HTTP/1.1 200 OK\r\n'
                                          'Set-Cookie: '
                                          '_uid="f106_6602634828796344746";expires=Wed, '
                                          '18 Sep 2019 18:35:17 '
                                          'GMT;domain=.fwmrm.net;path=/;\r\n'
                                          'Content-Type: text/html\r\n'
                                          'Content-Length: 0\r\n'
                                          'Expires: 0\r\n'
                                          'Pragma: no-cache\r\n'
                                          'Cache-Control: no-store\r\n'
                                          'Date: Tue, 18 Sep 2018 18:35:17 '
                                          'GMT\r\n'
                                          'Server: FWS\r\n'
                                          'P3P: '
                                          'policyref="https://www.freewheel.tv/w3c/p3p.xml",CP="ALL '
                                          'DSP COR NID"\r\n'
                                          '\r\n',
                           'mimeType': 'text/html',
                           'protocol': 'http/1.1',
                           'remoteIPAddress': '38.71.2.160',
                           'remotePort': 443,
                           'requestHeaders': {'Accept': 'image/webp,image/apng,image/*,*/*;q=0.8',
                                              'Accept-Encoding': 'gzip, '
                                                                 'deflate, br',
                                              'Accept-Language': 'en-US,en;q=0.9',
                                              'Connection': 'keep-alive',
                                              'Host': 'bea4.v.fwmrm.net',
                                              'Referer': 'https://cdn.krxd.net/partnerjs/xdi/proxy.3d2100fd7107262ecb55ce6847f01fa5.html',
                                              'User-Agent': 'Mozilla/5.0 (X11; '
                                                            'Linux x86_64) '
                                                            'AppleWebKit/537.36 '
                                                            '(KHTML, like '
                                                            'Gecko) '
                                                            'Chrome/67.0.3396.0 '
                                                            'Safari/537.36'},
                           'requestHeadersText': 'GET '
                                                 '/ad/u?mode=echo&cr=https%3A%2F%2Fbeacon.krxd.net%2Fusermatch.gif%3Fpartner%3Dfreewheel%26partner_uid%3D%23%7Buser.id%7D '
                                                 'HTTP/1.1\r\n'
                                                 'Host: bea4.v.fwmrm.net\r\n'
                                                 'Connection: keep-alive\r\n'
                                                 'User-Agent: Mozilla/5.0 '
                                                 '(X11; Linux x86_64) '
                                                 'AppleWebKit/537.36 (KHTML, '
                                                 'like Gecko) '
                                                 'Chrome/67.0.3396.0 '
                                                 'Safari/537.36\r\n'
                                                 'Accept: '
                                                 'image/webp,image/apng,image/*,*/*;q=0.8\r\n'
                                                 'Referer: '
                                                 'https://cdn.krxd.net/partnerjs/xdi/proxy.3d2100fd7107262ecb55ce6847f01fa5.html\r\n'
                                                 'Accept-Encoding: gzip, '
                                                 'deflate, br\r\n'
                                                 'Accept-Language: '
                                                 'en-US,en;q=0.9\r\n',
                           'securityDetails': {'certificateId': 0,
                                               'certificateTransparencyCompliance': 'not-compliant',
                                               'cipher': 'AES_256_GCM',
                                               'issuer': 'DigiCert SHA2 High '
                                                         'Assurance Server CA',
                                               'keyExchange': 'ECDHE_RSA',
                                               'keyExchangeGroup': 'P-256',
                                               'protocol': 'TLS 1.2',
                                               'sanList': ['*.v.fwmrm.net',
                                                           'v.fwmrm.net'],
                                               'signedCertificateTimestampList': [],
                                               'subjectName': '*.v.fwmrm.net',
                                               'validFrom': 1509494400,
                                               'validTo': 1610539200},
                           'securityState': 'secure',
                           'status': 200,
                           'statusText': 'OK',
                           'timing': {'connectEnd': 144.491,
                                      'connectStart': 7.448,
                                      'dnsEnd': 7.448,
                                      'dnsStart': 0.401,
                                      'proxyEnd': -1,
                                      'proxyStart': -1,
                                      'pushEnd': 0,
                                      'pushStart': 0,
                                      'receiveHeadersEnd': 211.143,
                                      'requestTime': 133066.560247,
                                      'sendEnd': 144.745,
                                      'sendStart': 144.716,
                                      'sslEnd': 144.486,
                                      'sslStart': 76.332,
                                      'workerReady': -1,
                                      'workerStart': -1},
                           'url': 'https://bea4.v.fwmrm.net/ad/u?mode=echo&cr=https%3A%2F%2Fbeacon.krxd.net%2Fusermatch.gif%3Fpartner%3Dfreewheel%26partner_uid%3D%23%7Buser.id%7D'},
              'timestamp': 133066.772863,
              'type': 'Image'}}
cyrus-and commented 6 years ago

@pmurley that may also happen (if I recall correctly) when you navigate away from the URL then you call Network.getResponseBody on a stale object; possibly not your case.

pmurley commented 6 years ago

Wow, you're quick! Yeah, that makes sense, and it's something I should look into a bit more, but I don't think that's what is going on here - at least not in a straightforward way. I'm certainly only calling Page.navigate once to visit the target site.

I guess it could be something caused by some sort of auto-navigation/redirection within a particular frame(?), but if anyone has any other ideas, I'd be grateful!

cyrus-and commented 6 years ago

@pmurley if you can come up with a minimal program that reproduces this issue I could take a look at it.

pmurley commented 6 years ago

Here's an example. This usually (but not every single time) prints "Why does this happen?" at least once.

const CDP = require('chrome-remote-interface');

setTimeout(() => {
    CDP(async (client) => {
        const {Network, Page, Runtime} = client;
        var req_ids = new Set();
        Network.requestWillBeSent(({requestId, request}) => {
            req_ids.add(requestId);
        });
        Network.responseReceived(async ({requestId, response}) => {
        });
        Network.loadingFinished(async ({requestId, response}) => {
            if (req_ids.has(requestId)) {
                try {
                    var response_body = await Network.getResponseBody({requestId});
                } catch (err) {
                    console.log(err);
                    console.log('Why does this happen?');
                }
            } else {
                // I am also confused as to why we sometimes we get here,
                // but this is not my main concern.
                console.log('requestId not seen before');
            }

        });
        try {
            await Promise.all([Network.enable(), Page.enable()]);
            await Page.navigate({url: 'http://cnn.com'});
            await Page.loadEventFired();
        } catch (err) {
            console.error(err);
        }
    }).on('error', (err) => {
        console.error(err);
    });
}, 10000);
cyrus-and commented 6 years ago

@pmurley thanks, why the 10s delay though?

So I think the problem here is that you're reusing the same tab for multiple page loads, so you end up with unprocessed events coming from the previous instance that reference stale objects.

In fact, I consistently get that error if I run the script against a tab that has been used for a previous page load, and never with a blank new tab.

> I am also confused as to why we sometimes we get here, but this is not my main concern.

Because they are served from the cache, you can catch them with Network.requestServedFromCache.


Here's what I mean:

const CDP = require('chrome-remote-interface');

async function test() {
    try {
        // this is basically the new part //////////////////////////
        const target = await CDP.New();
        const client = await CDP({target});
        ////////////////////////////////////////////////////////////

        const {Network, Page, Runtime} = client;
        const req_ids = new Set();

        Network.requestWillBeSent(({requestId}) => {
            console.log(`${requestId} Network.requestWillBeSent`);
            req_ids.add(requestId);
        });

        Network.requestServedFromCache(({requestId}) => {
            console.log(`${requestId} Network.requestServedFromCache`);
        });

        Network.responseReceived(async ({requestId}) => {
            console.log(`${requestId} Network.responseReceived`);
        });

        Network.loadingFinished(async ({requestId}) => {
            console.log(`${requestId} Network.loadingFinished`);
            if (req_ids.has(requestId)) {
                try {
                    var response_body = await Network.getResponseBody({requestId});
                } catch (err) {
                    console.log(`${requestId} Network.getResponseBody: FAILED`);
                }
            } else {
                console.log(`${requestId} UNKNOWN`);
            }
        });

        Network.loadingFailed(async ({requestId}) => {
            console.log(`${requestId} Network.loadingFailed`);
        });

        await Network.enable();
        await Page.enable();
        await Page.navigate({url: 'http://cnn.com'});
        await Page.loadEventFired();
        console.log('Page.loadEventFired');
    } catch (err) {
        console.error(err);
    }
}

test();

You should handle the errors better and possibly close client and the newly created tab to avoid creating a bunch of tabs.
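
A minimal sketch of that cleanup, assuming the documented chrome-remote-interface calls CDP.New, CDP.Close and client.close. The helper name withFreshTab and its work callback are hypothetical, introduced only for illustration; the module is passed in as a parameter so the helper itself has no hard dependency.

```javascript
// Sketch: open a fresh tab, run `work(client)` in it, and always clean up.
// `CDP` is the chrome-remote-interface module, supplied by the caller.
// `withFreshTab` and `work` are hypothetical names, not part of the library.
async function withFreshTab(CDP, work) {
    const target = await CDP.New();           // create a new blank tab
    let client;
    try {
        client = await CDP({target});         // attach to that tab only
        return await work(client);
    } finally {
        if (client) {
            await client.close();             // close the debugging connection
        }
        await CDP.Close({id: target.id});     // close the tab itself
    }
}

// Possible usage (requires a Chrome instance with remote debugging enabled):
// const CDP = require('chrome-remote-interface');
// withFreshTab(CDP, async ({Network, Page}) => {
//     await Network.enable();
//     await Page.enable();
//     await Page.navigate({url: 'http://cnn.com'});
//     await Page.loadEventFired();
// }).catch(console.error);
```

The finally block runs whether the page load succeeds or throws, so repeated runs should not leave tabs behind.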

Bonus: it might be wise to load each page in a new browser context (incognito-like), if that's the case take a look here.

xgj1988 commented 4 years ago

How can I filter by URL when I use loadingFinished?

cyrus-and commented 4 years ago

@xgj1988 you need to keep track of the actual request URL, e.g., using a Map. In a nutshell:
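
The snippet that originally followed is not preserved here. As a rough sketch of the idea (not the original code), a Map can remember each request's URL so that it is still available when Network.loadingFinished fires. The names trackBodies, urlFilter and onBody are hypothetical; Network is assumed to be the domain object of a chrome-remote-interface client.

```javascript
// Sketch: Network.loadingFinished only carries the requestId, so the URL
// seen in Network.requestWillBeSent is stored in a Map keyed by requestId.
// `trackBodies`, `urlFilter` and `onBody` are hypothetical names.
function trackBodies(Network, urlFilter, onBody) {
    const urls = new Map(); // requestId -> request URL

    Network.requestWillBeSent(({requestId, request}) => {
        if (urlFilter(request.url)) {
            urls.set(requestId, request.url);
        }
    });

    Network.loadingFinished(async ({requestId}) => {
        const url = urls.get(requestId);
        if (url === undefined) {
            return; // not a request we are interested in
        }
        urls.delete(requestId); // keep the Map from growing without bound
        try {
            const {body, base64Encoded} = await Network.getResponseBody({requestId});
            onBody({url, body, base64Encoded});
        } catch (err) {
            console.error(`getResponseBody failed for ${url}: ${err.message}`);
        }
    });

    // Forget requests that never finish loading.
    Network.loadingFailed(({requestId}) => {
        urls.delete(requestId);
    });
}

// Possible usage with a connected client:
// trackBodies(client.Network,
//     (url) => url.includes('ct2/results/rpc'),
//     ({url, body}) => console.log(url, body.length));
```

Register the handlers before calling Network.enable so that no early events are missed.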

This is off topic though, file a new issue if needed.

xgj1988 commented 4 years ago

@cyrus-and Actually, I use Electron's webContents.debugger, and I couldn't find how to get the response body. I found this issue while searching, so I thought you might know Electron too. Could you tell me how to get the response body in Electron?

cyrus-and commented 4 years ago

@xgj1988 in the same way AFAIK, the API should be the same. Anyway I don't know electron at all. :) Feel free to file a new issue with some minimal working example.

xgj1988 commented 4 years ago

@cyrus-and how do I get the requestId for Network.getResponseBody?

cyrus-and commented 4 years ago

@xgj1988 as I told you, it's the one that you get in the Network.requestWillBeSent, it's all in the original snippet really.

xgj1988 commented 4 years ago

@cyrus-and ok I got it . THANKS

maklimcz commented 3 years ago

@pmurley @xgj1988 @cyrus-and did you figure out what causes the problem with getting the response for some requests? I am also facing this in Electron. Could you have a look: https://stackoverflow.com/questions/66101799/electron-browserwindow-cannot-get-response-when-debugger-is-attached

cyrus-and commented 3 years ago

@maklimcz could it be about caching as mentioned above?

maklimcz commented 3 years ago

@cyrus-and no, I don't think it is loaded from the cache, because I don't get a Network.requestServedFromCache event. I have traced the sequence of events for this particular request and got:

Network.requestWillBeSentExtraInfo 13548.212
Network.requestWillBeSent 13548.212
Network.responseReceivedExtraInfo 13548.212
Network.responseReceived 13548.212
Network.dataReceived 13548.212 [repeated 135 times]
...
Network.loadingFinished 13548.212

13548.212 is a requestId

cyrus-and commented 3 years ago

@maklimcz see if you can reproduce this without Electron.

ilanc commented 3 years ago

The conclusion that I came to was that any attempt to "sniff"[^1] network packets using the devtools protocol is not reliable. The getResponse* functions may work, but when they fail you can't pin it down to anything that you can trace or correct - it's buried somewhere within chromium (more below).

My solution has been to use workaround code - e.g if I need data in the response body and fail to get it then either I find the data on the rendered html page, or I send a fetch request from the app code directly (i.e. rather than asking devtools to ask chrome to fetch it ... and then trying to sniff the response).
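
A minimal sketch of that kind of fallback, under several assumptions: the URL was recorded earlier (for example from Network.requestWillBeSent), the resource is text, and re-requesting it is safe (an idempotent GET that does not depend on the browser's cookies or session). The name getBodyWithFallback and the fetchImpl parameter are hypothetical; fetchImpl defaults to the global fetch available in Node.js 18 and later.

```javascript
// Sketch: try Network.getResponseBody first and, if it fails, request the
// URL again directly from the application code.
// `getBodyWithFallback` and `fetchImpl` are hypothetical names.
async function getBodyWithFallback(Network, requestId, url, fetchImpl = fetch) {
    try {
        const {body, base64Encoded} = await Network.getResponseBody({requestId});
        // Assumes a textual resource; binary bodies should stay as a Buffer.
        return base64Encoded ? Buffer.from(body, 'base64').toString('utf8') : body;
    } catch (err) {
        // This is a second, independent request: it does not carry the
        // browser's cookies and may return different content.
        const res = await fetchImpl(url);
        return await res.text();
    }
}

// Possible usage inside a Network.loadingFinished handler:
// const text = await getBodyWithFallback(Network, requestId, urlForThisRequest);
```

Because the fallback issues a new request, it is only a workaround for resources that can be fetched again without side effects.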

I expect that getResponse* are just poorly tested, rarely used parts of devtools, and over time bugs are introduced or get resolved, which leads to erratic behaviour, i.e. there isn't a large enough community of people relying on them to moan about it. I came to this conclusion based on this stackoverflow post[^2], which talks about various changes to chromium that have caused requestWillBeSent to omit some headers, as well as a lot of my own failed experimentation[^3]. It's been years since I did any c++ dev though - so I can't say for sure (haven't tried to track it down in the chromium source).

Hope it helps.

[^1] like Network.getResponseBody, Network.getResponseBodyForInterception on the receiving side, or Network.requestWillBeSent etc on the sending side
[^2] albeit about requestWillBeSent - not getResponse*
[^3] not much of which is public other than intercept image data and my original chromium bug

maklimcz commented 3 years ago

@cyrus-and @ilanc it seems I managed to do what I wanted, but without Network.getResponseBody. I used the Fetch domain instead: https://chromedevtools.github.io/devtools-protocol/tot/Fetch/ To use it, you subscribe to responses matching a URL pattern and then react to Fetch.requestPaused events. In the handler you have direct access to the request and indirect access to the response: to get the response body, call Fetch.getResponseBody with the proper requestId. Below I've pasted a snippet.

// dbg is presumably an attached Electron debugger (webContents.debugger)
dbg.sendCommand('Fetch.enable', {
    patterns: [
        { urlPattern: interestingURLpattern, requestStage: "Response" }
    ]
});

const getResponseJson = async (requestId) => {
    const res = await dbg.sendCommand("Fetch.getResponseBody", {requestId});
    return JSON.parse(res.base64Encoded ? Buffer.from(res.body, 'base64').toString() : res.body);
};

// the listener has to be async because it uses await
dbg.on('message', async (e, m, p) => {
    if (m === 'Fetch.requestPaused') {
        var reqJson = JSON.parse(p.request.postData);
        var resJson = await getResponseJson(p.requestId);
        // ...

        // resume the paused request
        await dbg.sendCommand("Fetch.continueRequest", {requestId: p.requestId});
    }
});

Also remember to send Fetch.continueRequest because, as the documentation says:

"The request is paused until the client responds with one of continueRequest, failRequest or fulfillRequest"

https://chromedevtools.github.io/devtools-protocol/tot/Fetch/#event-requestPaused
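
For readers using chrome-remote-interface rather than Electron, the same approach might look roughly like the sketch below. It assumes the protocol descriptor in use exposes the Fetch domain (Fetch.enable, Fetch.requestPaused, Fetch.getResponseBody, Fetch.continueRequest) and that the intercepted bodies are text. The names interceptResponses, urlPattern and onResponse are hypothetical.

```javascript
// Sketch: pause matching requests at the Response stage, read the body,
// then resume them. `Fetch` is the Fetch domain of a connected client.
// `interceptResponses`, `urlPattern` and `onResponse` are hypothetical names.
async function interceptResponses(Fetch, urlPattern, onResponse) {
    Fetch.requestPaused(async ({requestId, request}) => {
        try {
            const {body, base64Encoded} = await Fetch.getResponseBody({requestId});
            // Assumes a textual body (e.g. JSON).
            const text = base64Encoded ? Buffer.from(body, 'base64').toString('utf8') : body;
            onResponse({url: request.url, text});
        } catch (err) {
            console.error(`Fetch.getResponseBody failed: ${err.message}`);
        } finally {
            // Always resume, otherwise the request stays paused.
            await Fetch.continueRequest({requestId});
        }
    });

    // Only pause requests whose URL matches the pattern, once the response arrives.
    await Fetch.enable({patterns: [{urlPattern, requestStage: 'Response'}]});
}

// Possible usage with a connected client:
// await interceptResponses(client.Fetch, '*ct2/results/rpc*',
//     ({url, text}) => console.log(url, text.length));
```

Resuming in a finally block means a failed body read does not leave the page waiting on a paused request.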

iddoeldor commented 3 years ago

dbg.sendCommand('Fetch.enable', {

@maklimcz can you please elaborate how to run your example.

robd commented 2 years ago

One thing I found was that it was necessary to ignore 'preflight' requests when calling getResponseBody

Network.on("responseReceived",(async (params) => {
  if (params.type == 'Preflight') return
  const {body} = await Network.getResponseBody(params);
  // Do something with the body
}));