cyrus-and / chrome-har-capturer

Capture HAR files from a Chrome instance
MIT License

Unexpected handling of HTTP response statuscode 204 (No content) #77

Closed p4fg closed 4 years ago

p4fg commented 4 years ago

Hi, thanks for a really well-written library! I have an issue with pages returning HTTP 204 (No Content).

Pages returning 204 are counted as failed and the onFail handler is called.

To reproduce:

Observed behaviour:

chrome-har-capturer --retry 4 https://poc.shellcode.se/204.php
- https://poc.shellcode.se/204.php ✗
  net::ERR_ABORTED
- https://poc.shellcode.se/204.php ✗
  net::ERR_ABORTED
- https://poc.shellcode.se/204.php ✗
  net::ERR_ABORTED
- https://poc.shellcode.se/204.php ✗
  net::ERR_ABORTED
- https://poc.shellcode.se/204.php ✗
  net::ERR_ABORTED
{
    "log": {
        "version": "1.2",
        "creator": {
            "name": "Chrome HAR Capturer",
            "version": "0.13.6",
            "comment": "https://github.com/cyrus-and/chrome-har-capturer"
        },
        "pages": [],
        "entries": []
    }
}

Expected behaviour:

{
  "log": {
    "version": "1.2",
    "creator": {
      "name": "Chrome HAR Capturer",
      "version": "0.13.6",
      "comment": "https://github.com/cyrus-and/chrome-har-capturer"
    },
    "pages": [
      {
        "id": "page_1_08768104101459318",
        "title": "http://poc.shellcode.se/204.php",
        "startedDateTime": "2020-04-03T13:05:17.821Z",
        "pageTimings": {
          "onContentLoad": 56.401999950408936,
          "onLoad": 51.96299982070923
        }
      }
    ],
    "entries": [
      {
        "pageref": "page_1_08768104101459318",
        "startedDateTime": "2020-04-03T13:05:17.821Z",
        "time": 41.89500003121793,
        "request": {
          "method": "GET",
          "url": "http://poc.shellcode.se/204.php",
          "httpVersion": "http/1.1",
          "cookies": [],
          "headers": [
            {
              "name": "Host",
              "value": "poc.shellcode.se"
            },
            {
              "name": "Proxy-Connection",
              "value": "keep-alive"
            },
            {
              "name": "Upgrade-Insecure-Requests",
              "value": "1"
            },
            {
              "name": "User-Agent",
              "value": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36"
            },
            {
              "name": "Accept",
              "value": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
            },
            {
              "name": "Accept-Encoding",
              "value": "gzip, deflate"
            }
          ],
          "queryString": [],
          "headersSize": 483,
          "bodySize": -1
        },
        "response": {
          "status": 204,
          "statusText": "No Content",
          "httpVersion": "http/1.1",
          "cookies": [],
          "headers": [
            {
              "name": "X-Powered-By",
              "value": "PHP/5.6.40"
            },
            {
              "name": "Content-Type",
              "value": "text/html; charset=UTF-8"
            },
            {
              "name": "Date",
              "value": "Fri, 03 Apr 2020 13:05:17 GMT"
            },
            {
              "name": "Server",
              "value": "LiteSpeed"
            }
          ],
          "redirectURL": "",
          "headersSize": 392,
          "bodySize": 0,
          "_transferSize": 392,
          "content": {
            "size": 0,
            "mimeType": "text/html",
            "compression": 0,
            "text": ""
          }
        },
        "cache": {},
        "_fromDiskCache": false,
        "timings": {
          "blocked": 0.4959999167919159,
          "dns": -1,
          "connect": 0.266,
          "send": 0.02699999999999997,
          "wait": 40,
          "receive": 1.1060000397264957,
          "ssl": -1
        },
        "serverIPAddress": "46.16.234.61",
        "connection": "135",
        "_initiator": {
          "type": "other"
        },
        "_priority": "VeryHigh"
      }
    ]
  }
}

cyrus-and commented 4 years ago

Hi, I'm glad you find it useful!

Yes, the problem is that Chrome treats 204 responses (for top-level page loads at least) as failed even though they are fully processed. Try this:

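const CDP = require('chrome-remote-interface');

async function example() {
    try {
        // Connect to a Chrome instance with remote debugging enabled.
        const client = await CDP();
        const {Network, Page} = client;

        // Log every protocol event so that Network.loadingFailed is visible.
        client.on('event', ({method, params}) => {
            console.log(method, params);
        });

        // Network events are only emitted after the domain is enabled.
        await Network.enable();
        await Page.navigate({url: 'https://poc.shellcode.se/204.php'});
    } catch (err) {
        console.error(err);
    }
}

example();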

Then:

$ node 204.js 
[...]
Network.loadingFailed { requestId: '40AA6A899C7C6E74255E4590CAD1DC72',
  timestamp: 3816.6678,
  type: 'Document',
  errorText: 'net::ERR_ABORTED',
  canceled: true }

I'm not sure if I should add an exception to that. What's your use case in detail?

p4fg commented 4 years ago

My current use-case is spidering and analysis. It would be beneficial to know when a retry is warranted (might give a result) or if the retry will give the same result.

p4fg commented 4 years ago

Is there a remote equivalent of the Network tab in DevTools? In that tab it is possible to distinguish errors from 204 responses (see attached image).

cyrus-and commented 4 years ago

I don't really see the problem, though: you only get that behavior for top-level page loads. If a web page contains resources that yield 204, everything works fine. For example:

<img src="https://poc.shellcode.se/" />
<img src="https://poc.shellcode.se/204.php" />

$ gron out.har | grep 'request.url \|response.status '
json.log.entries[0].request.url = "http://127.0.0.1:8080/index-204.html";
json.log.entries[0].response.status = 200;
json.log.entries[1].request.url = "https://poc.shellcode.se/";
json.log.entries[1].response.status = 200;
json.log.entries[2].request.url = "https://poc.shellcode.se/204.php";
json.log.entries[2].response.status = 204;

p4fg commented 4 years ago

If you are writing a spider you will want to visit all links on a page, hence my problem.

cyrus-and commented 4 years ago

I don't know, I'm uncomfortable adding the exception "if failed BUT 204 then not failed". The way Chrome handles such pages (top-level page loads only) is to emit a Network.loadingFailed event with errorText set to net::ERR_ABORTED, so I think I should honour that.
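
A caller who wants that distinction anyway could make it on their own side of the protocol. The following is a minimal sketch, not part of chrome-har-capturer or chrome-remote-interface: `createClassifier` and its return values are hypothetical names, and it assumes Chrome emits Network.responseReceived (carrying the status code) before Network.loadingFailed for the same requestId, which this thread does not confirm.

```javascript
// Hypothetical helper: feed it CDP events as {method, params} objects,
// e.g. from client.on('event', ...) in chrome-remote-interface.
function createClassifier() {
    // Last HTTP status seen for each requestId.
    const statusByRequest = new Map();

    return function classify({method, params}) {
        if (method === 'Network.responseReceived') {
            statusByRequest.set(params.requestId, params.response.status);
            return null;
        }
        if (method === 'Network.loadingFailed') {
            const status = statusByRequest.get(params.requestId);
            // Assumption: an aborted load preceded by a 204 response
            // means "no content" rather than a genuine failure.
            if (params.errorText === 'net::ERR_ABORTED' && status === 204) {
                return 'no-content';
            }
            return 'failed';
        }
        // Any other event is irrelevant to the classification.
        return null;
    };
}
```

The function returns null for events that do not settle a request, so it can be called on every event without filtering first.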

Back to your problem: you can treat net::ERR_ABORTED as a definitive failure, i.e., not a timeout or something that you want to retry.
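
As a rough illustration of that advice, a spider could decide whether to retry based on the reported error text. `shouldRetry` and `PERMANENT_ERRORS` are hypothetical names for this sketch, and the assumption that every other error is worth retrying is a simplification, not something the library prescribes.

```javascript
// Hypothetical policy: error texts for which a retry would give the same result.
const PERMANENT_ERRORS = new Set(['net::ERR_ABORTED']);

// Returns true when retrying the URL might yield a different outcome.
function shouldRetry(errorText) {
    return !PERMANENT_ERRORS.has(errorText);
}
```

With this policy a top-level 204 page, which Chrome reports as net::ERR_ABORTED, is visited once and not retried.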