cyrus-and / chrome-har-capturer

Capture HAR files from a Chrome instance
MIT License
automation browser chrome-debugging-protocol google-chrome har headless http-archive

chrome-har-capturer

Capture HAR files from a Chrome instance.

Under the hood this module uses chrome-remote-interface to instrument Chrome.

Setup

Install this module from NPM:

npm install chrome-har-capturer

Start Chrome like this:

google-chrome --remote-debugging-port=9222 --headless

Command line utility

The command line utility can be used to generate HAR files from a list of URLs. The following options are available:

-h, --help               output usage information
-t, --host <host>        Chrome Debugging Protocol host
-p, --port <port>        Chrome Debugging Protocol port
-x, --width <dip>        frame width in DIP
-y, --height <dip>       frame height in DIP
-o, --output <file>      write to file instead of stdout
-c, --content            also capture the requests body
-k, --cache              allow caching
-a, --agent <agent>      user agent override
-b, --block <URL>        URL pattern (*) to block (can be repeated)
-H, --header <header>    Additional headers (can be repeated)
-i, --insecure           ignore certificate errors
-g, --grace <ms>         time to wait after the load event
-u, --timeout <ms>       time to wait before giving up with a URL
-r, --retry <number>     number of retries on page load failure
-e, --retry-delay <ms>   time to wait before starting a new attempt
-f, --abort-on-failure   stop after the first failure (incompatible with parallel mode)
-d, --post-data <bytes>  maximum POST data size to be returned
-l, --parallel <n>       load <n> URLs in parallel

Library

Alternatively, this module provides a simple API that can be used to write custom applications. See the command line utility source code for a working example.

API

run(urls, [options])

Start loading a batch of URLs. Returns an event emitter (see below for the list of supported events).

urls is an array of URLs.

options is an object with the following optional properties:

Event: 'load'
function (url, index, urls) {}

Emitted when Chrome is about to load url. index is the index of url in urls. urls is the array passed to run().

Event: 'done'
function (url, index, urls) {}

Emitted when Chrome has finished loading url. index is the index of url in urls. urls is the array passed to run().

Event: 'fail'
function (url, err, index, urls) {}

Emitted when Chrome cannot load url. The Error object err contains the failure reason. Failed URLs will not appear in the resulting HAR object. index is the index of url in urls. urls is the array passed to run().
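The 'load', 'done' and 'fail' events are ordinary Node.js EventEmitter events, so progress can be tracked with plain listeners. The following is a minimal sketch that assumes only the listener signatures documented above; the helper trackProgress and the shape of the object it returns are hypothetical, and a plain EventEmitter stands in for the emitter returned by run() so the example runs without Chrome.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: attach listeners for the per-URL events emitted by run()
// and keep a simple tally of what happened to each URL.
function trackProgress(emitter) {
    const status = { loading: [], done: [], failed: [] };
    emitter.on('load', (url, index, urls) => {
        status.loading.push(url);
        console.log(`[${index + 1}/${urls.length}] loading ${url}`);
    });
    emitter.on('done', (url, index, urls) => {
        status.done.push(url);
        console.log(`[${index + 1}/${urls.length}] done ${url}`);
    });
    emitter.on('fail', (url, err, index, urls) => {
        // Failed URLs are absent from the final HAR, so keep the reason here.
        status.failed.push({ url, reason: err.message });
        console.error(`[${index + 1}/${urls.length}] failed ${url}: ${err.message}`);
    });
    return status;
}

// Simulate an event sequence with a plain EventEmitter (no Chrome required).
const urls = ['https://example.com', 'https://example.org'];
const emitter = new EventEmitter();
const status = trackProgress(emitter);
emitter.emit('load', urls[0], 0, urls);
emitter.emit('done', urls[0], 0, urls);
emitter.emit('load', urls[1], 1, urls);
emitter.emit('fail', urls[1], new Error('timeout'), 1, urls);
```

With the real module, assuming it is imported as const CHC = require('chrome-har-capturer'), the stand-in emitter would be replaced by CHC.run(urls). Recording failure reasons separately is useful because failed URLs do not appear in the resulting HAR.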

Event: 'har'
function (har) {}

Emitted when all the URLs have been processed. har is the resulting HAR object. If all the URLs fail, har is still a valid, empty HAR object.
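Because a valid (possibly empty) HAR object is always emitted, consumer code can inspect it without special-casing total failure. The sketch below assumes the standard HAR 1.2 layout (log.pages and log.entries, with entries linked to pages via pageref); summarizeHar is a hypothetical helper, and the two HAR objects are hand-written samples rather than real output of this module.

```javascript
// Hypothetical helper: summarize a HAR object such as the one passed to 'har'.
// Assumes the HAR 1.2 layout: { log: { pages: [...], entries: [...] } }.
function summarizeHar(har) {
    const pages = har.log.pages || [];   // 'pages' is optional in HAR 1.2
    const entries = har.log.entries || [];
    // Count entries per page using each entry's 'pageref' (the page's 'id').
    const perPage = new Map(pages.map((page) => [page.id, 0]));
    for (const entry of entries) {
        if (perPage.has(entry.pageref)) {
            perPage.set(entry.pageref, perPage.get(entry.pageref) + 1);
        }
    }
    return { pages: pages.length, entries: entries.length, perPage: Object.fromEntries(perPage) };
}

// An empty HAR (every URL failed) needs no special handling.
const emptyHar = { log: { version: '1.2', creator: { name: 'example', version: '0' }, pages: [], entries: [] } };
console.log(summarizeHar(emptyHar)); // { pages: 0, entries: 0, perPage: {} }

// A tiny hand-written HAR with one page and two entries.
const sampleHar = {
    log: {
        version: '1.2',
        creator: { name: 'example', version: '0' },
        pages: [{ id: 'page_1', title: 'https://example.com/' }],
        entries: [{ pageref: 'page_1' }, { pageref: 'page_1' }]
    }
};
console.log(summarizeHar(sampleHar)); // { pages: 1, entries: 2, perPage: { page_1: 2 } }
```

Inside a 'har' listener, the same object could also be saved with fs.writeFileSync('out.har', JSON.stringify(har, null, 4)).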

fromLog(url, log, [options])

Generate a single-page HAR from an array of raw events that come from the Chrome Debugging Protocol (e.g., from chrome-remote-interface). Returns a Promise that fulfills to the generated HAR.

url is the page URL.

log is the array of events in the form:

{
    method: '...',
    params: {...}
}
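One way to build such a log is to record every notification received from a chrome-remote-interface client, which emits an 'event' event whose message carries method and params. The sketch below assumes that behaviour; recordLog is a hypothetical helper, and a plain EventEmitter replays two hand-written notifications in place of a real client.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: collect raw CDP notifications into the { method, params }
// form expected by fromLog(). 'source' is assumed to emit 'event' with a message
// carrying 'method' and 'params', as a chrome-remote-interface client does.
function recordLog(source) {
    const log = [];
    const onEvent = ({ method, params }) => {
        log.push({ method, params });
    };
    source.on('event', onEvent);
    // Call stop() once the page has loaded to freeze the log.
    const stop = () => source.removeListener('event', onEvent);
    return { log, stop };
}

// Stand-in for a real client: replay two notifications by hand.
const fakeClient = new EventEmitter();
const { log, stop } = recordLog(fakeClient);
fakeClient.emit('event', { method: 'Network.requestWillBeSent', params: { requestId: '1' } });
fakeClient.emit('event', { method: 'Page.loadEventFired', params: { timestamp: 1 } });
stop();
// Emitted after stop(), so it is not recorded.
fakeClient.emit('event', { method: 'Network.dataReceived', params: { requestId: '1' } });
console.log(log.length); // 2
```

In a real session, the client would have the Network and Page domains enabled (client.Network.enable() and client.Page.enable()) before navigating, and the recorded log would then be passed to fromLog(url, log). This sketch keeps every notification without filtering.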

Events to be provided are:

Additional events for WebSockets are:

options is an object with the following optional properties:

When content is true, synthetic events in the following form are also expected:

{
    method: 'Network.getResponseBody',
    params: {
        requestId: '...',
        body: '...',
        base64Encoded: true/false
    }
}

These events contain the reply to the Network.getResponseBody method. They are needed because Chrome does not deliver the body content via events; instead, the body must be requested explicitly, and the reply must be appended to the other events in the log.
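The manual step described above can be sketched as follows: after the page has loaded, request the body of every request that reported Network.loadingFinished and append the reply as a synthetic event. appendResponseBodies and its fetchBody parameter are hypothetical; fetchBody is assumed to resolve to { body, base64Encoded }, the shape of the Network.getResponseBody reply, and a stub replaces a live Chrome connection.

```javascript
// Hypothetical helper: for every request that finished loading, fetch its body
// and append a synthetic 'Network.getResponseBody' event to the log, in the form
// expected by fromLog() when content is true.
async function appendResponseBodies(log, fetchBody) {
    // Collect the IDs first so the events pushed below are not re-scanned.
    const finished = log
        .filter((event) => event.method === 'Network.loadingFinished')
        .map((event) => event.params.requestId);
    for (const requestId of finished) {
        try {
            const { body, base64Encoded } = await fetchBody(requestId);
            log.push({
                method: 'Network.getResponseBody',
                params: { requestId, body, base64Encoded }
            });
        } catch (err) {
            // Some responses may have no retrievable body; skip them.
        }
    }
    return log;
}

const log = [
    { method: 'Network.requestWillBeSent', params: { requestId: 'a' } },
    { method: 'Network.loadingFinished', params: { requestId: 'a' } },
    { method: 'Network.loadingFinished', params: { requestId: 'b' } }
];
// Stub standing in for a live Network.getResponseBody call; 'b' has no body.
const stubFetch = async (requestId) => {
    if (requestId === 'b') throw new Error('No data found');
    return { body: 'hello', base64Encoded: false };
};
const done = appendResponseBodies(log, stubFetch);
done.then((result) => console.log(result.length)); // 4
```

Against a live target, fetchBody could be (requestId) => client.Network.getResponseBody({ requestId }) on a chrome-remote-interface client, called before the client is closed; the augmented log would then go to fromLog(url, log, { content: true }). Skipping bodies that cannot be fetched keeps one failed request from discarding the rest.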

Resources