LucianoGanga / simple-headless-chrome

Simple abstraction to use Chrome as a Headless Browser with Node JS
MIT License


simple-headless-chrome

This project is looking for a maintainer

If you'd like to help others with this project, you're more than welcome! I made this project for work and wanted to make it available to other people, but I usually don't have as much time as I'd like to maintain it. So, if you're interested and want to help, just let me know :)

The work would mostly be answering questions and helping with issues.

Thanks!

Important version >= 3.3.0

Version 3.3.0 includes a new feature that allows managing browser tabs.

This new feature comes with some breaking changes that will allow us to scale in the future.

To avoid problems for people who use version >= 3.3.0 of this module, we supported those breaking changes with methods that will be deprecated in version 4.0.0.

Introduction

This is an abstraction to use a headless version of Google Chrome in a very simple way. I was inspired by the following projects:

And I had to read a lot here too:

You can also use this on Heroku thanks to https://github.com/heroku/heroku-buildpack-google-chrome

I built this mainly because I got tired of an error I received in an edge case when using PhantomJS (Unhandled reject Error: Failed to load url). So I decided to make my own abstraction, to be used in a Heroku app, and as simple to use as Horseman.

I didn't have time to document here in the readme, but every method in the source code is documented.

It's really simple to use. I hope I can get some time to make a QuickStart guide + document the API methods here.

You can read my post on Medium about this module: How to tell to a headless Google Chrome to write a post in Medium for you

You can watch a video of the module in action by clicking on the image below

A quick example

Features

And coming soon...

Collaboration

If you want to collaborate with the project, in any way (documentation, examples, fixes, etc), just send a PR :)

If you rock at writing tests, it would be very useful if you could help us make this module better. It's not necessary to build all the tests, but if someone knows how to set up the base for adding tests to this module, it would really help someone else get started on this part.

Thank you to everyone who has already helped by submitting a PR! :D

Installation

1) Install Google Chrome Headless

On your PC

Mac: Chrome Headless is shipped in Chrome Canary. You can install it here: https://www.google.com/chrome/browser/canary.html

Linux: Chrome Headless is shipped in Chrome 59, so you can install Chrome 59 or later to use the headless mode:

https://askubuntu.com/questions/79280/how-to-install-chrome-browser-properly-via-command-line

sudo apt-get install libxss1 libappindicator1 libindicator7
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome*.deb  # Might show "errors", fixed by next line
sudo apt-get install -f

In a NodeJS Heroku App

Just add the buildpack for Heroku and voilà! Everything is ready. You can check the buildpack repository here: https://github.com/heroku/heroku-buildpack-google-chrome

Using a Docker image

With the addition of Chrome Remote Interface into Chrome 59, a simple way to install is using the Docker image for Chrome Headless, such as https://hub.docker.com/r/justinribeiro/chrome-headless/ or https://hub.docker.com/r/yukinying/chrome-headless/

If using Docker, in your app, configure for headless as follows:

const browser = new HeadlessChrome({
  headless: true,
  launchChrome: false,
  chrome: {
    host: 'localhost',
    port: 9222, // Chrome Docker default port
    remote: true,
  },
  browserlog: true
})
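Before pointing the module at a remote Chrome, it can help to confirm that the debugging port answers at all. The sketch below queries the /json/version HTTP endpoint that Chrome serves on its remote debugging port. The helper name getChromeVersion is hypothetical and not part of this module, and the host and port simply mirror the configuration above.

```javascript
const http = require('http')

// Hypothetical helper (not part of simple-headless-chrome): asks a Chrome
// instance started with remote debugging for its version information.
function getChromeVersion (host, port) {
  return new Promise((resolve, reject) => {
    http.get({ host: host, port: port, path: '/json/version' }, (res) => {
      let body = ''
      res.setEncoding('utf8')
      res.on('data', (chunk) => { body += chunk })
      res.on('end', () => {
        try {
          // The endpoint answers with a JSON document
          resolve(JSON.parse(body))
        } catch (err) {
          reject(err)
        }
      })
    }).on('error', reject)
  })
}

// Usage (assumes the container exposes port 9222 on localhost):
// getChromeVersion('localhost', 9222).then((info) => console.log(info.Browser))
```

If the promise rejects with a connection error, the container is probably not running or the port is not published.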

2) Install the NPM Module

npm install --save simple-headless-chrome

Compatibility

Thanks to @lewisf, simple-headless-chrome is compatible with NodeJS >= 4! I hope more people can benefit from this now :)

Usage

const HeadlessChrome = require('simple-headless-chrome')

const browser = new HeadlessChrome({
  headless: true, // If you turn this off, you can actually see the browser navigate with your instructions,
  chrome: {
    userDataDir: '/tmp/headlessDataDir' // This can be null, so a tmp folder will be created and then destroyed
  }
})

Once you have the browser instance, you can call the methods to interact with it.
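As a minimal sketch of that flow, the helper below initializes the browser, opens a tab, runs some work on it, and closes the browser even if the work fails. It relies only on calls that appear in the Example section of this README (init, newTab, goTo, close). The helper name withTab and the callback shape are assumptions made for illustration and are not part of the module.

```javascript
// Hypothetical helper (not part of simple-headless-chrome).
// `browser` is an instance created as shown above; `work` is an async
// function that receives the tab and returns whatever result you need.
async function withTab (browser, work) {
  await browser.init()
  try {
    const tab = await browser.newTab({ privateTab: false })
    return await work(tab)
  } finally {
    // Runs whether `work` resolved or threw
    await browser.close()
  }
}

// Usage:
// withTab(browser, (tab) => tab.goTo('http://www.mywebsite.com/login'))
```

Wrapping the work in try/finally avoids leaving a Chrome process running when a step throws.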

Methods

inject

Injects JavaScript in the page

Modules available: jQuery, jquery, jQuery.slim and jquery.slim

Parameters

Examples

inject('jquery')

// You can use jsdelivr to inject any npm or GitHub package in the page
inject('https://cdn.jsdelivr.net/npm/lodash@4/lodash.min.js')
inject('https://cdn.jsdelivr.net/npm/jquery@3/dist/jquery.min.js')

// You can inject a local JavaScript file
inject('./custom-file.js')
inject(__dirname + '/path/to/file.js')

// Note: the path will be resolved with `require.resolve()` so you can include
// files that are in `node_modules` simply by installing them with NPM
inject('jquery/dist/jquery.min')
inject('lodash/dist/lodash.min')

injectRemoteScript

Injects a remote script in the page

Parameters

Examples

injectRemoteScript('https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js')

injectScript

Injects code in the DOM as script tag

Parameters

evaluate

Evaluates a fn in the context of the browser

Parameters

evaluateAsync

Evaluates an async fn in the context of the browser

Parameters

evaluateOnNode

Evaluates a fn in the context of a passed node

Parameters

goTo

Navigates to a URL

Parameters

Properties

getNodeValue

Get the value of a Node.

Parameters

Returns object Object containing type and value of the element

getValue

Get the value of an element.

Parameters

Returns object Object containing type and value of the element

setNodeValue

Set the value of an element.

Parameters

setValue

Set the value of an element.

Parameters

fill

Fills a selector of an input or textarea element with the passed value

Parameters

clear

Clear an input field.

Parameters

querySelector

Returns the node associated to the passed selector

Parameters

focus

Focus on an element matching the selector

Parameters

type

Simulate a keypress on a selector

Parameters

typeText

Types text (doesn't matter where it is)

Parameters

select

Select a value in an html select element.

Parameters

keyboardEvent

Fire a key event.

Parameters

wait

Waits a certain number of milliseconds

Parameters

onConsole

Binding callback to handle console messages

Parameters

waitForPageToLoad

Waits for a page to finish loading. Throws error after timeout

Parameters

waitForFrameToLoad

Waits for all the frames in the page to finish loading. Returns the list of frames after that

Parameters

Returns object List of frames, with childFrames

waitForSelectorToLoad

Waits for a selector to finish loading. Throws error after timeout

Parameters

mouseEvent

Fire a mouse event.

Parameters

click

Click on a selector by firing a 'click event' directly in the element of the selector

Parameters

clickOnSelector

Clicks the left mouse button over the centroid of the element matching the passed selector

Parameters

getNodeCentroid

Calculates the centroid of a node by using the boxModel data of the element

Parameters

Returns object { x, y } object with the coordinates
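To illustrate the idea, the DevTools protocol describes each box of an element as a quad: a flat array of four points, [x1, y1, x2, y2, x3, y3, x4, y4]. A rough sketch of a centroid calculation is to average those four vertices. The function name quadCentroid is hypothetical, and which box (content, padding, border or margin) the module actually uses is not stated here, so treat this as an approximation of the technique rather than the module's code.

```javascript
// Hypothetical sketch, not the module's implementation.
// `quad` is a flat array of vertex coordinates: [x1, y1, x2, y2, x3, y3, x4, y4].
function quadCentroid (quad) {
  let sumX = 0
  let sumY = 0
  for (let i = 0; i < quad.length; i += 2) {
    sumX += quad[i]
    sumY += quad[i + 1]
  }
  const points = quad.length / 2
  // Average of the vertices; for a rectangular box this is its centre
  return { x: sumX / points, y: sumY / points }
}

// A 100x40 box whose top-left corner is at (10, 20)
const contentQuad = [10, 20, 110, 20, 110, 60, 10, 60]
console.log(quadCentroid(contentQuad)) // { x: 60, y: 40 }
```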

getCookies

Get the browser cookies

Returns object Object with all the cookies

setCookie

Set the browser cookies

Parameters

Properties

Returns boolean True if successfully set cookie

clearBrowserCookies

Clear the browser cookies

exist

Checks if an element matches the selector

Parameters

Returns boolean Boolean indicating if element of selector exists or not

visible

Checks if an element matching a selector is visible

Parameters

Returns boolean Boolean indicating if element of selector is visible or not

getScreenshot

Takes a screenshot of the page and returns it as a string

Parameters

Properties

Returns string Binary or Base64 string with the image data
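If you receive the image as a Base64 string and want to store or post-process it yourself, it has to be decoded into bytes first. The sketch below assumes the string is Base64-encoded; the helper name writeBase64Image is hypothetical and not part of the module.

```javascript
const fs = require('fs')

// Hypothetical helper (not part of simple-headless-chrome): decodes
// Base64-encoded image data and writes the raw bytes to `filePath`.
function writeBase64Image (base64Data, filePath) {
  const bytes = Buffer.from(base64Data, 'base64')
  fs.writeFileSync(filePath, bytes)
  // Return the number of bytes written, handy for a quick sanity check
  return bytes.length
}

// Usage, assuming `imageData` holds a Base64 string returned by getScreenshot:
// writeBase64Image(imageData, './screenshot.png')
```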

saveScreenshot

Saves a screenshot of the page

Parameters

Properties

Returns string Binary or Base64 string with the image data

printToPDF

Prints the page to PDF

Parameters

Properties

Returns string Binary or Base64 string with the PDF data

savePdf

Saves a PDF file of the page

Parameters

Properties

getSelectorViewport

Get the Viewport of the element matching a selector

Parameters

Returns Viewport Object with the viewport properties (https://chromedevtools.github.io/devtools-protocol/tot/Page/#type-Viewport)

getFrames

Get the list of frames in the loaded page

Returns object List of frames, with childFrames

resizeFullScreen

Resize viewports of the page to full screen size

handleDialog

Accepts or dismisses a JavaScript initiated dialog (alert, confirm, prompt, or onbeforeunload)

Parameters

post

Post data from the browser context

Parameters

Returns object Request status and data

value

TODO: Take the value from the DOM Node. For some reason, there are some pages where it is not possible to get the textarea value, as its nodeId refreshes all the time

setNodeValue

TODO: Take the value from the DOM Node. For some reason, there are some pages where it is not possible to get the textarea value, as its nodeId refreshes all the time

browserIsInitialized

Checks if the browser is initialized. Exits the process if it's not

fixSelector

As the selectors may contain colons, it's necessary to escape them in order to correctly match an element

Parameters

Returns string The selector with colons escaped (One backslash to escape the ':' for CSS, and other to escape the first one for JS)
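The double escaping is easier to see in code. The sketch below is a hypothetical re-implementation of the behaviour described above, under the assumption that every colon is replaced by two backslashes followed by the colon; the module's own code may differ in detail, and the name escapeColons is made up for this example.

```javascript
// Hypothetical sketch, not the module's implementation.
function escapeColons (selector) {
  // The literal '\\\\:' is three characters at runtime: \ \ :
  return selector.replace(/:/g, '\\\\:')
}

console.log(escapeColons('#form:username')) // #form\\:username
```

Note that a helper like this escapes every colon, so in this sketch a pseudo-class selector such as a:hover would be escaped as well.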

promiseTimeout

Runs a promise and throws an error if it's not resolved before the timeout

Parameters
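A common way to get this behaviour is to race the promise against a timer. The sketch below shows that pattern with Promise.race; it is not the module's code, and the name withTimeout and the error message are assumptions made for illustration.

```javascript
// Hypothetical sketch, not the module's implementation: rejects if `promise`
// has not settled within `ms` milliseconds.
function withTimeout (promise, ms) {
  let timer
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error('Timed out after ' + ms + ' ms')), ms)
  })
  // Whichever settles first wins; the timer is cleared in both cases
  return Promise.race([promise, timeout]).then(
    (value) => { clearTimeout(timer); return value },
    (err) => { clearTimeout(timer); throw err }
  )
}

// Usage:
// withTimeout(somePromise, 5000).catch((err) => console.log(err.message))
```

Clearing the timer keeps a pending timeout from holding the Node process open after the original promise has already settled.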

interleaveArrayToObject

Transforms an interleaved array into a key-value object

Parameters

Returns object The key value object
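An interleaved array alternates keys and values, for example [name1, value1, name2, value2]; the DevTools protocol reports node attributes in this shape. A minimal sketch of the conversion, using the hypothetical name interleavedToObject (the module's own implementation may differ):

```javascript
// Hypothetical sketch, not the module's implementation.
function interleavedToObject (list) {
  const result = {}
  // Walk the array two items at a time: even index = key, odd index = value
  for (let i = 0; i + 1 < list.length; i += 2) {
    result[list[i]] = list[i + 1]
  }
  return result
}

console.log(interleavedToObject(['id', 'login', 'type', 'submit']))
// { id: 'login', type: 'submit' }
```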

objectToEncodedUri

Given an object, transforms its properties to a URL-encoded string

Parameters

Returns string The URL-encoded object
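As a rough illustration of this kind of encoding, the sketch below joins the object's own keys and values as key=value pairs separated by &, encoding each part with encodeURIComponent. The name toEncodedUri is hypothetical, and the module's own output may differ, for instance in how it encodes spaces.

```javascript
// Hypothetical sketch, not the module's implementation.
function toEncodedUri (params) {
  return Object.keys(params)
    .map((key) => encodeURIComponent(key) + '=' + encodeURIComponent(params[key]))
    .join('&')
}

console.log(toEncodedUri({ user: 'my user', plan: '3 months & more' }))
// user=my%20user&plan=3%20months%20%26%20more
```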

sleep

Creates some delay

Parameters

Returns promise The promise that will resolve after the delay

Example

const HeadlessChrome = require('simple-headless-chrome')

const browser = new HeadlessChrome({
  headless: true // If you turn this off, you can actually see the browser navigate with your instructions
  // see above if using remote interface
})
async function navigateWebsite() {
  try {
    await browser.init()

    const mainTab = await browser.newTab({ privateTab: false })

    // Navigate to a URL
    await mainTab.goTo('http://www.mywebsite.com/login')

    // Fill an element
    await mainTab.fill('#username', 'myUser')

    // Type in an element
    await mainTab.type('#password', 'Yey!ImAPassword!')

    // Click on a button
    await mainTab.click('#Login')

    // Log some info in your console
    await mainTab.log('Click login')

    // Wait some time! (2s)
    await mainTab.wait(2000)

    // Log some info in your console, ONLY if you started the app in DEBUG mode (DEBUG='HeadlessChrome*' npm start)
    await mainTab.debugLog('Waited 2 seconds to give some time to all the redirects')

    // Navigate a little...
    await mainTab.goTo('http://www.mywebsite.com/myProfile')

    // Check the select current value
    const myCurrentSubscriptionPlan = await mainTab.getValue('#subscriptionSelect')
    console.log(myCurrentSubscriptionPlan) // {type: 'string', value: '1 month' }

    // Edit the subscription
    await mainTab.select('#subscriptionSelect', '3 months')
    await mainTab.click('#Save')

    // Resize the viewport to full screen size (One use is to take full size screen shots)
    await mainTab.resizeFullScreen()

    // Take a screenshot
    await mainTab.saveScreenshot('./shc.png')

    // Get the inner HTML of the first element matching a selector
    const htmlTag = await mainTab.evaluate(function (selector) {
      const selectorHtml = document.querySelector(selector)
      return selectorHtml.innerHTML
    }, '.main') // returns innerHTML of the first element matching the class "main"

    // Close the browser
    await browser.close()
  } catch (err) {
    console.log('ERROR!', err)
  }
}

navigateWebsite()

TODO:

Better docs

Add more methods

Support more Chrome flags

const browser = new HeadlessChrome({
  headless: false, // If you turn this off, you can actually see the browser navigate with your instructions
  chrome: {
    flags: [
      '--use-fake-device-for-media-stream',
      '--use-fake-ui-for-media-stream'
    ]
  }
})

And more...

Tests

I was thinking of using this HTML page for all the tests: https://github.com/cbracco/html5-test-page

It'd be great to have some unit tests for each HTML element; besides, those tests may be useful examples for everyone.

More examples!!!