N0taN3rd / Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
https://n0tan3rd.github.io/Squidwarc/
Apache License 2.0

Exception with defaults using Docker in Puppeteer config #48

Closed: machawk1 closed this issue 5 years ago

machawk1 commented 5 years ago

Are you submitting a bug report or a feature request?

Bug report.

What is the current behavior?

An exception is thrown when running with the default configuration using Docker and the latest master (a2f1d6383cbae06ccd5dc315ba88879e85a12ca5). I pulled the repo, changed the directory in the compose file to the working directory root (/tmp/Squidwarc), then ran docker-compose build followed by docker-compose up.

I received the exception:

squidwarc    | Crawler Will Be Generating WARC Files Using the filenamified url
squidwarc    | A Fatal Error Occurred
squidwarc    |   TypeError: Cannot read property 'Disconnected' of undefined
squidwarc    |
squidwarc    |   - puppeteer.js:116 PuppeteerCrawler.init
squidwarc    |     /Squidwarc/lib/crawler/puppeteer.js:116:37
squidwarc    |
squidwarc    |   - next_tick.js:81 processTicksAndRejections
squidwarc    |     internal/process/next_tick.js:81:5
squidwarc    |
squidwarc    |
squidwarc    | Please Inform The Maintainer Of This Project About It. Information In package.json

What is the expected behavior?

For the crawl with the default configuration to complete.

What's your environment?

node v10.12.0 (though this may be moot due to Docker), macOS 10.14.2, Squidwarc a2f1d6383cbae06ccd5dc315ba88879e85a12ca5 (latest master), Docker 18.09.0

zanderl commented 5 years ago

I have the same issue - roughly the same configuration.

N0taN3rd commented 5 years ago

Reason for this issue: Puppeteer did a refactoring (with a version bump) that Squidwarc was not aware of. This is now fixed.
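
For anyone hitting a similar `Cannot read property 'Disconnected' of undefined` error, here is a minimal sketch of one way to guard against this class of breakage. It is not the actual fix applied in Squidwarc. It assumes the crawler was reading an event-name constant from an object that a refactoring can move or remove; the `events` parameter and both function names are hypothetical. The fallback string `'disconnected'` is the event name in Puppeteer's public `Browser` API, and a plain `EventEmitter` can stand in for the browser when trying it out.

```javascript
const { EventEmitter } = require('events');

// Event name from Puppeteer's public Browser API.
const DISCONNECTED = 'disconnected';

// `events` is a hypothetical constants object taken from a library's
// internals; it may be undefined after an upstream refactoring.
function resolveDisconnectedEvent(events) {
  return (events && events.Disconnected) || DISCONNECTED;
}

// Attach the handler without assuming the constants object exists.
function watchDisconnect(browser, onDisconnect, events) {
  browser.on(resolveDisconnectedEvent(events), onDisconnect);
}

// Usage with a stand-in emitter (no Chrome needed):
const fakeBrowser = new EventEmitter();
watchDisconnect(fakeBrowser, () => console.log('browser disconnected'));
fakeBrowser.emit('disconnected');
```

Relying on the documented event name instead of an internal constant means a reorganization of the library's internal modules is less likely to surface as a fatal error at startup.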