alekzonder / docker-puppeteer

docker image with Google Puppeteer installed
https://hub.docker.com/r/alekzonder/puppeteer/
MIT License
485 stars 138 forks

Screenshots of long pages get interrupted #10

Open derFunk opened 6 years ago

derFunk commented 6 years ago

Screenshotting longer pages results in incomplete images.

Example:

docker run --shm-size 1G --rm -v ${PWD}:/screenshots alekzonder/puppeteer:latest full_screenshot_series "https://medium.com/@micheledaliessi/how-does-the-blockchain-work-98c8cd01d2ae" 1366x768 1500

Gives this result: image

Is there any way to receive a complete screenshot of longer pages?

derFunk commented 6 years ago

It's a Chromium bug: https://bugs.chromium.org/p/chromium/issues/detail?id=770769&desc=2

Here's an example of how to make screenshotting long pages work: https://github.com/GoogleChrome/puppeteer/blob/230be28b067b521f0577206899db01f0ca7fc0d2/examples/screenshots-longpage.js

Maybe this can get included into your Docker image @alekzonder ?

alekzonder commented 6 years ago

@derFunk yes, I'll add the screenshot patch to the Docker image

derFunk commented 6 years ago

I created this, it should be working: https://gist.github.com/derFunk/10747ce60e965de5b771e96b7a4ba8f7

I extended the returned JSON to list the filenames of the individual images that get created, each limited to the 16k max texture size, and I also removed the need to pass a height dimension as a command line argument.
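To illustrate the tiling idea in isolation, here is a minimal sketch of how a tall page could be split into clip regions that each stay within the 16k limit. The helper name computeTiles and the constant MAX_TILE_HEIGHT = 16384 are assumptions made for this example and are not taken from the gist.

```javascript
// Hypothetical helper, not from the gist: split a page of the given size
// into vertical tiles that are at most maxTileHeight pixels tall.
const MAX_TILE_HEIGHT = 16384; // assumed value for the "16k" texture limit

function computeTiles(pageWidth, pageHeight, maxTileHeight = MAX_TILE_HEIGHT) {
  const tiles = [];
  for (let y = 0; y < pageHeight; y += maxTileHeight) {
    tiles.push({
      x: 0,
      y,
      width: pageWidth,
      // The last tile only covers whatever height is left.
      height: Math.min(maxTileHeight, pageHeight - y),
    });
  }
  return tiles;
}

// Example: a 1366 px wide page that is 40000 px tall yields three tiles.
console.log(computeTiles(1366, 40000));
```

Each returned object has the shape of the clip option of Puppeteer's page.screenshot(), so in principle every tile could be captured with page.screenshot({ path, clip: tile }) once the full page height is known.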

Stitching the individual images together still has to be done separately, e.g. with ImageMagick: convert -append file1.png file2.png out.png, by iterating over the filenames (e.g. jq -r '.files[].filename'). Doing the stitching inside the container already would be a further optimization step.
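As a rough sketch of that stitching step in Node.js rather than shell, the following builds the same convert -append call from the returned JSON. It assumes the JSON has the shape { files: [{ filename }] } implied by the jq expression above; the function names buildStitchArgs and stitch and the sample filenames are made up for illustration.

```javascript
const { execFileSync } = require('child_process');

// Build the ImageMagick argument list: -append tile1 tile2 ... out.png
// `result` is assumed to look like { files: [{ filename: "..." }, ...] }.
function buildStitchArgs(result, outFile) {
  const names = result.files.map((f) => f.filename);
  if (names.length === 0) {
    throw new Error('no tile filenames in result');
  }
  return ['-append', ...names, outFile];
}

// Runs ImageMagick; requires the `convert` binary to be on PATH.
function stitch(result, outFile) {
  execFileSync('convert', buildStitchArgs(result, outFile), { stdio: 'inherit' });
}

// Example with hypothetical filenames (does not actually run convert):
const sample = { files: [{ filename: 'file1.png' }, { filename: 'file2.png' }] };
console.log(['convert', ...buildStitchArgs(sample, 'out.png')].join(' '));
// prints: convert -append file1.png file2.png out.png
```

Passing the filenames as an argument array to execFileSync avoids shell quoting problems if a filename contains spaces.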

One caveat though: pretty regularly the screenshot loop gets stuck, resulting in Error: Protocol error (Page.captureScreenshot): Target closed. I didn't dig too deep into it, but found that a page reload inside the for loop seems to mitigate that specific error. It's a dirty hack (and commented out in my Gist), and I don't know why it helps; I just tried out several things to fix the getting-stuck problem.

I'll continue testing and can eventually provide a PR.