alixaxel / chrome-aws-lambda

Chromium Binary for AWS Lambda and Google Cloud Functions
MIT License

[BUG] Navigation failed because browser has disconnected error for certain websites #104

Closed · neekey closed this 4 years ago

neekey commented 4 years ago

Hi! I just want to share an issue I found that bothered me for a while, which might have something to do with the compiled Chromium version 80.

Environment

Expected Behavior

Puppeteer should be able to open the website when using goto().

Current Behavior

For certain websites, the page fails to open with the error Navigation failed because browser has disconnected!. The affected websites I found:

But it works normally when I test locally, where Chromium comes from the puppeteer module.

neekey commented 4 years ago

My solution for anyone experiencing the same problem:

Pin Chromium to version 79 by pinning the module versions to:

  • chrome-aws-lambda: 2.0.x
  • puppeteer-core: 2.0.x
  • puppeteer: 2.0.x

oakgary commented 4 years ago

My solution for anyone experiencing the same problem:

Pin Chromium to version 79 by pinning the module versions to:

  • chrome-aws-lambda: 2.0.x
  • puppeteer-core: 2.0.x
  • puppeteer: 2.0.x

Thank you for saving me the headache!

I was using the latest versions of chrome-aws-lambda as well and ran into the same problem; using the 2.0.x versions fixed the navigation crashes.

Edit: I think this might have something to do with data cached within the Lambda environment. Sometimes the Chrome browser or page will crash immediately on retries but will work when I wait 15 minutes between retries.

nathan1uphealth commented 4 years ago

This also worked for me! Thanks for this

bschelling commented 4 years ago

Just in case someone else gets confused:

"dependencies": {
    "chrome-aws-lambda": "2.0.x",
    "puppeteer-core": "2.0.x"
  }

Did the trick for me too. Thanks!

pgreda commented 4 years ago

Same issue here. It can be reproduced by accessing https://filmweb.pl. Rolling back resolved the problem.

tnolet commented 4 years ago

We are updating to Puppeteer 2.1 on Lambda and found the same issue. We can also reproduce it perfectly with the following site:

https://toogoodtogo.nl

I can also report that the https://www.arabnews.com/ site mentioned by @neekey shows exactly the same behaviour.

@alixaxel any hunch on what might be going on here? The only relation I see between the two sites mentioned is that they both have a Cloudflare SSL cert.

Josh-ES commented 4 years ago

Same issue on my end. We store HTML in a directory, start an Express server within the Lambda pointing to it, then navigate the browser to that server. Rolling back to 2.0.x resolves the problem.
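
For context, a rough sketch of that setup (the port, paths, and file names here are illustrative, not our actual code):

const express = require('express');
const chromium = require('chrome-aws-lambda');

// Serve the stored HTML from a local directory inside the Lambda.
const app = express();
app.use(express.static('/tmp/html'));
const server = app.listen(8080);

const browser = await chromium.puppeteer.launch({
  args: chromium.args,
  executablePath: await chromium.executablePath,
  headless: chromium.headless,
});

// Point the browser at the local server; this is where 2.1.x disconnects.
const page = await browser.newPage();
await page.goto('http://localhost:8080/index.html');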

14gasher commented 4 years ago

I just tried rolling back to 2.0.x for both puppeteer-core and chrome-aws-lambda, but I'm still getting the disconnect. Did any other flags need to be turned on / off?

Josh-ES commented 4 years ago

@14gasher I'm running with the following args and it works for me. Not really sure whether it will solve your problem but just for reference:

import chromium from 'chrome-aws-lambda'
...
await chromium.puppeteer.launch({
    args: [
        ...chromium.args,
        '--single-process',
    ],
})

alixaxel commented 4 years ago

@neekey Sorry it took a bit to follow up on this.

In all of these cases, the browser always crashes with:

[0405/032327.939344:FATAL:service.cc(56)] Check failed: !base::SystemMonitor::Get().

Related issues:

What seems to be happening is that Chromium relies on additional processes for decoding audio/video, and since it runs as a single process on Lambda, it ends up crashing. I'll see what can be done to overcome this.

In all these cases, it seems to be related to audio and/or video being played on the page:


https://www.dailymotion.com/

Caused by dmp.main.e0d00292a4faa658874d.es5.js which plays the videos on the site.



https://www.arabnews.com/

The issue seems caused by sca.17.4.114.js, which uses the WebAudio API to fingerprint the device.



https://www.sayidaty.net/

Also caused by sca.17.4.114.js.


https://uae.sharafdg.com/

Caused by notif.mp3.


https://www.filmweb.pl/

@pgreda Caused by 765260af-5da5-4c5f-94ad-c2f0f5db7182.mp4.


https://toogoodtogo.nl/

Caused by life-of-bread.mp4 / life-of-bread.webm.

bschelling commented 4 years ago

What about discarding media requests as a workaround? Something along the lines of:

await page.setRequestInterception(true);
page.on('request', (req) => {
   if (req.resourceType() === 'media') {
      req.abort();
   } else {
      req.continue();
   }
});

jongear commented 4 years ago

Adding an abort on image and media requests allowed my lambda to complete successfully.

await page.setRequestInterception(true);

page.on('request', async(request) => {
  const requestType = request.resourceType();

  if (requestType === 'image' || requestType === 'media') {
    return request.abort();
  }

  return request.continue();
});

gsouf commented 4 years ago

Please note that I opened a ticket on the Chromium bug tracker regarding !base::SystemMonitor::Get().

https://bugs.chromium.org/p/chromium/issues/detail?id=1060099#c6

alexfernandez commented 4 years ago

I was recently bitten by this issue; the workaround is to remove --single-process from the args passed:

    const args = chromium.args
    args.splice(args.indexOf('--single-process'), 1)
    return await chromium.puppeteer.launch({
        args,
        ...
    })

Unfortunately this doesn't work on AWS Lambda; the browser never starts. @alixaxel Do you know of any way to make it work without --single-process? Thanks!

silveur commented 4 years ago

Seems like the AudioService init bug in --single-process mode was fixed in Chromium 25 days ago.

alexfernandez commented 4 years ago

Seems like the AudioService init bug in --single-process mode was fixed in Chromium 25 days ago.

@silveur I suppose we will have to wait for 3.0 to include the fix?

gsouf commented 4 years ago

@alixaxel do you think you could package the latest version of Chromium with chrome-aws-lambda so that we can test whether it solves the issue?

If you don't have time, can you explain how to do it, please?

zachlevy commented 4 years ago

Adding an abort on image and media requests allowed my lambda to complete successfully.

await page.setRequestInterception(true);

page.on('request', async(request) => {
  const requestType = request.resourceType();

  if (requestType === 'image' || requestType === 'media') {
    return request.abort();
  }

  return request.continue();
});

My workaround was to also ignore all JavaScript files; luckily I didn't need any of them:

requestType === 'script'
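
Folded into the snippet above, the combined check looks something like this:

await page.setRequestInterception(true);

page.on('request', (request) => {
  const requestType = request.resourceType();

  // Abort images, media, and scripts; everything else continues as normal.
  if (requestType === 'image' || requestType === 'media' || requestType === 'script') {
    return request.abort();
  }

  return request.continue();
});
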
andreikrasnou commented 4 years ago

I have the same issue with the https://www.jetblue.com/travel-agents/travel-agent-waiver-codes website. As far as I can see, no media types are requested. @neekey, am I right that for now the only way to solve it in single-process mode is a package downgrade?

alixaxel commented 4 years ago

@gsouf I spent the entire day on this; adding the following option to args seems to solve it:

--disable-features=AudioServiceOutOfProcess


Tested on 81.0.4044.0 (which will hopefully be published today).
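
For anyone who wants to apply it right away, a minimal launch sketch (standard chrome-aws-lambda usage with the flag appended to the default args):

const chromium = require('chrome-aws-lambda');

const browser = await chromium.puppeteer.launch({
  args: [
    ...chromium.args,
    // Keep the audio service in-process so it doesn't try to spawn a
    // child process under --single-process.
    '--disable-features=AudioServiceOutOfProcess',
  ],
  defaultViewport: chromium.defaultViewport,
  executablePath: await chromium.executablePath,
  headless: chromium.headless,
});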

daveroberts commented 4 years ago

Glad to see you're okay!

gsouf commented 4 years ago

@alixaxel you're the boss thanks!

Also, I don't know if you noticed, but an internal fix should land in Chrome 83.

abin-andrews commented 4 years ago

This issue still comes up. Can anyone please help? It's not working for some sites.

"status": "failed", "error": "Navigation failed because browser has disconnected!", "context": { "callbackWaitsForEmptyEventLoop": true, "functionVersion": "$LATEST",

try {
  let chromeArgs = chromium.args;
  // Note: Chromium only honors the last --disable-features switch, so the
  // second push below overrides the first; combining both features into one
  // comma-separated flag may be what was intended.
  chromeArgs.push('--disable-features=AudioServiceOutOfProcess');
  chromeArgs.push('--disable-features=AudioServiceOutOfProcessKillAtHang');
  chromeArgs.push('--disable-software-rasterizer');
  chromeArgs.push('--disable-gpu');

  browser = await chromium.puppeteer.launch({
    args: chromeArgs,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath,
    headless: chromium.headless,
    ignoreHTTPSErrors: true
  });

  // Create the page and apply the requested user agent and viewport.
  const page = await browser.newPage();
  await page.setUserAgent(event.user_agent);
  await page.setViewport({ width: event.view_port.width, height: event.view_port.height, deviceScaleFactor: 1 });

  await page.goto(event.screen_shot_url);

  //const image = await page.screenshot();
  //await Tooling.toS3(Tooling.getPushConfig(image, event.bucket, `image/png`, event.destination, Tooling.s3FileAcl))

  result = await page.title();
} catch (e) {
  console.log(e);
  console.log('Failed');
  return failed(e, context);
} finally {
  if (browser !== null) {
    await browser.close();
  }
}

ae-lexs commented 3 years ago

Hi!!

I have the same issue.

Environment

I've included some code; I hope it's helpful.

serverless.yml

service: generate-file

plugins:
  - serverless-webpack
  - serverless-plugin-warmup
  - serverless-plugin-git-variables

provider:
  name: aws
  runtime: nodejs12.x
  memorySize: 1536
  timeout: 30
  tracing: true
  stage: ${self:custom.stacks.${self:custom.stack}.stage, "dev"}
  region: ${self:custom.stacks.${self:custom.stack}.region, "us-east-1"}
  profile: ${self:custom.stacks.${self:custom.stack}.profile, "default"}
  layers: ${file(config/config.yml):${self:provider.stage}.layers}
  vpc: ${self:custom.stacks.${self:custom.stack}.vpc}
  environment: ${file(config/environment.yml):${self:provider.stage}.environment}
  iamRoleStatements: ${file(config/iam.yml):${self:provider.stage}.iamRoleStatements}
  tags:
    branch: ${git:branch}
    commit: ${git:sha1}
    version: ${git:describeLight}
  apiKeys:
    - ApiKey-${self:service}-${self:provider.stage}

functions:
  generate_pdf:
    description: Generate PDF files from a HTML template.
    handler: src/handlers/generatePDF/handler.handler

resources: ${file(config/resources.yml):${self:provider.stage}.resources}

custom:
  webpack:
    webpackConfig: ./webpack.config.js
    includeModules: true
  stack: ${opt:stack, "dev"}
  stacks: ${file(config/stacks.yml):stacks}
  prefix: ${self:custom.stacks.${self:custom.stack}.prefix}
  scheduleEnabled:
    production: true
    dev: false
  warmup:
    events:
      schedule: 'rate(5 minutes)'
    timeout: 20
    prewarm: true
    enabled: production
    memorySize: 512
    role: ${self:custom.stacks.${self:custom.stack}.warmupRole}
  exportGitVariables: false

generatePDF.ts

function getHTML(templateContent: string, templateName: string) {
  const templateFile = FileSystem.readFileSync(
    `./src/templates/${templateName}/index.hbs`,
    'utf8',
  );
  const template = Handlerbars.compile(templateFile);

  return template(templateContent);
}

export default async function (
  bucketName: string,
  fileName: string,
  templateContent: string,
  templateName: string,
  s3: S3,
) {
  let browser = null;

  try {
    const executablePath = await Chromium.executablePath;

    browser = await Chromium.puppeteer.launch({
      args: ['--disable-features=AudioServiceOutOfProcess', ...Chromium.args],
      defaultViewport: Chromium.defaultViewport,
      executablePath,
      headless: Chromium.headless,
    });

    const browserPage = await browser.newPage();

    browserPage.setContent(getHTML(templateContent, templateName));

    const pdf = await browserPage.pdf({
      format: 'A4',
      printBackground: true,
    });

    s3Upload(bucketName, fileName, pdf, s3);
  } catch (error) {
    console.error(error);
  } finally {
    if (browser) {
      await browser.close();
    }
  }
}

Expected Behavior

Puppeteer should be able to generate the PDF file

Current Behavior

The process raises this error

{
    "errorType": "Runtime.UnhandledPromiseRejection",
    "errorMessage": "Error: Navigation failed because browser has disconnected!",
    "reason": {
        "errorType": "Error",
        "errorMessage": "Navigation failed because browser has disconnected!",
        "stack": [
            "Error: Navigation failed because browser has disconnected!",
            "    at /var/task/node_modules/puppeteer-core/lib/cjs/puppeteer/common/LifecycleWatcher.js:51:147",
            "    at /var/task/node_modules/puppeteer-core/lib/cjs/vendor/mitt/src/index.js:51:62",
            "    at Array.map (<anonymous>)",
            "    at Object.emit (/var/task/node_modules/puppeteer-core/lib/cjs/vendor/mitt/src/index.js:51:43)",
            "    at CDPSession.emit (/var/task/node_modules/puppeteer-core/lib/cjs/puppeteer/common/EventEmitter.js:72:22)",
            "    at CDPSession._onClosed (/var/task/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Connection.js:247:14)",
            "    at Connection._onClose (/var/task/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Connection.js:128:21)",
            "    at WebSocket.<anonymous> (/var/task/node_modules/puppeteer-core/lib/cjs/puppeteer/node/NodeWebSocketTransport.js:17:30)",
            "    at WebSocket.onClose (/var/task/node_modules/ws/lib/event-target.js:136:16)",
            "    at WebSocket.emit (events.js:314:20)"
        ]
    },
    "promise": {},
    "stack": [
        "Runtime.UnhandledPromiseRejection: Error: Navigation failed because browser has disconnected!",
        "    at process.<anonymous> (/var/runtime/index.js:35:15)",
        "    at process.emit (events.js:314:20)",
        "    at process.EventEmitter.emit (domain.js:483:12)",
        "    at processPromiseRejections (internal/process/promises.js:209:33)",
        "    at processTicksAndRejections (internal/process/task_queues.js:98:32)"
    ]
}

gsouf commented 3 years ago

@AlexisNava Can you please try running the same script in your local terminal with the puppeteer option dumpio: true?

This issue is most likely related to Chrome itself; if it is, dumpio will give you the exact trace returned by Chrome.

You can also try to run the same script with the --single-process flag disabled.
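
For example, something like this (assuming the default chrome-aws-lambda args include --single-process, as they do in recent versions):

browser = await Chromium.puppeteer.launch({
  // Drop the --single-process flag from the default args.
  args: Chromium.args.filter((arg) => arg !== '--single-process'),
  defaultViewport: Chromium.defaultViewport,
  executablePath: await Chromium.executablePath,
  headless: Chromium.headless,
  dumpio: true,
});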

ae-lexs commented 3 years ago

@gsouf

I added the option dumpio: true.

export default async function (
  bucketName: string,
  fileName: string,
  templateContent: string,
  templateName: string,
  s3: S3,
) {
  let browser = null;

  try {
    browser = await Chromium.puppeteer.launch({
      args: ['--disable-features=AudioServiceOutOfProcess', ...Chromium.args],
      defaultViewport: Chromium.defaultViewport,
      executablePath: await Chromium.executablePath,
      headless: Chromium.headless,
      dumpio: true,
    });

    const browserPage = await browser.newPage();

    browserPage.setContent(getHTML(templateContent, templateName));

    const pdf = await browserPage.pdf({
      format: 'A4',
      printBackground: true,
    });

    s3Upload(bucketName, fileName, pdf, s3);
  } catch (error) {
    console.error(error);
  } finally {
    if (browser) {
      await browser.close();
    }
  }
}

This is the output:

[23867:23872:1202/114102.049578:ERROR:service_utils.cc(157)] --ignore-gpu-blacklist is deprecated and will be removed in 2020Q4, use --ignore-gpu-blocklist instead.

DevTools listening on ws://127.0.0.1:44521/devtools/browser/f23ccf34-e8ff-424f-9188-4929d9eddf7c
Error: Protocol error (Page.printToPDF): PrintToPDF is not implemented
    at /home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:208:63
    at new Promise (<anonymous>)
    at CDPSession.send (/home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:207:16)
    at Page.pdf (/home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:1175:43)
    at Object.default_1 [as default] (/home/alexis/Development/Fondeadora/generate_file/src/handlers/generatePDF/app.ts:53:35)
    at processTicksAndRejections (internal/process/task_queues.js:97:5) {
  message: 'Protocol error (Page.printToPDF): PrintToPDF is not implemented'
}
(node:23856) UnhandledPromiseRejectionWarning: Error: Navigation failed because browser has disconnected!
    at /home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/puppeteer/common/LifecycleWatcher.js:51:147
    at /home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/vendor/mitt/src/index.js:51:62
    at Array.map (<anonymous>)
    at Object.emit (/home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/vendor/mitt/src/index.js:51:43)
    at CDPSession.emit (/home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/puppeteer/common/EventEmitter.js:72:22)
    at CDPSession._onClosed (/home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:247:14)
    at Connection._onClose (/home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:128:21)
    at WebSocket.<anonymous> (/home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/puppeteer/node/NodeWebSocketTransport.js:17:30)
    at WebSocket.onClose (/home/alexis/Development/Fondeadora/generate_file/node_modules/ws/lib/event-target.js:136:16)
    at WebSocket.emit (events.js:311:20)
(node:23856) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:23856) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

I searched for the error Error: Protocol error (Page.printToPDF): PrintToPDF is not implemented and found this other closed issue.

An answer on that issue suggests setting headless: true.

I changed my code by applying headless: true and moving await browser.close(); inside the try block:

export default async function (
  bucketName: string,
  fileName: string,
  templateContent: string,
  templateName: string,
  s3: S3,
) {
  let browser = null;

  try {
    browser = await Chromium.puppeteer.launch({
      args: ['--disable-features=AudioServiceOutOfProcess', ...Chromium.args],
      defaultViewport: Chromium.defaultViewport,
      executablePath: await Chromium.executablePath,
      headless: true,
      dumpio: true,
    });

    const browserPage = await browser.newPage();

    browserPage.setContent(getHTML(templateContent, templateName));

    const pdf = await browserPage.pdf({
      format: 'A4',
      printBackground: true,
    });

    s3Upload(bucketName, fileName, pdf, s3);

    await browser.close();
  } catch (error) {
    console.error(error);
  }
}

This is the output:

DevTools listening on ws://127.0.0.1:42603/devtools/browser/b59839ea-87ec-4a17-bec2-4421dada6491
[1202/115856.086467:ERROR:service_utils.cc(157)] --ignore-gpu-blacklist is deprecated and will be removed in 2020Q4, use --ignore-gpu-blocklist instead.
(node:25518) UnhandledPromiseRejectionWarning: Error: Navigation failed because browser has disconnected!
    at /home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/puppeteer/common/LifecycleWatcher.js:51:147
    at /home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/vendor/mitt/src/index.js:51:62
    at Array.map (<anonymous>)
    at Object.emit (/home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/vendor/mitt/src/index.js:51:43)
    at CDPSession.emit (/home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/puppeteer/common/EventEmitter.js:72:22)
    at CDPSession._onClosed (/home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:247:14)
    at Connection._onClose (/home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/puppeteer/common/Connection.js:128:21)
    at WebSocket.<anonymous> (/home/alexis/Development/Fondeadora/generate_file/node_modules/puppeteer/lib/cjs/puppeteer/node/NodeWebSocketTransport.js:17:30)
    at WebSocket.onClose (/home/alexis/Development/Fondeadora/generate_file/node_modules/ws/lib/event-target.js:136:16)
    at WebSocket.emit (events.js:311:20)
(node:25518) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:25518) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

gsouf commented 3 years ago

browserPage.setContent is async, so add an await keyword in front of it. s3Upload might also be an async function, in which case it also needs the await keyword.

Also, this issue does not seem to be directly related to chrome-aws-lambda, nor to this specific ticket, so if the problem persists after you await those calls, I would suggest getting support from the puppeteer repository or on Stack Overflow using the tag puppeteer.
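
Concretely, that means changing the relevant lines of your function to:

await browserPage.setContent(getHTML(templateContent, templateName));

const pdf = await browserPage.pdf({
  format: 'A4',
  printBackground: true,
});

// If s3Upload returns a promise, it needs an await as well.
await s3Upload(bucketName, fileName, pdf, s3);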

ae-lexs commented 3 years ago

Okay, thank you @gsouf

jurgenwerk commented 3 years ago

@gsouf @AlexisNava I had the same problem and adding await to browserPage.setContent fixed it! Thanks!

agnusha commented 3 years ago

The following actions helped me:

Downgrading puppeteer didn't help me, so this is my solution.

gsouf commented 3 years ago

@agnusha these errors are mostly due to bugs in Chromium. So the best way to get a long-term fix is to start puppeteer with dumpio set to true, get a stack trace of Chromium crashing, and report the issue at https://bugs.chromium.org/p/chromium/issues/list with a reproducible case.

HSSalman commented 2 years ago

Adding an abort on image and media requests allowed my lambda to complete successfully.

await page.setRequestInterception(true);

page.on('request', async(request) => {
  const requestType = request.resourceType();

  if (requestType === 'image' || requestType === 'media') {
    return request.abort();
  }

  return request.continue();
});

My script was running fine locally but threw the error when running on an AWS Lambda. I intercepted the requests as you showed here and it worked like a charm 🙌