alixaxel / chrome-aws-lambda

Chromium Binary for AWS Lambda and Google Cloud Functions
MIT License
3.17k stars · 289 forks

[BUG] Error: spawn ETXTBSY #266

Open mikejackowski opened 2 years ago

mikejackowski commented 2 years ago

Environment

Expected Behavior

My simple lambda should open URL, check what links are present on the site and exit.

Current Behavior

The process crashes with the error below:

2022-03-21T17:38:49.289Z    8f9bcdea-50e7-53ad-81db-9758eab4e42a    ERROR   Invoke Error    {
    "errorType": "Error",
    "errorMessage": "spawn ETXTBSY",
    "code": "ETXTBSY",
    "errno": "ETXTBSY",
    "syscall": "spawn",
    "stack": [
        "Error: spawn ETXTBSY",
        "    at ChildProcess.spawn (internal/child_process.js:408:11)",
        "    at Object.spawn (child_process.js:553:9)",
        "    at BrowserRunner.start (/opt/nodejs/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserRunner.js:80:34)",
        "    at ChromeLauncher.launch (/opt/nodejs/node_modules/puppeteer-core/lib/cjs/puppeteer/node/Launcher.js:88:16)",
        "    at async messageToBacklink (/var/task/processMessage.js:18619:19)",
        "    at async Promise.all (index 0)",
        "    at async Runtime.main [as handler] (/var/task/processMessage.js:18721:21)"
    ]
}

Steps to Reproduce

The code of my function is:

import chromium from 'chrome-aws-lambda';
import validator from 'validator';
// (relayClient, InserUrlOutgoingLinksStats and ScannedLinkItemType are defined elsewhere in the bundled file)

export const messageToBacklink = async (message: any) => {
  console.log('ProcessMessageText: ', JSON.stringify(message));
  const { hostname, pathname, UrlBacklinks } = message;
  const protocol = message.protocol.replace(':', '');
  const urlAddress = protocol + '://' + hostname + pathname;

  const browser = await chromium.puppeteer.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath,
    headless: true,
    // ignoreHTTPSErrors: true,
  });
  try {
    const page = await browser.newPage();
    await page.goto(urlAddress, {
      waitUntil: 'networkidle0',
    });

    const hrefs = await page.$$eval('a', (a) =>
      a.map((a) => ({
        href: a.getAttribute('href'),
        url: null,
        noFollow: a.getAttribute('rel')?.includes('nofollow') || false,
        noReferrer: a.getAttribute('rel')?.includes('noreferrer') || false,
        noOpener: a.getAttribute('rel')?.includes('noopener') || false,
        sponsored: a.getAttribute('rel')?.includes('sponsored') || false,
        external: a.getAttribute('rel')?.includes('external') || false,
        ugc: a.getAttribute('rel')?.includes('ugc') || false,
        rel: a.getAttribute('rel'),
        anchor: a.textContent,
      }))
    );

    const externalLinks: Array<ScannedLinkItemType> = [];

    /**
     * Loop through all of the href objects and get only valid outgoing links for external domains
     */
    Object.values(hrefs).forEach((item) => {
      // if no href is present or it's not valid URL then skip
      if (!item.href || (item.href && !validator.isURL(item.href))) return;
      const itemURL = new URL(item.href);
      // if the hostname is the same as the current hostname then skip
      if (itemURL.hostname === hostname) return;
      // if it's internal link starting with # then skip
      if (itemURL.hash || itemURL.href.startsWith('#')) return;

      externalLinks.push({
        ...item,
        href: item.href,
        url: itemURL,
      });
    });

    const res = await relayClient.request(InserUrlOutgoingLinksStats, {
      url_id: message.id,
      number_of_links: numberOfLinks,
      number_of_dofollow_links: numberOfDofollowLinks,
      number_of_nofollow_links: numberOfNofollowLinks,
    });
    console.log('res: ', res);
    const presentBacklinks = UrlBacklinks.map((backlink: any) => {
      const {
        Target: {
          Url: { hostname, pathname },
        },
        BacklinkAnchor: { text },
      } = backlink;

      const id = JSON.parse(
        Buffer.from(backlink.id, 'base64').toString()
      ).slice(-1)[0];

      const url = hostname.replace('www.', '') + pathname;
      const link = externalLinks.find((link: any) => link.href.includes(url));

      return {
        backlink_id: id,
        status: link ? 'ACTIVE' : 'NOT_FOUND',
        created_at: new Date().toISOString(),
        anchor_exact_match:
          link?.anchor?.trim().toLowerCase() === text.trim().toLowerCase() ||
          null,
        backlink_anchor_text: link?.anchor || null,
        nofollow: link?.noFollow || false,
        noreferrer: link?.noReferrer || false,
        noopener: link?.noOpener || false,
        sponsored: link?.sponsored || false,
        external: link?.external || false,
        ugc: link?.ugc || false,
      };
    });

    console.log('presentBacklinks: ', { presentBacklinks });
    // ....push to backend
    return;
  } catch (e) {
    console.log('error message: ', e);

    const unreachableBacklinks = UrlBacklinks.map((backlink: any) => {
      const id = JSON.parse(
        Buffer.from(backlink.id, 'base64').toString()
      ).slice(-1)[0];
      return {
        backlink_id: id,
        status: 'UNREACHABLE',
        created_at: new Date().toISOString(),
        anchor_exact_match: false,
        nofollow: false,
        backlink_anchor_text: null,
      };
    });
    console.log('unreachableBacklinks: ', { unreachableBacklinks });
    // ... push to backend
    return;
  } finally {
    // a `return` here would swallow the earlier returns and any rethrown error
    await browser.close(); // added this just in case?
  }
};
mikejackowski commented 2 years ago

I came across two things that I'm unsure about:

  1. https://giters.com/alixaxel/chrome-aws-lambda/issues/161 - based on this, I understand that the browser instance shouldn't be launched inside my try/catch block, but rather launched once at the top level and passed in as an argument alongside the message?
  2. I know this is from a different package, but this stands out: "Suggest not to close browser in Lambda ENV, if close it, the Browser object is considered disposed and cannot be used anymore." I'm not sure what to make of it.
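A sketch of what option 1 could look like, assuming chrome-aws-lambda's documented exports (the `getBrowser` helper, handler name and message shape are placeholders, not from this issue): launch once at module scope, reuse the instance across warm invocations, and close pages rather than the browser:

```typescript
import chromium from 'chrome-aws-lambda';
import type { Browser } from 'puppeteer-core';

let browser: Browser | null = null;

const getBrowser = async (): Promise<Browser> => {
  // Reuse the warm instance instead of launching inside each invocation,
  // relaunching only if the previous one died or was disconnected.
  if (browser === null || !browser.isConnected()) {
    browser = await chromium.puppeteer.launch({
      args: chromium.args,
      executablePath: await chromium.executablePath,
      headless: chromium.headless,
    });
  }
  return browser;
};

export const handler = async (message: any) => {
  const b = await getBrowser();
  const page = await b.newPage();
  try {
    await page.goto(message.url, { waitUntil: 'networkidle0' });
    return await page.title();
  } finally {
    await page.close(); // close the page, keep the browser alive
  }
};
```

This also sidesteps the "disposed browser" concern from point 2, since the browser is never closed between invocations.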
Tapu106 commented 10 months ago

Did you solve it? This error is getting on my nerves.

ikushlianski commented 6 months ago

I was getting the same error and learned from this comment that it means multiple browser processes might be launched concurrently.

In my case, I was awaiting multiple concurrent PDF generations with puppeteer. When I switched to sequential generation, the error was gone.