JupiterOne / playwright-aws-lambda

Support for running Microsoft's Playwright on AWS Lambda and Google Cloud Functions
MIT License

Download event not caught and always times out #27

Open imhashir opened 3 years ago

imhashir commented 3 years ago

Thanks a bunch for creating this awesome package.

I was having an issue with the download event. It works great when I try to execute my code locally (serverless invoke local) but when I deploy this via serverless deploy, the waitForEvent('download') times out.

Here's the code:

  const { page, browser } = await openWebpage(URL);

  const [download] = await Promise.all([
    // Start waiting for the download
    page.waitForEvent('download'),
    // Perform the action that initiates download
    page.click(`#${BTN_ID}`),
  ]);
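For anyone reading along, both calls sit inside Promise.all because of ordering: the listener has to be registered before the click fires, otherwise the event can slip past unnoticed. Here is a minimal sketch of that ordering, using a plain Node EventEmitter as a stand-in for the page. Note that triggerAndWait is a hypothetical helper for illustration, not a Playwright or playwright-aws-lambda API.

```javascript
const { EventEmitter, once } = require("events");

// Hypothetical helper: `emitter` is a plain EventEmitter standing in for a page.
async function triggerAndWait(emitter, eventName, trigger) {
  const [args] = await Promise.all([
    // Register the listener first, so an event fired during trigger() is not missed.
    once(emitter, eventName),
    // Then perform the action that causes the event.
    trigger(),
  ]);
  // once() resolves with the array of arguments passed to emit().
  return args[0];
}
```

This only illustrates why the wait is started before the click. It does not explain why the event never arrives on Lambda.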

Here's the openWebpage function:

export async function openWebpage(url) {
  const browser = await playwright.launchChromium();
  const context = await browser.newContext({
    acceptDownloads: true,
  });

  const page = await context.newPage();
  await page.goto(url);

  return { page, browser: context };
}

A similar issue was posted in playwright's official repo here. In that same issue, I've commented about my issue as well, here.

My guess is that, since this package is based on chrome-aws-lambda, which basically targets Puppeteer, and Puppeteer does not support the download event, support for it was not included in this package either. But that's just a random guess. I'd love to help in any way to get this issue fixed.

Hope to hear from you soon.

Madhu1512 commented 3 years ago

I am also seeing the same timeout error when running in Lambda.

TimeoutError: Timeout while waiting for event "download"
Note: use DEBUG=pw:api environment variable and rerun to capture Playwright logs.

"playwright-aws-lambda": "^0.6.0", "playwright-core": "^1.8.0",

Madhu1512 commented 3 years ago

I feel the issue is more with the packaged version of Chrome in this library. For now, I have switched to the new Docker container functionality in Lambda and am able to process the downloads without any issue.

imhashir commented 3 years ago

Amazing. Are you doing it via Serverless or bare lambda? Can you guide me through the process or share some code snippet? Thank You.

osmenia commented 3 years ago

Wow, I am also interested. Can you please guide us through the process or share a code snippet? Thank you very much, and have a nice day.

anupsunni commented 3 years ago

Awesome, it would be great if you could guide us here.

Thanking you in anticipation.

Madhu1512 commented 3 years ago

Here is an example I put together of Playwright running in a Lambda Docker container.

https://github.com/Madhu1512/playwright-lambda-demo

imhashir commented 3 years ago

Thanks a lot @Madhu1512 for going to the effort of creating an example for us. I'll have to look into Docker-based Lambda deployments to get that to work, but I'll definitely try your solution. For now, I got it to work by downgrading playwright-core to 1.0.2, as suggested by @osmenia in https://github.com/microsoft/playwright/issues/3726#issuecomment-767374664

osmenia commented 3 years ago

@austinkelleher

Can you please update Chromium? See https://github.com/microsoft/playwright/issues/3726#issuecomment-767254216

CRSylar commented 2 years ago

Hi all!

Any news? I'm stuck on this error.

This is on AWS Lambda, of course, with the nodejs > 16 runtime.

Here's my package.json:

"dependencies": { "playwright-aws-lambda": "^0.9.0", "playwright-core": "^1.26.0" }

I've tried downgrading playwright-core to the suggested 1.2.0, but then I would need to refactor all the code, since locators do not exist in such an old version.

Any suggestions? Note that I've also tried to "manually" dispatch the click event, but without success.

What I need to achieve is to save the downloaded file to /tmp/ so I can parse it (it is a CSV) later on.

Finally, here's the code (it works flawlessly locally):

const playwright = require('playwright-aws-lambda');

const extractData = async () => {
  const browser = await playwright.launchChromium();
  const cxt = await browser.newContext();
  const page = await cxt.newPage();

  // Log in
  await page.goto('https://<TheTargetSite>/auth/login');
  await page.locator('input[type="email"]').click();
  await page.locator('input[type="email"]').fill('XXXXX@XXXX.XX');
  await page.locator('input[type="password"]').click();
  await page.locator('input[type="password"]').fill('YYYYY');
  await page.locator('button:has-text("Log in")').click();

  // Navigate to the export dialog
  await page.locator('a:has-text("Rides")').click();
  await page.locator('text=ActiveStatus').click();
  await page.locator('text=Ended').click();
  await page.locator('[placeholder="Start date"]').click();
  await page.locator('[aria-label="September 19\\, 2022"]').click();
  await page.locator('[aria-label="September 19\\, 2022"]').click();
  await page.locator('[aria-label="Export rides"]').click();

  // Start waiting for the download before clicking Export
  const [download] = await Promise.all([
    page.waitForEvent('download'),
    page.locator('button:has-text("Export")').click(),
  ]);
  await download.saveAs('/tmp/rides.csv');

  await page.close();
  await cxt.close();
  await browser.close();
};

module.exports = { extractData };

SamLoy commented 1 year ago

I was able to work around this issue by fixing a /tmp folder for Chrome to write its temporary files to, then watching that directory for the PDF to arrive.

Obviously not perfect for every situation, but it works well for us when the PDF download is reliable and there will only be one download per session.

For example:

   const tmpFolder = "/tmp/pdfs/" + uuid();
   // Have the browser write its downloads into a known folder
   const browser = await playwright.launchChromium({ downloadsPath: tmpFolder });
   const context = await browser.newContext();
   const page = await context.newPage();

   ...

   await page.getByText("Download PDF").click();
   let pdfFiles: string[] = [];

   // Poll the folder once per second until a file shows up
   while (!pdfFiles.length) {
     await page.waitForTimeout(1000);
     pdfFiles = fs.readdirSync(tmpFolder);
   }

   const pdfData = fs.readFileSync(`${tmpFolder}/${pdfFiles[0]}`);
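
The polling loop above has no upper bound, so a download that never arrives would keep the function spinning until Lambda kills it. Below is a minimal sketch of the same idea with a deadline. waitForFirstFile and its option names are hypothetical, not part of Playwright or playwright-aws-lambda. Creating the directory up front is a defensive assumption. The sketch also does not check that the file has finished writing.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: polls `dir` until any file appears,
// or rejects once `timeoutMs` has elapsed.
async function waitForFirstFile(dir, { timeoutMs = 30000, intervalMs = 500 } = {}) {
  // Defensive assumption: make sure the directory exists before reading it.
  fs.mkdirSync(dir, { recursive: true });
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const files = fs.readdirSync(dir);
    if (files.length > 0) {
      return path.join(dir, files[0]);
    }
    // Sleep between polls without depending on a page object.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`No file appeared in ${dir} within ${timeoutMs} ms`);
}
```

In the snippet above, the while loop could then be replaced by something like `const pdfPath = await waitForFirstFile(tmpFolder);`, followed by reading `pdfPath`.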

TheAPIguys commented 1 year ago

Hi SamLoy,

I have used the /tmp folder in the past for other things in Lambda functions, to store temporary files. In this case, did you have to pre-create the folder on every run, or add it to your source code in AWS Lambda? Or was the snippet above enough on its own?

Many thanks for your suggestion and help.

Regards, Dan

zhw2590582 commented 10 months ago

@SamLoy Very good idea. I successfully ran playwright-aws-lambda on Vercel and then downloaded the file.