apify / apify-actor-docker

Base Docker images for Apify actors.
https://hub.docker.com/u/apify
Apache License 2.0

Chromium not found in actor-node-playwright-chrome #87

Open vanekj opened 1 year ago

vanekj commented 1 year ago

Hello!

I ran into an issue while configuring my crawler Dockerfile with the apify/actor-node-playwright-chrome image.

When I build and run my Dockerfile, I get this error and it stops.

ERROR PlaywrightCrawler: Request failed and reached maximum retries. browserType.launchPersistentContext:
Executable doesn't exist at /home/myuser/pw-browsers/chromium-1028/chrome-linux/chrome
╔═════════════════════════════════════════════════════════════════════════╗
║ Looks like Playwright Test or Playwright was just installed or updated. ║
║ Please run the following command to download new browsers:              ║
║                                                                         ║
║     npx playwright install                                              ║
║                                                                         ║
║ <3 Playwright Team                                                      ║
╚═════════════════════════════════════════════════════════════════════════╝

Could you please help me? What can I change to make it work?

Thank you! 🙏🏻


Dockerfile

FROM apify/actor-node-playwright-chrome:18

COPY --chown=myuser:myuser package*.json ./

RUN npm --quiet set progress=false
RUN npm ci --only=production

COPY --chown=myuser:myuser . ./

CMD ["node", "src/main.js"]

.dockerignore

**/.classpath
**/.dockerignore
**/.env
**/.git
**/.gitignore
**/.project
**/.settings
**/.toolstarget
**/.vs
**/.vscode
**/*.*proj.user
**/*.dbmdl
**/*.jfm
**/charts
**/docker-compose*
**/compose*
**/Dockerfile*
**/node_modules
**/npm-debug.log
**/obj
**/secrets.dev.yaml
**/values.dev.yaml
README.md

Docker build command

$ docker build -t crawler .

Docker run command

$ docker run --env-file .env crawler

SFaraji commented 1 year ago

@vanekj did you find any solution? I'm having the same issue and opened a new request, #90.

SFaraji commented 1 year ago

@B4nan you closed my issue as a duplicate. Can you then provide a solution to the above issue? Thanks.

B4nan commented 1 year ago

Yes, I closed it because it is an exact duplicate; there is no point in having two issues for the same problem.

SFaraji commented 1 year ago

Ok, now can you give a solution to the issue?

SFaraji commented 1 year ago

@B4nan anything, mate? Are you just going to leave this open-ended with no response?

SFaraji commented 1 year ago

Hey, can someone please provide an update regarding this ticket? @B4nan @vanekj

vanekj commented 1 year ago

Hi @SFaraji, I gave up on using the apify Docker image and I am using the Playwright one:

# Note: only the last FROM defines the final image, so this node:16 stage is unused
FROM node:16

# Get the Playwright base image (focal tag, ships with Node.js and the browsers)
FROM mcr.microsoft.com/playwright:focal

# Set the work directory for the application
WORKDIR /app

# COPY the needed files to the app folder in Docker image
COPY package*.json /app/

# Get the needed libraries to run Playwright
RUN apt-get update && apt-get -y install libnss3 libatk-bridge2.0-0 libdrm-dev libxkbcommon-dev libgbm-dev libasound-dev libatspi2.0-0 libxshmfence-dev

# Install the dependencies in Node environment
RUN npm ci

# Start the main script
CMD ["node", "--inspect=0.0.0.0:9229", "src/main.js"]

SFaraji commented 1 year ago

Did you run it on AWS Elastic Beanstalk (Docker running on 64bit Amazon Linux 2/3.5.3) as well?

vanekj commented 1 year ago

I am running it on my private VPS

SFaraji commented 1 year ago

Thanks @vanekj, I really appreciate it, all fixed. Just wondering, is there any other way to get the necessary libraries for Playwright? It increased my Docker image size.

vanekj commented 1 year ago

Unfortunately, I did not play with it any further to strip down the size, as I was happy that it works for my needs.

mnmkng commented 1 year ago

AFAIK Apify packages are usually installed with npm install whereas you use npm ci. It might be the cause of your issues. Have you tried the recommended Dockerfiles?

underfisk commented 1 year ago

I'm also experiencing the same issue, looks like it might be broken 😅

mnmkng commented 1 year ago

We're running hundreds of thousands of runs and thousands of builds on those images every day, so they're not broken per se. But they might be broken in some specific configurations. Please provide a reproduction scenario or more information. We would like to help, but without more info there's no way to do so.

underfisk commented 1 year ago

I posted in another issue, #91, where I'm using pnpm and running a NestJS app with crawlee. For a little more context, I'm deploying to ECS.

iBubelo commented 1 year ago

I also have exactly the same issue. @vanekj thanks for pointing to the Playwright Docker image, it solved the issue for me.

kejiweixun commented 1 year ago

Had the same issue.

In my case, it was caused by a mismatch between the playwright version in package.json and the one in the Docker image.

For example, I followed the crawlee docs and specified * for playwright in package.json, while using the image apify/actor-node-playwright-chrome:16-1.31.2. This caused the issue. When I replaced * with 1.31.2, it worked.

In summary, this is my package.json:

{
    "dependencies": {
        "@crawlee/playwright": "^3.3.0",
        "playwright": "1.31.2"
    }
}

This is my base image: FROM apify/actor-node-playwright-chrome:16-1.31.2

mnmkng commented 1 year ago

Thanks for sharing @kejiweixun. That's interesting. Are you using a lock file?

kejiweixun commented 1 year ago

@mnmkng Hi, I tried to reproduce this issue, but couldn't. I have changed a lot of my code, including the Dockerfile, since I thought I had "fixed" this issue, and I don't have a copy of my code from when the issue occurred.

Now in my code, I changed "playwright": "1.31.2" to "playwright": "*", and at the same time changed FROM apify/actor-node-playwright-chrome:16-1.31.2 to FROM apify/actor-node-playwright-chrome:16, and it works with no problem.

B4nan commented 1 year ago

New releases of playwright keep coming (one landed just a few hours ago). Once we publish a new version of crawlee, the base Docker image gets rebuilt and will contain a newer playwright version, and I am afraid that will break it for you again.

I feel like pinning might actually be required to resolve this. Without it, you have two places that need to be kept in sync, but there is no link between them. With *, npm will try to resolve to the latest version, which might not be available in the Docker image. If you have a lockfile, you generate it locally, so it goes for the latest available version, which might not be available in the Docker image yet, so the build fails. And vice versa: if you have a lockfile and rebuild an older project, you can get a Docker image that is too new.

I think that only pinning both the Dockerfile base image and the package.json dependency to the same exact version can resolve all the possible cases. On the other hand, this requires users to change two places when upgrading the playwright/puppeteer version.
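
For readers who want to catch this mismatch before building, the pinning rule above can be sketched as a small Node.js check. This is only an illustration under assumptions: the helper names (`imageTag`, `playwrightVersionFromTag`, `checkPins`) are hypothetical and not part of Apify or Crawlee, and the tag layout `<node major>-<playwright version>` is inferred from the `16-1.31.2` tag mentioned earlier in this thread.

```javascript
'use strict';

// Hypothetical helper: find the tag of the given base image in a Dockerfile.
// Returns null when the image is not used or has no tag.
function imageTag(dockerfileText, image) {
    for (const line of dockerfileText.split('\n')) {
        // Skip optional flags such as --platform=... before the image name.
        const match = line.trim().match(/^FROM\s+(?:--\S+\s+)*(\S+)/i);
        if (!match) continue;
        const [name, tag] = match[1].split(':');
        if (name === image) return tag || null;
    }
    return null;
}

// Assumption: tags look like "16-1.31.2" (Node major, then Playwright version)
// or just "16", which does not pin Playwright at all.
function playwrightVersionFromTag(tag) {
    if (!tag) return null;
    const dash = tag.indexOf('-');
    return dash === -1 ? null : tag.slice(dash + 1);
}

// Returns a list of problems; an empty list means both places pin the same version.
function checkPins(dockerfileText, packageJson, image = 'apify/actor-node-playwright-chrome') {
    const problems = [];
    const exact = /^\d+\.\d+\.\d+$/;
    const imageVersion = playwrightVersionFromTag(imageTag(dockerfileText, image));
    const declared = (packageJson.dependencies || {}).playwright;

    if (!imageVersion) {
        problems.push('Docker image tag does not pin a Playwright version');
    }
    if (!declared) {
        problems.push('package.json does not list playwright as a dependency');
    } else if (!exact.test(declared)) {
        problems.push(`package.json uses "${declared}" instead of an exact version`);
    } else if (imageVersion && imageVersion !== declared) {
        problems.push(`image ships Playwright ${imageVersion}, package.json asks for ${declared}`);
    }
    return problems;
}

module.exports = { imageTag, playwrightVersionFromTag, checkPins };
```

One way to use it would be to read `Dockerfile` and `package.json` with `fs.readFileSync` in a pre-build step, pass them to `checkPins`, and fail the build when the returned list is not empty. It only inspects the declared dependency, not the lockfile, so it does not cover every case described above.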

RowanAldean commented 7 months ago

Writing for future reference. I was having issues specifically deploying to Cloud Run using cloudbuild.yaml and gcloud builds submit with expected steps to build, push and deploy a Crawlee project.

Further details of my error are in the mentioned issue. I solved this by adding RUN npx playwright install to my own Dockerfile, so that I no longer depend on the base image to perform this install and all default browsers are present in the resulting image.

Good luck future Googler navigating this mess! 🫡
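
To confirm that the browser really is present in the resulting image, one option is a small startup check. The sketch below is not from this thread: `browserStatus` is a hypothetical helper, and it relies only on Playwright's `browserType.executablePath()`, which returns the path where Playwright expects to find the bundled browser.

```javascript
'use strict';
const fs = require('fs');

// Hypothetical helper: report where a Playwright BrowserType (e.g. `chromium`)
// expects its executable and whether that file exists on disk.
// `exists` is injectable so the function can be exercised without Playwright.
function browserStatus(browserType, exists = fs.existsSync) {
    const path = browserType.executablePath();
    return { path, exists: Boolean(path) && exists(path) };
}

// Possible usage inside the container (assumes the `playwright` package is installed):
//
//   const { chromium } = require('playwright');
//   const status = browserStatus(chromium);
//   if (!status.exists) {
//       console.error(`Chromium not found at ${status.path}`);
//       process.exit(1);
//   }

module.exports = { browserStatus };
```

Running such a check before starting the crawler would surface a missing browser right away, rather than after the crawler has exhausted its retries as in the error at the top of this issue.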

jkorach commented 3 months ago

Another option that worked for me is appending && npx playwright install to the postinstall script in the package.json file.