apify / apify-actor-docker

Base Docker images for Apify actors.
https://hub.docker.com/u/apify
Apache License 2.0

Base image based on Alpine #99

Open dragospopa420 opened 1 year ago

dragospopa420 commented 1 year ago

Which package is the feature request for? If unsure which one to select, leave blank

@crawlee/core

Feature

The base image is based on Debian, which has a much bigger footprint than Alpine Linux. So I was thinking the included Dockerfile could be based on Alpine Linux instead, for faster deployment and testing. The apify/actor-node-puppeteer-chrome image is 2.53 GB; my version is 698 MB.
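To put the two sizes in proportion, here is a small sketch of the arithmetic. It assumes the 2.53 GB figure is decimal (2530 MB), which is how `docker images` normally reports sizes; the function name `size_reduction` is made up for this example.

```shell
# Compare two image sizes given in MB and print the ratio and the saving.
size_reduction() {
  awk -v base="$1" -v slim="$2" 'BEGIN {
    printf "%.1fx smaller, %.1f%% reduction\n", base / slim, (1 - slim / base) * 100
  }'
}

# Sizes quoted above: 2.53 GB (assumed to be 2530 MB) vs 698 MB.
size_reduction 2530 698
# prints: 3.6x smaller, 72.4% reduction
```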

Motivation

I'm building an infrastructure of spiders based on Crawlee and I wanted to have the fastest possible deployment time.

Ideal solution or implementation, and any additional constraints

FROM node:current-alpine

# Set workdir
WORKDIR /usr/src/app

# Copy just package.json and package-lock.json
# to speed up the build using Docker layer cache.
COPY package*.json ./

# Change rights for package-lock.json
RUN chmod 744 package-lock.json

# Install Chromium and its dependencies; nodejs is also listed here to make sure it is up to date
RUN apk add --no-cache \
      chromium \
      nss \
      freetype \
      harfbuzz \
      ca-certificates \
      ttf-freefont \
      nodejs \
      yarn

# This tells puppeteer to not download chrome again
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
    && npm install --omit=dev --omit=optional \
    && echo "Installed NPM packages:" \
    && (npm list --omit=dev --all || true) \
    && echo "Node.js version:" \
    && node --version \
    && echo "NPM version:" \
    && npm --version

# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY . ./

# Required for Crawlee
ENV CRAWLEE_CHROME_EXECUTABLE_PATH=/usr/bin/chromium-browser
RUN chmod 744 /usr/bin/chromium-browser

# Run the image.
CMD npm start 
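
The Dockerfile above hard-codes /usr/bin/chromium-browser in two environment variables, so it may be worth checking that the path really exists in the built image. A minimal sketch of such a check follows. The helper name `find_chromium` and the fallback path /usr/bin/chromium are assumptions for illustration, not something the issue specifies.

```shell
# Print the first candidate path that exists and is executable.
find_chromium() {
  for candidate in "$@"; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "no Chromium executable found" >&2
  return 1
}

# Check the path used in the Dockerfile first, then a hypothetical fallback.
find_chromium /usr/bin/chromium-browser /usr/bin/chromium || true
```

Run inside the container (for example from an entrypoint script), this fails loudly when the package layout changes, rather than leaving the crawler to fail at launch time.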

Alternative solutions or implementations

No response

Other context

No response

ivanvs commented 1 year ago

Hi @dragospopa420,

I think the Docker images are not part of this repo. You should probably raise an issue on this repo, since, if I understand everything correctly, that is the repository for the Apify Docker images.

Source code of the image that you are referencing is here: https://github.com/apify/apify-actor-docker/tree/master/node-puppeteer-chrome

dragospopa420 commented 1 year ago

Thanks, @ivanvs. Thanks, @B4nan, for transferring the issue. Thanks, @mtrunkat.

I've also had some time to test this image, and it seems to perform well. I haven't found anything wrong with it.

mtrunkat commented 1 year ago

Thanks, @dragospopa420. The image size is currently something we plan to look into.

CC @fnesveda @B4nan, please take a look

B4nan commented 1 year ago

I asked @vladfrangu to take a closer look last week. IIRC, the reason we use Ubuntu was to support Chromium; the rest of the browsers should be fine with Debian?

dragospopa420 commented 1 year ago

> I asked @vladfrangu to take a closer look last week. IIRC, the reason we use Ubuntu was to support Chromium; the rest of the browsers should be fine with Debian?

This image uses Alpine, and Chromium works fine on Alpine. I've already deployed it in some production environments. From what I can see, Firefox is in the Alpine community repo and works properly.

vladfrangu commented 1 year ago

Super sorry for the late response! The main image that (probably) can't use Alpine is WebKit (Safari). It's good to know that Chromium works on Alpine, but does Chrome work on it too? 👀

fnesveda commented 1 year ago

I believe the main reason we use Debian in the base images is compatibility with user libraries. Debian uses glibc, while Alpine uses an alternative libc implementation, musl libc, which is not 100% compatible. While musl libc behaves more correctly according to standards, most software is written targeting glibc and all its quirks, and could break when used with musl libc (or would have to be recompiled at least). So I would recommend staying with Debian for these compatibility reasons.
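
To check which libc a given image ships, one option is to look at the output of `ldd --version`. The sketch below assumes that musl identifies itself with the word "musl" and that glibc builds print "GLIBC" or "GNU libc"; the helper name `classify_libc` is made up for this example.

```shell
# Classify a libc from the text printed by `ldd --version`.
classify_libc() {
  case "$1" in
    *musl*) echo "musl" ;;
    *GLIBC* | *"GNU libc"*) echo "glibc" ;;
    *) echo "unknown" ;;
  esac
}

# On musl, ldd may write to stderr and exit non-zero, so capture both
# streams and ignore the exit status.
classify_libc "$(ldd --version 2>&1 || true)"
```

Running this in both a Debian-based and an Alpine-based container shows which libc any prebuilt native modules would be linked against.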

I believe most of the size difference between the image produced by @dragospopa420's Dockerfile and what we have is down to other differences:

You can use the great dive tool to inspect the images layer by layer and see what's taking up most of the size.
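
As a rough sketch of that kind of inspection, `docker history` lists each layer with its size without any extra tooling, and `dive` gives an interactive view of the same image. The wrapper name `layer_sizes` is made up for this example; the image name is the one mentioned earlier in the thread.

```shell
# List an image's layers with their sizes using the stock Docker CLI.
layer_sizes() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found" >&2
    return 127
  fi
  docker history --format '{{.Size}}\t{{.CreatedBy}}' "$1"
}

# Usage:
#   layer_sizes apify/actor-node-puppeteer-chrome
#   dive apify/actor-node-puppeteer-chrome   # interactive, if dive is installed
```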