Open ghost opened 3 years ago
Thanks for providing the options; they pointed me in the right direction. I am using this page to check which versions we can use to get as close as possible to the ones in the Dockerfile: https://pkgs.alpinelinux.org/packages?name=yarn&branch=v3.10.
So I made the following Dockerfile, based on the one in this repo, with the necessary changes applied.
FROM alpine:3.10
LABEL maintainer="Robert Riemann <robert.riemann@edps.europa.eu>"
LABEL org.label-schema.description="Website Evidence Collector running in a tiny Alpine Docker container" \
org.label-schema.name="website-evidence-collector" \
org.label-schema.usage="https://github.com/EU-EDPS/website-evidence-collector/blob/master/README.md" \
org.label-schema.vcs-url="https://github.com/EU-EDPS/website-evidence-collector" \
org.label-schema.vendor="European Data Protection Supervisor (EDPS)" \
org.label-schema.license="EUPL-1.2"
# Installs latest Chromium (77) package.
RUN apk add --no-cache --update-cache --repository http://nl.alpinelinux.org/alpine/v3.8/main alsa-lib-dev=1.1.6-r0
RUN apk add \
chromium~=77.0.3865 \
nss \
freetype \
freetype-dev \
harfbuzz \
ca-certificates \
ttf-freefont \
nodejs \
yarn~=1.16 \
# Packages linked to testssl.sh
bash procps drill coreutils libidn curl \
# Toolbox for advanced interactive use of WEC in container
parallel jq grep aha
# Add user so we don't need --no-sandbox and match first linux uid 1000
RUN addgroup --system --gid 1001 collector \
&& adduser --system --uid 1000 --ingroup collector --shell /bin/bash collector \
&& mkdir -p /home/collector/Downloads /output \
&& chown -R collector:collector /home/collector \
&& chown -R collector:collector /output
COPY . /opt/website-evidence-collector/
# Install Testssl.sh
RUN curl -SL https://github.com/drwetter/testssl.sh/archive/3.0.tar.gz | \
tar -xz --directory /opt
# Run everything after as non-privileged user.
USER collector
WORKDIR /home/collector
# Tell Puppeteer to skip installing Chrome. We'll be using the installed package.
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
RUN yarn global add file:/opt/website-evidence-collector --prefix /home/collector
# Let Puppeteer use system Chromium
ENV PUPPETEER_EXECUTABLE_PATH /usr/bin/chromium-browser
ENV PATH="/home/collector/bin:/opt/testssl.sh-3.0:${PATH}"
# Let website evidence collector run chrome without sandbox
# ENV WEC_BROWSER_OPTIONS="--no-sandbox"
# Configure default command in Docker container
ENTRYPOINT ["/home/collector/bin/website-evidence-collector"]
WORKDIR /
VOLUME /output
So the changed parts are:
FROM alpine:3.10
....
RUN apk add --no-cache --update-cache --repository http://nl.alpinelinux.org/alpine/v3.8/main alsa-lib-dev=1.1.6-r0
RUN apk add \
chromium~=77.0.3865 \
....
yarn~=1.16 \
Build it:
docker build -t website-evidence-collector .
Please note that the usage comments in the repo's Dockerfile omit the trailing dot (the build context) from the build command.
Run it:
mkdir output
chmod 777 output # Could be cleaner and more secure, but good enough for a PoC
docker run --rm -it --cap-add=SYS_ADMIN -v $(pwd)/output:/output website-evidence-collector https://vincentcox.com --overwrite
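A slightly tighter variant of the `chmod 777` step, as a minimal sketch. It assumes a Linux host with a default (rootful) Docker setup, where the bind-mounted directory keeps the host's numeric ownership, and it relies on the `collector` user having uid 1000 as set by `adduser` in the Dockerfile above. With user-namespace remapping or Docker Desktop the ownership mapping may differ.

```shell
# Create the output directory that will be bind-mounted to /output.
mkdir -p output

# The image's "collector" user has uid 1000 (see adduser in the Dockerfile).
if [ "$(id -u)" -eq 1000 ]; then
  # Host user and container user share the uid, so owner-write is enough.
  chmod 755 output
else
  # Fallback from the original instructions; world-writable, PoC only.
  chmod 777 output
fi
```

The `docker run` command itself stays the same as above.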
If you consider this a feasible fix, I can open a pull request with all the changes (including the updated build and usage instructions).
Hmm, I just saw you pushed a hotfix: https://github.com/EU-EDPS/website-evidence-collector/commit/c5c4b989a1f51d9e12e81b3afa3f9d4ae7ac4230. Let me check it out.
So I am now using your Dockerfile, but the build gets stuck at this step:
Step 11/16 : RUN yarn global add file:/opt/website-evidence-collector --prefix /home/collector
---> Running in 0363b73f8c9a
yarn global v1.22.10
[1/4] Resolving packages...
warning file:/opt/website-evidence-collector > request-promise-native@1.0.9: request-promise-native has been deprecated because it extends the now deprecated request package, see https://github.com/request/request/issues/3142
warning file:/opt/website-evidence-collector > request@2.88.2: request has been deprecated, see https://github.com/request/request/issues/3142
warning file:/opt/website-evidence-collector > request > har-validator@5.1.5: this library is no longer supported
warning file:/opt/website-evidence-collector > pug > pug-code-gen > constantinople > babel-types > babel-runtime > core-js@2.6.12: core-js@<3 is no longer maintained and not recommended for usage due to the number of issues. Please, upgrade your dependencies to the actual version of core-js@3.
[2/4] Fetching packages...
error An unexpected error occurred: "EACCES: permission denied, scandir '/opt/website-evidence-collector/output/browser-profile'".
info If you think this is a bug, please open a bug report with the information provided in "/home/collector/.config/yarn/global/yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/global for documentation about this command.
The command '/bin/sh -c yarn global add file:/opt/website-evidence-collector --prefix /home/collector' returned a non-zero code: 1
Any idea why this is happening?
I could reproduce this problem.
Try to delete the folder /opt/website-evidence-collector/output/browser-profil. This solved the issue for me. I do not understand why this folder can break the build process.
OK, it builds now if I add this to the Dockerfile:
RUN rm -rf /opt/website-evidence-collector/output/browser-profile
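A possible explanation, which is an assumption and not verified against this repo: `COPY . /opt/website-evidence-collector/` also copies a local `output` directory left over from an earlier run. The copied files are owned by root, and if the browser profile directory is not world-readable, the `collector` user cannot scan it when yarn runs. If that is the cause, keeping `output` out of the build context avoids the problem. A minimal sketch, assuming it is run from the repository root next to the Dockerfile:

```shell
# Assumption: the current directory is the repository root (the build context).
# Make sure .dockerignore exists, then list "output" in it exactly once,
# so that "COPY ." skips results and browser profiles from local runs.
touch .dockerignore
grep -qxF 'output' .dockerignore || echo 'output' >> .dockerignore
```

With this in place the `RUN rm -rf` line above may no longer be needed, and the build context also gets smaller.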
Unfortunately, it's still the same issue as https://github.com/EU-EDPS/website-evidence-collector/issues/42.
Do you have the same issue if you run this?
docker run --rm -it --cap-add=SYS_ADMIN -v $(pwd)/output:/output website-evidence-collector https://vincentcox.com --overwrite
It takes a lot of time and keeps using more and more RAM. It's strange that it also happens with Docker, which should be platform independent. It's not only my website, but also sites from a client I am making a dashboard for (unfortunately I can't share them here publicly).
So I'm afraid I'll stick with this one: https://github.com/EU-EDPS/website-evidence-collector/issues/43#issuecomment-734236432
Dear all,
in #42, the following problem was described:
The current Dockerfile pins fixed version numbers for some dependencies, with the intention of keeping the setup reasonably reproducible:
However, as those versions of chromium and yarn are outdated, they are no longer distributed by the Alpine project:
The problem was already brought forward here: https://superuser.com/a/1486407/1039133
Possible options are:
apk add --no-cache --update-cache --repository http://nl.alpinelinux.org/alpine/v3.8/main alsa-lib-dev=1.1.6-r0
See https://superuser.com/a/1369979.