assafelovic / gpt-researcher

GPT based autonomous agent that does online comprehensive research on any given topic
https://gptr.dev
MIT License

Docker image comes with 8 Critical and 34 High Vulnerabilities #506

Open andora2 opened 1 month ago

andora2 commented 1 month ago

Hi,

Please find below a less vulnerable Docker setup as an improvement suggestion. According to docker scout quickview, it reduces the problem from [8C, 34H, 32M, 98L issues] to [-C, 1H, 3M, 0L issues].
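For anyone who wants to reproduce the comparison, here is a minimal sketch of building the image and printing its Docker Scout summary. The function name build_and_scan and the image tag in the usage line are hypothetical; only docker build and docker scout quickview are taken from Docker itself, and the sketch assumes the Docker Scout plugin is installed.

```shell
# Hypothetical helper: build the image from the Dockerfile in the current
# directory, then print the Docker Scout vulnerability summary for it.
# $1 = image tag to build and scan (any tag you like)
build_and_scan() {
    docker build -t "$1" . && docker scout quickview "$1"
}
```

Usage would look like: build_and_scan gpt-researcher:alpine. Running it once per base image gives the two summaries to compare.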

The main fix is to use alpine instead of debian:bullseye (bookworm removed the criticals but still had quite a few High vulnerabilities). Using alpine meant I had to help playwright and pymupdf install successfully, but in the end it worked out.

The app works like a charm.

Though I think the Dockerfile image layer concept might profit from some improvement as well.

Please check it out yourself and update the Dockerfile and requirements.txt, for the sake of less vulnerable instances out there :o) Regarding requirements.txt: you just have to exclude playwright and pymupdf, since they are installed in the Dockerfile itself (not necessarily a final solution, but it was good enough for me).
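The requirements.txt change can also be done without editing the file by hand. The sketch below is one possible way, not something from the original setup: filter_requirements is a hypothetical helper that writes a copy of the file without the playwright and pymupdf lines, matching the package names case-insensitively and leaving similarly named packages alone.

```shell
# Hypothetical helper: copy a requirements file, dropping the lines for
# playwright and pymupdf (with or without a version specifier).
# $1 = input requirements file, $2 = output file
filter_requirements() {
    # -v inverts the match, -i ignores case, -E enables extended regex.
    # The trailing group stops "playwright" from matching longer package
    # names that merely start with the same letters.
    grep -viE '^[[:space:]]*(playwright|pymupdf)([^[:alnum:]_.-]|$)' "$1" > "$2"
}
```

For example, filter_requirements requirements.txt requirements.alpine.txt produces a file that the Dockerfile could COPY in place of the original (the output file name is an assumption).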

Here is the Dockerfile:

FROM python:3.11-alpine as install-browser

# Install required packages
RUN apk update && apk add --no-cache \
    chromium \
    chromium-chromedriver \
    firefox-esr \
    nodejs \
    npm \
    wget \
    tar \
    bash \
    build-base \
    libffi-dev \
    gcc \
    g++ \
    make \
    libc-dev \
    linux-headers \
    libxml2-dev \
    libxslt-dev \
    rust \
    cargo \
    openssl-dev \
    jpeg-dev \
    zlib-dev \
    freetype-dev \
    lcms2-dev \
    openjpeg-dev \
    tiff-dev \
    tk-dev \
    tcl-dev \
    harfbuzz-dev \
    fribidi-dev \
    libjpeg-turbo-dev \
    cairo-dev \
    pango-dev \
    giflib-dev \
    poppler-utils \
    poppler-dev \
    tesseract-ocr \
    leptonica-dev \
    musl-dev

# Check versions
RUN chromium-browser --version && chromedriver --version

# Install Geckodriver
RUN wget https://github.com/mozilla/geckodriver/releases/download/v0.33.0/geckodriver-v0.33.0-linux64.tar.gz \
    && tar -xvzf geckodriver-v0.33.0-linux64.tar.gz \
    && chmod +x geckodriver \
    && mv geckodriver /usr/local/bin/

# Set Env. vars, to ignore Root-Warning
ENV PIP_ROOT_USER_ACTION=ignore

# Set environment variables for Playwright
# (spad.uk) https://www.spad.uk/posts/making-playwright-work-on-alpine-out-of-spite/
# The main problem with running Playwright on Alpine Linux is compatibility with the musl libc library. Playwright and its dependencies are primarily built for glibc, which is not available on Alpine Linux.
# https://stackoverflow.com/questions/75581790/how-to-get-playwright-browser-tests-running-on-alpine-docker-container
# One approach to running Playwright on Alpine is to install Node.js and Chromium from the Alpine repositories and configure Playwright to use these installations instead of its own drivers.
# Using Node.js and Chromium from Alpine Repositories
ENV PLAYWRIGHT_BROWSERS_PATH=/usr/lib/chromium/
ENV PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1

# Create APP dir
RUN mkdir /usr/src/app
WORKDIR /usr/src/app

# Copy and install Python-Dep.
COPY ./requirements.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Install Playwright without Browser
RUN npm install -g playwright

# Set Playwright Env. Vars
ENV PLAYWRIGHT_BROWSERS_PATH=/usr/bin
ENV PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH=/usr/bin/chromium-browser
ENV PLAYWRIGHT_FIREFOX_EXECUTABLE_PATH=/usr/local/bin/firefox
ENV PLAYWRIGHT_WEBKIT_EXECUTABLE_PATH=/usr/bin/webkit

# Install PyMuPDF
RUN pip install --no-cache-dir pymupdf

# Change to unprivileged user
RUN adduser -D -s /bin/bash gpt-researcher \
    && chown -R gpt-researcher:gpt-researcher /usr/src/app

USER gpt-researcher

# Copy rest of the code
COPY --chown=gpt-researcher:gpt-researcher ./ ./

# Expose Port 8000
EXPOSE 8000

# Start the APP
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

assafelovic commented 1 month ago

@ElishaKay

ElishaKay commented 1 month ago

@andora2 what machine are you using?

I tried out this Dockerfile on Mac on the Master Branch and it crashed with the error below. Also, is this Dockerfile you propose a lighter or heavier image? Feel free to create a PR with the proposed changes - (seems like you want to add some stuff to requirements.txt as well) and we'll take it from there

3.908 ERROR: Could not find a version that satisfies the requirement playwright (from versions: none)
3.908 ERROR: No matching distribution found for playwright
------
failed to solve: process "/bin/sh -c pip install --no-cache-dir -r requirements.txt" did not complete successfully: exit code: 1

andora2 commented 1 month ago

Hi,

The machine is Windows 10. The error you see occurs because playwright is still in requirements.txt, and that has to fail. Alpine forces us to deal with playwright and pymupdf separately, in the Dockerfile itself (I did mention that in my suggestion). So no, I didn't have to add anything to requirements.txt; rather, I commented out playwright and pymupdf (please check the Dockerfile diff and my suggestion again, it is mentioned there).
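A quick way to see whether a checkout would hit the same pip error is to look for the two packages before building. This is only a sketch: find_alpine_blockers is a hypothetical name, and it checks just for the two packages mentioned in this thread.

```shell
# Hypothetical pre-build check: print (with line numbers) any requirement
# lines for playwright or pymupdf. The exit status is 0 when such lines
# are found and non-zero when the file contains neither package.
# $1 = path to the requirements file
find_alpine_blockers() {
    grep -niE '^[[:space:]]*(playwright|pymupdf)([^[:alnum:]_.-]|$)' "$1"
}
```

If find_alpine_blockers requirements.txt prints anything, those lines still need to be commented out or removed before the Alpine build.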

I would have opened a PR for this, but it needs some clean-code polishing first, and unfortunately I won't get to it any time soon (if at all). I had to solve this issue for one specific purpose and nothing more than that. I thought I could at least let you know.

Take care, Adrian