Closed micuentadecasa closed 1 month ago
I found the cause of the error. OpenShift runs the container/pod as a non-root user, so we need to modify the Dockerfile to give the user's group permission to access the folders.
I will post it on Monday.
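For anyone debugging the same thing, here is a minimal Python sketch for checking, from inside the running container, which user the process runs as and whether a folder is group-writable. The paths in the example are assumptions taken from the Dockerfiles in this thread; adjust them to whatever your app writes to. It only reports state and does not change any permissions.

```python
import os
import stat


def describe_access(path):
    """Return ownership, mode and write access for `path` as seen by this process."""
    st = os.stat(path)
    return {
        "path": path,
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "mode": stat.filemode(st.st_mode),
        # True when the owning group has the write bit set
        "group_writable": bool(st.st_mode & stat.S_IWGRP),
        # True when the current (possibly arbitrary) UID/GID can write here
        "writable_by_me": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    print("uid:", os.getuid(), "gid:", os.getgid(), "groups:", os.getgroups())
    # Hypothetical list of folders to inspect
    for folder in ["/app", "/app/nltk_data"]:
        if os.path.exists(folder):
            print(describe_access(folder))
        else:
            print(folder, "does not exist")
```

If `writable_by_me` is False for a folder the app needs, that folder is a candidate for a `chmod -R g+rwX` in the Dockerfile.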
On Sat, 2 Nov 2024, 15:03, scheckley @.***> wrote:
If you make any progress on a rootless container deployment, it would be interesting to see the Dockerfile. I'm working on the same problem here.
thanks :)
I ended up with this, which may not be the neatest, but it seems to work:
FROM python:3.10-slim as base_image
# Use Python virtual environment for dependencies to avoid system-wide installs
ENV VENV_PATH=/app/venv
RUN python3 -m venv $VENV_PATH
# Set up PATH for the virtual environment
ENV PATH="$VENV_PATH/bin:$PATH"
# Common dependencies with non-root considerations
RUN apt-get update -qqy && \
apt-get install -y --no-install-recommends \
ssh \
git \
gcc \
g++ \
poppler-utils \
libpoppler-dev \
unzip \
curl \
cargo \
tesseract-ocr \
tesseract-ocr-jpn \
libsm6 \
libxext6 \
libreoffice \
ffmpeg \
libmagic-dev
# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV PYTHONIOENCODING=UTF-8
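# Declare the build arg so ${TARGETARCH} is populated by BuildKit instead of expanding to an empty string
ARG TARGETARCH
ENV TARGETARCH=${TARGETARCH}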
# Create working directory with correct permissions
WORKDIR /app
RUN chmod -R 755 /app
# Set up NLTK data directory for cache
ENV NLTK_DATA=/app/nltk_data
RUN mkdir -p /app/nltk_data && chmod -R 775 /app/nltk_data
FROM base_image as dev
# Download pdfjs
COPY scripts/download_pdfjs.sh /app/scripts/download_pdfjs.sh
RUN chmod +x /app/scripts/download_pdfjs.sh
ENV PDFJS_PREBUILT_DIR="/app/libs/ktem/ktem/assets/prebuilt/pdfjs-dist"
RUN bash scripts/download_pdfjs.sh $PDFJS_PREBUILT_DIR
# Copy contents
COPY . /app
COPY .env.example /app/.env
RUN pip install --upgrade pip
# Install pip packages
RUN pip install --no-cache-dir wheel && \
pip install --no-cache-dir -e "libs/kotaemon" && \
pip install --no-cache-dir graphrag nano-graphrag future python-decouple theflow==0.8.6 && \
pip install --no-cache-dir -e "libs/ktem" && \
pip install --no-cache-dir "pdfservices-sdk@git+https://github.com/niallcm/pdfservices-python-sdk.git@bump-and-unfreeze-requirements"
# Install torch and additional packages
RUN pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu && \
pip install --no-cache-dir -e "libs/kotaemon[adv]" && \
pip install --no-cache-dir "unstructured[all-docs]"
# Download NLTK packages explicitly
RUN pip install --no-cache-dir nltk && \
python -c "import nltk; nltk.download('punkt', download_dir=nltk.data.path[0]); nltk.download('averaged_perceptron_tagger', download_dir=nltk.data.path[0])"
# Verify theflow installation
RUN pip freeze | grep theflow && \
python -c "import theflow; print(theflow.__file__)"
RUN chmod -R 775 /app
RUN pip uninstall -y hnswlib
RUN pip uninstall -y chroma-hnswlib
RUN pip install --no-cache-dir chroma-hnswlib
# Expose the app's default port
EXPOSE 7860
# Default to a non-root user; OpenShift replaces this with an arbitrary UID in the root group (GID 0)
USER 1001
CMD ["python", "app.py", "--host", "0.0.0.0", "--port", "7860"]
I had a problem with hnswlib which I think stems from having multiple versions installed during the build.
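A quick way to confirm this kind of conflict is to list every installed distribution whose name mentions hnswlib. This is a minimal sketch using only the standard library; the helper name `find_distributions` is made up for illustration. As far as I know, `hnswlib` and `chroma-hnswlib` both install a module that is imported as `hnswlib`, so seeing both in the output would be consistent with the clash described above.

```python
from importlib import metadata


def find_distributions(keyword):
    """Return sorted (name, version) pairs for installed distributions whose name contains `keyword`."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and keyword.lower() in name.lower():
            found.add((name, dist.version))
    return sorted(found)


if __name__ == "__main__":
    matches = find_distributions("hnswlib")
    if not matches:
        print("no hnswlib-related distribution installed")
    for name, version in matches:
        print(name, version)
```

It can be run as a `RUN python ...` step near the end of the build, after the uninstall/reinstall lines, to see what is left in the image.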
I edited app.py to bind to 0.0.0.0:
import os
from theflow.settings import settings as flowsettings
KH_APP_DATA_DIR = getattr(flowsettings, "KH_APP_DATA_DIR", ".")
GRADIO_TEMP_DIR = os.getenv("GRADIO_TEMP_DIR", None)
# override GRADIO_TEMP_DIR if it's not set
if GRADIO_TEMP_DIR is None:
    GRADIO_TEMP_DIR = os.path.join(KH_APP_DATA_DIR, "gradio_tmp")
    os.environ["GRADIO_TEMP_DIR"] = GRADIO_TEMP_DIR
from ktem.main import App  # noqa
app = App()
demo = app.make()
demo.queue().launch(
    favicon_path=app._favicon,
    inbrowser=True,
    allowed_paths=[
        "libs/ktem/ktem/assets",
        GRADIO_TEMP_DIR,
    ],
    server_name="0.0.0.0",
)
This has deployed on an on-premise OpenShift cluster without any admin privileges.
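One note on the CMD above: it passes `--host` and `--port`, but the app.py shown does not read command-line arguments itself, so the binding comes from the hard-coded `server_name`. A minimal sketch of how those flags could be parsed and handed to Gradio's `launch()` follows. The function name and the `KH_HOST` / `KH_PORT` environment variables are hypothetical, not part of kotaemon; `server_name` and `server_port` are existing `launch()` parameters.

```python
import argparse
import os


def parse_server_args(argv=None):
    """Parse --host/--port, falling back to environment variables, then to defaults."""
    parser = argparse.ArgumentParser(description="Server binding options (illustrative)")
    # KH_HOST and KH_PORT are hypothetical variable names used only in this sketch
    parser.add_argument("--host", default=os.getenv("KH_HOST", "0.0.0.0"))
    parser.add_argument("--port", type=int, default=int(os.getenv("KH_PORT", "7860")))
    args = parser.parse_args(argv)
    # Keys match the keyword arguments of Gradio's launch()
    return {"server_name": args.host, "server_port": args.port}


# Possible usage inside app.py (not executed here):
#   demo.queue().launch(favicon_path=app._favicon, **parse_server_args())
```

Gradio also documents the `GRADIO_SERVER_NAME` and `GRADIO_SERVER_PORT` environment variables, so setting those with `ENV` in the Dockerfile may avoid editing app.py at all; I have not verified that against this particular image.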
This is mine:
FROM python:3.10-slim AS lite
RUN apt-get update -qqy && \
    apt-get install -y --no-install-recommends \
        ssh \
        git \
        gcc \
        g++ \
        poppler-utils \
        libpoppler-dev \
        unzip \
        curl \
        cargo
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV PYTHONIOENCODING=UTF-8
WORKDIR /app
RUN mkdir -p /app/libs && \
    mkdir -p /app/scripts && \
    chmod -R g+rwX /app && \
    chown -R 1001:0 /app
COPY scripts/download_pdfjs.sh /app/scripts/download_pdfjs.sh
RUN chmod +x /app/scripts/download_pdfjs.sh
ENV PDFJS_PREBUILT_DIR="/app/libs/ktem/ktem/assets/prebuilt/pdfjs-dist"
COPY . /app
RUN chmod -R g+rwX /app && chown -R 1001:0 /app
RUN --mount=type=ssh \
    --mount=type=cache,target=/root/.cache/pip \
    pip install -e "libs/kotaemon" \
    && pip install -e "libs/ktem" \
    && pip install graphrag future \
    && pip install "pdfservices-sdk@git+https://github.com/niallcm/pdfservices-python-sdk.git@bump-and-unfreeze-requirements"
RUN apt-get autoremove \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && rm -rf ~/.cache
RUN chmod -R g+rwX /usr/local/lib/python3.10/site-packages/
CMD ["python", "app.py"]
FROM lite AS full
RUN apt-get update -qqy && \
    apt-get install -y --no-install-recommends \
        tesseract-ocr \
        tesseract-ocr-jpn \
        libsm6 \
        libxext6 \
        libreoffice \
        ffmpeg \
        libmagic-dev
RUN --mount=type=ssh \
    --mount=type=cache,target=/root/.cache/pip \
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
COPY . /app
RUN mkdir -p /app/nltk_data && chmod -R g+rwX /app/nltk_data
ENV NLTK_DATA=/app/nltk_data
RUN mkdir -p /app/matplotlib && chmod -R g+rwX /app/matplotlib
ENV MPLCONFIGDIR=/app/matplotlib
RUN mkdir -p /app/fontconfig && chmod -R g+rwX /app/fontconfig
ENV XDG_CACHE_HOME=/app/fontconfig
RUN --mount=type=ssh \
    --mount=type=cache,target=/root/.cache/pip \
    pip install -e "libs/kotaemon[adv]" \
    && pip install unstructured[all-docs]
RUN apt-get autoremove \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && rm -rf ~/.cache
RUN python -c "from unstructured.nlp.tokenize import _download_nltk_packages_if_not_present; _download_nltk_packages_if_not_present()"
CMD ["python", "app.py"]
Did you happen to mount persistent storage and settings between builds? I tried to mount a PVC at ~/app/ktem_app_data/ using a symbolic link, but the data doesn't seem to persist if the container is rebuilt.
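Not an answer, but a diagnostic that may help narrow this down: a small Python sketch that reports where a data directory really resolves to and whether it sits on its own mount. The function name is made up for illustration, and the example path is an assumption based on the question above. The general point it relies on is that only files under the volume's mount path survive a pod being recreated; anything written to the container's own filesystem is lost.

```python
import os


def describe_data_dir(path):
    """Report how `path` resolves and whether this process can write to it."""
    real = os.path.realpath(path)
    return {
        "path": path,
        "is_symlink": os.path.islink(path),
        "real_path": real,
        "exists": os.path.isdir(real),
        # True only when the resolved directory is itself a mount point
        "is_mount_point": os.path.ismount(real),
        "writable": os.access(real, os.W_OK),
    }


if __name__ == "__main__":
    # Hypothetical location of the app data directory inside the container
    print(describe_data_dir("/app/ktem_app_data"))
```

If `is_mount_point` is False and `real_path` is not somewhere under the PVC's mount path, the app is most likely writing to the container layer rather than to the volume.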
Description
I have tried to install the docker image in OpenShift, but it gives an error when building the image.